<- c('causact', 'dplyr', 'igraph')
packages
<- packages %in% rownames(installed.packages())
installed_packages if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages])
}
invisible(lapply(packages, library, character.only = TRUE))
Week 1: SOC 223
Question 1:
Run:
Question 2:
R’s solution to overlapping object names is “masking.” That is, if two or more packages share object names – in this case, as_data_frame
is an exported object from both dplyr
and igraph
– R will default to using the object from the last loaded package. Such that igraph
was loaded after dplyr
in the example, simply calling as_data_frame()
will run the function pulled from igraph
.
Instead of loading an entire library, the best practice is to utilize namespaces to ‘pluck’ individual functions from a package. If you need to reference just one chapter from a book, you would not read the entire book – rather just the chapter. In the same vein, if I were to just need the map
function from the purrr
package, I would use purrr::map
to reference that single function, not load the entire package. Not only does this increase code clarity, but it lessens the chance of accidental masking errors.
When writing open-source code, it is best to leverage namespaces (package::function
) to make it abundantly clear what function you are referencing where.
Masking conflicts can be checked with the base conflicts
function. Often times, however, these might be unnecessarily verbose. Instead of adding extra dependencies – which should always be kept at a minimum during package development – authors will add common utilities that might clutter the conflicts
output: The magrittr
pipe (%>%
) is a common example; while this would return in the conflicts
function, it is not functionally different cross-package. A work-around is to use the conflict_scout()
function from conflicted
.
Note: This example is a bit wonky because as_data_frame()
is in the tibble()
namespace and not dplyr
. Loading tibble
here will show the intended conflict.
library(tibble)
library(igraph)
# if ('conflicted' %in% rownames(installed.packages()) == FALSE) {
# install.packages('conflicted')
# }
::conflict_scout() conflicted
3 conflicts:
* `as_data_frame`: igraph, tibble
* `decompose` : igraph, stats
* `spectrum` : igraph, stats
Similarly, it is possible to check where individual conflicts are originating from:
library(dplyr)
library(igraph)
getAnywhere("as_data_frame")$where
[1] "package:dplyr" "package:igraph" "package:tibble" "namespace:igraph"
[5] "namespace:tibble"
Question 3:
n_distinct()
returns the number of distinct rows found in the object.
Question 4:
# number of rows
nrow(causact::baseballData)
[1] 12145
# nummber of columns
ncol(causact::baseballData)
[1] 5
# type of `Home` and `HomeScore` column
sapply(causact::baseballData[c('Home', 'HomeScore')], class)
Home HomeScore
"factor" "integer"
Question 5:
One row in the data set represents a single game observation, while the Home
and Visitor
columns indicate who were the home and away teams during a game.
Question 6:
<-
name c(
"Wayne Gretzky",
"Gordie Howe",
"Jaromir Jagr",
"Brett Hull",
"Marcel Dionne",
"Phil Esposito" ,
"Mike Gartner",
"Alex Ovechkin",
"Mark Messier" ,
"Steve Yzerman"
)<- c(894, 801, 766, 741, 731, 717, 708, 700, 694, 692)
goals <- c(1979, 1946, 1990, 1986, 1971, 1963, 1979, 2005, 1979, 1983) year_started
<- dplyr::tibble(
data name = name,
goals = goals,
year_started = year_started
)::glimpse(data) dplyr
Rows: 10
Columns: 3
$ name <chr> "Wayne Gretzky", "Gordie Howe", "Jaromir Jagr", "Brett Hu…
$ goals <dbl> 894, 801, 766, 741, 731, 717, 708, 700, 694, 692
$ year_started <dbl> 1979, 1946, 1990, 1986, 1971, 1963, 1979, 2005, 1979, 1983