Week 1: SOC 367S
Assignment Questions
Preparation:
Load the data:
load(url('https://dssoc.github.io/datasets/congress.RData'))
load(url('https://dssoc.github.io/datasets/committees.RData'))
Question 1:
The mean function returns the arithmetic mean of the object passed to it. It has three named arguments: x is the R object to be averaged; trim, a number in [0, 0.5], is the fraction (not the percentage) of observations dropped from each end of x before the mean is computed, which gives a symmetrically trimmed mean; na.rm, a logical, determines whether NA values are removed before the calculation.
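As a quick illustration of these arguments, the sketch below uses a small made-up vector (the values are hypothetical and not part of the assignment data) to show how trim and na.rm change the result.

```r
# Hypothetical values for illustration: 100 is an outlier, NA is a missing value.
x <- c(1, 2, 3, 4, 100, NA)

# With the default na.rm = FALSE, a single NA makes the result NA.
plain_mean <- mean(x)

# Removing the NA first gives the ordinary mean of the five remaining values.
clean_mean <- mean(x, na.rm = TRUE)

# trim = 0.2 drops 20% of the non-missing observations (here, one value)
# from each end before averaging, so the outlier no longer dominates.
trimmed_mean <- mean(x, trim = 0.2, na.rm = TRUE)

print(c(plain = plain_mean, clean = clean_mean, trimmed = trimmed_mean))
```

Here the plain call returns NA, the second call returns 22, and the trimmed call averages 2, 3, and 4 to give 3.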
Question 2:
my_vector <- c("R", "is", "great")
my_vector[3]
[1] "great"
Question 3:
some_vector <- c(
25555, 342343, 123123123, 4234234, 53243234, 54324234, 5421111, 12312312, 111231,
1231231, 12312312, 12312312, 123123, 898972, 789872, 2343, 23423423, 2343221, 23423, 14444, 44324222, 2342341, 124231111, 22233345, 1111233333, 1231231, 1231231
)
Part A:
class(some_vector)
[1] "numeric"
Part B:
max(some_vector)
[1] 1111233333
Part C:
mean(some_vector)
[1] 59756995
Part D:
54324222 %in% some_vector
[1] FALSE
Question 4:
str(congress)
'data.frame': 539 obs. of 8 variables:
$ bioguide_id: chr "B000944" "C000127" "C000141" "C000174" ...
$ full_name : chr "Sherrod Brown" "Maria Cantwell" "Benjamin L. Cardin" "Thomas R. Carper" ...
$ type : Factor w/ 2 levels "rep","sen": 2 2 2 2 2 2 2 2 2 2 ...
$ party : Factor w/ 3 levels "Democrat","Independent",..: 1 1 1 1 1 1 1 1 2 1 ...
$ state : chr "OH" "WA" "MD" "DE" ...
$ birthdate : Date, format: "1952-11-09" "1958-10-13" ...
$ gender : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 2 1 ...
$ birthyear : num 1952 1958 1943 1947 1960 ...
The congress object is a data.frame with 539 rows and 8 columns.
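The same facts can also be checked one at a time rather than read off the str() output. The sketch below uses a small stand-in data frame (the name toy and its values are hypothetical) so that it runs without downloading anything; the same calls should apply to congress.

```r
# Hypothetical stand-in for `congress`, with two of its column names.
toy <- data.frame(
  full_name = c("Person A", "Person B", "Person C"),
  birthyear = c(1952, 1958, 1943)
)

# class() reports the object type; nrow() and ncol() give its dimensions.
toy_class <- class(toy)
toy_rows <- nrow(toy)
toy_cols <- ncol(toy)

# dim() returns both dimensions at once as c(rows, columns).
print(toy_class)
print(dim(toy))
```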
Question 5:
We can use lubridate to calculate the precise ages of Sherrod Brown and Dianne Feinstein. The period class occasionally causes issues downstream, but for this one-off example it is fine to use.
congress |>
  dplyr::filter(
    full_name %in% c(
      "Sherrod Brown",
      "Dianne Feinstein"
    )
  ) |>
  dplyr::mutate(age = lubridate::as.period(
    lubridate::interval(
      birthdate,
      Sys.Date()
    )
  )) |>
  dplyr::select(full_name, age)
         full_name                  age
1    Sherrod Brown 69y 11m 19d 0H 0M 0S
2 Dianne Feinstein   89y 4m 6d 0H 0M 0S
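If the period class does become a problem in later steps, one possible alternative is to store the age as a plain number of years. The sketch below is only an illustration of that idea: it uses the first birthdate shown in the str() output and an assumed fixed reference date (not part of the assignment) so the result is reproducible.

```r
# Birthdate taken from the str() output; the reference date is an assumed
# fixed value chosen for reproducibility, standing in for Sys.Date().
birthdate <- as.Date("1952-11-09")
reference_date <- as.Date("2022-10-28")

# Measure the interval in years as a plain double instead of a Period object.
age_years <- lubridate::time_length(
  lubridate::interval(birthdate, reference_date),
  unit = "years"
)

# Completed years of age.
age_whole <- floor(age_years)
print(age_whole)
```

A numeric column like this can be averaged, rounded, or plotted directly, at the cost of losing the months-and-days breakdown that the period output provides.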
Question 6:
For simplicity, I rounded the average birth year to the nearest integer.
congress |>
  dplyr::summarize(avg_birthyear = round(mean(birthyear), 0))
  avg_birthyear
1          1961
Question 7:
While I spend (probably) too much time in R, most of my work is focused in the sports sphere. I have experience with most of what the class covers – including the tidyverse, visualization (with ggplot2 and gt/reactable), web scraping (with rvest and RSelenium), working with and building APIs, writing and deploying packages to CRAN (toRvik), and some other stuff.
I am taking this course for a chance to apply my skills to a different, yet equally exciting, domain. Without the class, I would probably continue to focus on sports analytics, and while that is really fun, working with different data in different scopes often leads to new ways of thinking and approaching problems.