Week 1: SOC 367S
Assignment Questions
Preparation:
Load the data:
load(url('https://dssoc.github.io/datasets/congress.RData'))
load(url('https://dssoc.github.io/datasets/committees.RData'))
Question 1:
The mean function returns the arithmetic mean of the object passed to it. Its default method takes three named arguments: x, the R object to average; trim, a number in [0, 0.5] giving the fraction (not the percent) of observations dropped from each end of the sorted x before the mean is computed, which yields a symmetrically trimmed mean; and na.rm, a logical that determines whether NA values are removed before the calculation.
Question 2:
my_vector <- c("R", "is", "great")
my_vector[3]
[1] "great"
Question 3:
some_vector <- c(
  25555, 342343, 123123123, 4234234, 53243234, 54324234, 5421111, 12312312, 111231,
  1231231, 12312312, 12312312, 123123, 898972, 789872, 2343, 23423423, 2343221, 23423, 14444, 44324222, 2342341, 124231111, 22233345, 1111233333, 1231231, 1231231
)
Part A:
class(some_vector)
[1] "numeric"
Part B:
max(some_vector)
[1] 1111233333
Part C:
mean(some_vector)
[1] 59756995
Part D:
54324222 %in% some_vector
[1] FALSE
Question 4:
str(congress)
'data.frame': 539 obs. of 8 variables:
$ bioguide_id: chr "B000944" "C000127" "C000141" "C000174" ...
$ full_name : chr "Sherrod Brown" "Maria Cantwell" "Benjamin L. Cardin" "Thomas R. Carper" ...
$ type : Factor w/ 2 levels "rep","sen": 2 2 2 2 2 2 2 2 2 2 ...
$ party : Factor w/ 3 levels "Democrat","Independent",..: 1 1 1 1 1 1 1 1 2 1 ...
$ state : chr "OH" "WA" "MD" "DE" ...
$ birthdate : Date, format: "1952-11-09" "1958-10-13" ...
$ gender : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 2 1 ...
$ birthyear : num 1952 1958 1943 1947 1960 ...
The congress object is a data.frame with 539 rows and eight columns.
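The same facts that str() reports can also be queried directly. The sketch below uses a small stand-in data frame (toy, with hypothetical columns) so that it runs without downloading the course data; the same calls should work on congress.

```r
# Hypothetical stand-in for a loaded data frame such as congress
toy <- data.frame(
  id = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5)
)

dims <- dim(toy)    # rows first, then columns
n_rows <- nrow(toy)
n_cols <- ncol(toy)
obj_class <- class(toy)

print(dims)
print(obj_class)
```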
Question 5:
We can use lubridate to calculate the precise ages of Sherrod Brown and Dianne Feinstein. The period class occasionally causes issues downstream, but it is fine for this one-off example.
congress |>
  dplyr::filter(
    full_name %in% c(
      "Sherrod Brown",
      "Dianne Feinstein"
    )
  ) |>
  dplyr::mutate(age = lubridate::as.period(
    lubridate::interval(
      birthdate, Sys.Date()
    )
  )) |>
  dplyr::select(full_name, age)
full_name age
1 Sherrod Brown 69y 11m 19d 0H 0M 0S
2 Dianne Feinstein 89y 4m 6d 0H 0M 0S
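If the period class does become a problem in later steps, one possible alternative is to store age as a plain number of years. The sketch below assumes lubridate is installed and uses a fixed, hypothetical reference date (ref_date) in place of Sys.Date() so the result is reproducible. The birthdate is Sherrod Brown's, taken from the str() output above.

```r
library(lubridate)

# Hypothetical fixed reference date, used instead of Sys.Date()
ref_date <- as.Date("2022-10-28")

# Sherrod Brown's birthdate as shown in str(congress)
birthdate <- as.Date("1952-11-09")

# time_length() converts the interval to a numeric number of years
age_years <- time_length(interval(birthdate, ref_date), "years")

# Completed years of age as a whole number
age_completed <- floor(age_years)
print(age_completed)
```

A numeric column like this sorts, averages, and plots without any special handling, at the cost of losing the months-and-days breakdown that a period gives.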
Question 6:
For simplicity, I rounded the average birth year to the nearest integer.
congress |>
  dplyr::summarize(avg_birthyear = round(mean(birthyear), 0))
avg_birthyear
1 1961
Question 7:
While I (probably) spend too much time in R, most of my work is focused on sports analytics. I have experience with most of what the class covers, including the tidyverse, visualization (with ggplot2 and gt/reactable), web scraping (with rvest and RSelenium), working with and building APIs, and writing and deploying packages to CRAN (toRvik), among other things.
I am taking this course for a chance to apply my skills to a different, yet equally exciting, domain. Without the class, I would probably continue to focus on sports analytics, and while that is really fun, working with different data in different scopes often leads to new ways of thinking and approaching problems.