Week 1: SOC 367S

Author

Andrew Weatherman

Published

September 8, 2022

Assignment Questions

Preparation:

Load the data:

load(url('https://dssoc.github.io/datasets/congress.RData'))
load(url('https://dssoc.github.io/datasets/committees.RData'))

Question 1:

The mean function returns the arithmetic mean of the object passed to it. Its default method takes three named arguments: x, the R object to average; trim, a number in [0, 0.5] giving the fraction (not the percent) of observations to drop from each end of x, which yields a symmetrically trimmed mean; and na.rm, a logical that determines whether NA values are removed before the calculation.
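As a small illustration of how trim and na.rm change the result, the sketch below runs mean on a made-up vector. The values are hypothetical and chosen only so that the effect of each argument is easy to see.

```r
# Hypothetical sample values, chosen only to illustrate the arguments.
x <- c(1, 2, 3, 4, 100, NA)

# With the default na.rm = FALSE, a single NA makes the result NA.
plain_mean <- mean(x)

# na.rm = TRUE drops the NA before averaging the remaining five values.
clean_mean <- mean(x, na.rm = TRUE)

# trim = 0.2 drops floor(5 * 0.2) = 1 value from each end of the sorted
# data, so only 2, 3, and 4 are averaged.
trimmed_mean <- mean(x, trim = 0.2, na.rm = TRUE)

print(c(plain = plain_mean, clean = clean_mean, trimmed = trimmed_mean))
```

The trimmed mean is far less sensitive to the outlier (100) than the untrimmed one, which is the usual reason to reach for trim.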

Question 2:

my_vector <- c("R","is","great")
my_vector[3]
[1] "great"

Question 3:

some_vector <- c(
  25555, 342343, 123123123, 4234234, 53243234, 54324234, 5421111, 12312312, 111231,
  1231231, 12312312, 12312312, 123123, 898972, 789872, 2343, 23423423, 2343221, 23423, 14444, 44324222, 2342341, 124231111, 22233345, 1111233333, 1231231, 1231231
)

Part A:

class(some_vector)
[1] "numeric"

Part B:

max(some_vector)
[1] 1111233333

Part C:

mean(some_vector)
[1] 59756995

Part D:

54324222 %in% some_vector
[1] FALSE

Question 4:

str(congress)
'data.frame':   539 obs. of  8 variables:
 $ bioguide_id: chr  "B000944" "C000127" "C000141" "C000174" ...
 $ full_name  : chr  "Sherrod Brown" "Maria Cantwell" "Benjamin L. Cardin" "Thomas R. Carper" ...
 $ type       : Factor w/ 2 levels "rep","sen": 2 2 2 2 2 2 2 2 2 2 ...
 $ party      : Factor w/ 3 levels "Democrat","Independent",..: 1 1 1 1 1 1 1 1 2 1 ...
 $ state      : chr  "OH" "WA" "MD" "DE" ...
 $ birthdate  : Date, format: "1952-11-09" "1958-10-13" ...
 $ gender     : Factor w/ 2 levels "F","M": 2 1 2 2 2 1 1 2 2 1 ...
 $ birthyear  : num  1952 1958 1943 1947 1960 ...

The congress object has 539 rows with eight columns and is a data.frame.
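The same facts can also be pulled out one at a time rather than read off the str output. The sketch below uses a hypothetical three-row data frame (toy) as a stand-in for congress so that it runs without downloading anything.

```r
# Hypothetical three-row data frame standing in for congress.
toy <- data.frame(
  full_name = c("A Example", "B Example", "C Example"),
  birthyear = c(1950, 1960, 1970)
)

n_rows <- nrow(toy)      # number of observations (rows)
n_cols <- ncol(toy)      # number of variables (columns)
dims <- dim(toy)         # rows and columns together
obj_class <- class(toy)  # type of the object

print(dims)
print(obj_class)
```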

Question 5:

We can use lubridate to calculate the precise ages of Sherrod Brown and Dianne Feinstein. The Period class occasionally causes issues downstream, but it is fine for this one-off example.

congress |>
  dplyr::filter(
    full_name %in% c(
      "Sherrod Brown",
      "Dianne Feinstein"
    )
  ) |>
  dplyr::mutate(age = lubridate::as.period(
    lubridate::interval(
      birthdate,
      Sys.Date()
    )
  )) |>
  dplyr::select(full_name, age)
         full_name                  age
1    Sherrod Brown 69y 11m 19d 0H 0M 0S
2 Dianne Feinstein   89y 4m 6d 0H 0M 0S
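If the age is needed as a plain number (for sorting or averaging, say), one alternative to the Period class is lubridate::time_length, which converts an interval to a double. This is a minimal sketch, not part of the original answer. The birthdate comes from the str(congress) output above, and the fixed reference date is an assumption made so that the result does not depend on when the code is run.

```r
# Birthdate taken from the str(congress) output (Sherrod Brown).
birthdate <- as.Date("1952-11-09")

# Assumption: a fixed reference date (the publication date) instead of
# Sys.Date(), so the result is reproducible.
as_of <- as.Date("2022-09-08")

# time_length() returns the length of the interval as a double, in years.
age_years <- lubridate::time_length(
  lubridate::interval(birthdate, as_of),
  unit = "years"
)

# Whole years completed as of the reference date.
age_whole <- floor(age_years)

print(age_whole)
```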

Question 6:

For simplicity, I rounded the average birth year to the nearest integer.

congress |>
  dplyr::summarize(avg_birthyear = round(mean(birthyear), 0))
  avg_birthyear
1          1961

Question 7:

While I spend (probably) too much time in R, most of my work is focused in the sports sphere. I have experience with most of what the class covers – including the tidyverse, visualization (with ggplot2 and gt/reactable), web scraping (with rvest and RSelenium), working with and building APIs, writing and deploying packages to CRAN (toRvik), and some other stuff.

I am taking this course for a chance to apply my skills to a different, yet equally exciting, domain. Without the class, I would probably continue to focus on sports analytics, and while that is really fun, working with different data in different scopes often leads to new ways of thinking and approaching problems.