Week 2: SOC 367S
Preparation:
Load the data:
library(knitr)       # kable()
library(data.table)  # setDT(), fifelse(), := and .N
library(tidyr)       # gather(), pivot_longer(); the native |> pipe needs R >= 4.1

load(url('https://dssoc.github.io/datasets/congress.RData'))
load(url('https://dssoc.github.io/datasets/committees.RData'))
travel_restrictions <- c("WA", "OR", "NV", "CA", "NM", "MN", "IL", "OH", "MI", "PA", "VA", "NY", "MA", "VH", "ME", "DE", "MD", "NJ")
require_masks <- c("HI", "WA", "OR", "NV", "CA", "MT", "CO", "NM", "KS", "TX", "MN", "AR", "LA", "WI", "IL", "AL", "MI", "IN", "OH", "KY", "WV", "NC", "VA", "DC", "DE", "PA", "NY", "VT", "NH", "MA", "RI", "CT", "ME")
Question 1:
kable(tibble, format = 'simple')
| Function | Package | Usage |
|---|---|---|
| filter | dplyr | Keep the rows of a data frame that satisfy a logical condition |
| select | dplyr | Extract columns from a data frame |
| mutate | dplyr | Add new columns to a data frame or modify existing ones |
| count | dplyr | Count the number of rows in each group |
| arrange | dplyr | Order the rows of a data frame by one or more columns |
| gather | tidyr | Superseded: transform data to long format |
| pivot_longer | tidyr | Transform data to long format |
| pipe operator | magrittr (%>%) / base R (R ≥ 4.1) | Pass the value on the left as the first argument of the call on the right, chaining calls together |
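Here is a minimal sketch of how several verbs from the table chain together with the native pipe. It uses a small hypothetical data frame `toy`, whose names and values are invented for illustration and do not come from the course data.

```r
library(dplyr)

# Hypothetical toy data, not the congress dataset
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  type = c("sen", "rep", "sen", "rep"),
  birthyear = c(1950, 1980, 1965, 1990)
)

result <- toy |>
  filter(type == "sen") |>           # keep senators only
  mutate(age = 2022 - birthyear) |>  # append an approximate age column
  select(name, age) |>               # keep just two columns
  arrange(desc(age))                 # order rows, oldest first

print(result)
```

Each step receives the previous step's output as its first argument, which is the behavior the pipe row of the table describes.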
Question 2:
To spice things up, I'll do this problem set using data.table. I hope you trust that I know how to do it in dplyr.
setDT(congress)
head(congress[, .(.N), by = .(gender, type)])
   gender type   N
1:      M  sen  76
2:      F  sen  24
3:      M  rep 316
4:      F  rep 123
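For comparison, the same grouped tally in dplyr is a single `count()` call. Below is a minimal sketch on a few hypothetical rows that stand in for `congress`; the real table is not used here.

```r
library(dplyr)

# Hypothetical rows standing in for the congress table
toy <- data.frame(
  gender = c("M", "F", "M", "M", "F"),
  type   = c("sen", "sen", "rep", "rep", "rep")
)

# count() groups by the listed columns and returns one row per group,
# with the group size in a column named n
tally <- toy |> count(gender, type)
print(tally)
```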
Question 3:
congress[type == 'sen' & party == 'Democrat'][c(which.min(birthdate), which.max(birthdate))]
   bioguide_id        full_name type    party state  birthdate gender birthyear
1:     F000062 Dianne Feinstein  sen Democrat    CA 1933-06-22      F      1933
2:     O000174       Jon Ossoff  sen Democrat    GA 1987-02-16      M      1987
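This is why `which.min()` picks out the oldest member. Dates are stored as numbers of days, so the smallest birthdate is the earliest birth, and `which.max()` likewise finds the youngest member. Here is a base R sketch with invented names and dates (not the course data):

```r
# Hypothetical senators with made-up names and birthdates
sen <- data.frame(
  name = c("X", "Y", "Z"),
  birthdate = as.Date(c("1950-03-01", "1940-11-30", "1975-07-04"))
)

# The earliest birthdate belongs to the oldest member, the latest to the youngest
oldest   <- sen$name[which.min(sen$birthdate)]  # "Y"
youngest <- sen$name[which.max(sen$birthdate)]  # "Z"
```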
Question 4:
congress[, age := 2022 - birthyear]
congress[age > 60 & type == 'sen' & party == 'Democrat', .N]
[1] 32
32 Democratic senators are older than 60 (with age approximated as 2022 minus birth year).
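`2022 - birthyear` gives the age a member turns during 2022, so it can be one year too high before their birthday. The sketch below computes an exact age in whole years. The function name `exact_age` and the reference date of 2022-01-01 are assumptions for illustration and are not part of the assignment.

```r
# Hypothetical helper: exact age in whole years at a reference date (base R only)
exact_age <- function(birthdate, ref = as.Date("2022-01-01")) {
  b <- as.POSIXlt(birthdate)
  r <- as.POSIXlt(ref)
  years <- r$year - b$year
  # subtract one year if the birthday has not yet occurred by the reference date
  before_bday <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  years - before_bday
}

exact_age(as.Date("1933-06-22"))  # 88, while 2022 - 1933 gives 89
```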
Question 5:
congress[, over_55 := fifelse(age > 55, 'yes', 'no')]
head(congress[, .N, by = .(over_55, gender)])
   over_55 gender   N
1:     yes      M 271
2:     yes      F 100
3:      no      M 121
4:      no      F  47
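The same yes/no flag and cross-tabulation can be sketched in base R with `ifelse()` and `table()`. The ages and genders below are hypothetical. Note that an age of exactly 55 falls in the "no" group because the comparison is strict.

```r
# Hypothetical ages and genders, not the course data
age    <- c(72, 40, 58, 55, 61)
gender <- c("F", "M", "M", "F", "M")

over_55 <- ifelse(age > 55, "yes", "no")  # strictly greater than 55
counts  <- table(over_55, gender)         # two-way contingency table
print(counts)
```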
Question 6:
I was a bit unsure what the question wanted returned, so I kept only the columns that seemed relevant: bioguide_id and the three social media accounts.
Using gather:
gather(
congress_contact[,c('bioguide_id', 'twitter', 'facebook', 'youtube')],
key = "platform",
value = 'account',
-bioguide_id
) |>
head()
  bioguide_id platform         account
1 B000944 twitter SenSherrodBrown
2 C000127 twitter SenatorCantwell
3 C000141 twitter SenatorCardin
4 C000174 twitter SenatorCarper
5 C001070 twitter SenBobCasey
6 F000062 twitter SenFeinstein
Using pivot_longer (the same reshaping, but rows are ordered by member rather than by platform):
pivot_longer(
congress_contact[,c('bioguide_id', 'twitter', 'facebook', 'youtube')],
cols = -bioguide_id,
names_to = 'platform',
values_to = 'account'
) |>
head()
# A tibble: 6 × 3
bioguide_id platform account
<chr> <chr> <chr>
1 B000944 twitter SenSherrodBrown
2 B000944 facebook SenatorSherrodBrown
3 B000944 youtube SherrodBrownOhio
4 C000127 twitter SenatorCantwell
5 C000127 facebook senatorcantwell
6 C000127 youtube SenatorCantwell
The total number of rows that contain an account is found by filtering the pivoted data to drop empty strings in the ‘account’ column. The two lines of code for this are omitted here for brevity.
[1] 1159
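The omitted filtering step could look something like the sketch below. It uses a small hypothetical contact table whose IDs and handles are made up. It assumes missing accounts are stored as empty strings, and it also drops NA values in case some are missing that way.

```r
library(tidyr)
library(dplyr)

# Hypothetical contact table; IDs and handles are invented
contact <- data.frame(
  bioguide_id = c("X000001", "X000002"),
  twitter  = c("handle_a", "handle_b"),
  facebook = c("", "page_b"),
  youtube  = c("chan_a", NA)
)

long <- pivot_longer(contact, cols = -bioguide_id,
                     names_to = "platform", values_to = "account")

# Drop rows with no account: NA values and empty strings
valid <- long |> filter(!is.na(account), account != "")
nrow(valid)  # 4
```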
Question 7:
union(travel_restrictions, require_masks)
[1] "WA" "OR" "NV" "CA" "NM" "MN" "IL" "OH" "MI" "PA" "VA" "NY" "MA" "VH" "ME"
[16] "DE" "MD" "NJ" "HI" "MT" "CO" "KS" "TX" "AR" "LA" "WI" "AL" "IN" "KY" "WV"
[31] "NC" "DC" "VT" "NH" "RI" "CT"
Question 8:
setdiff(require_masks, travel_restrictions)
[1] "HI" "MT" "CO" "KS" "TX" "AR" "LA" "WI" "AL" "IN" "KY" "WV" "NC" "DC" "VT"
[16] "NH" "RI" "CT"
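The small hypothetical vectors below show how the two set operations behave. `union()` keeps the order of its first argument, appends new elements from the second, and drops duplicates. `setdiff()` is order-sensitive: it keeps the unique elements of the first vector that do not appear in the second.

```r
# Hypothetical state vectors for illustration
a <- c("WA", "OR", "CA", "WA")
b <- c("CA", "HI", "OR", "TX")

u <- union(a, b)    # "WA" "OR" "CA" "HI" "TX"
d <- setdiff(b, a)  # "HI" "TX"

# setdiff(b, a) matches keeping the unique elements of b not found in a
identical(d, unique(b[!b %in% a]))  # TRUE
```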
Question 9:
I’m not really sure what I want to research yet. It would be interesting to look into how susceptibility to believing misinformation might affect much of the polarization research we have discussed. One possibility is that exposure to right-wing propaganda does more to sway people on the opposing side than much of the left-wing content on social media does. This would be very challenging to quantify in data, however.
Oh, one experiment I’ve always thought would be really cool (and it has probably been done) is measuring how much aesthetics affect the believability of visualizations. Have you ever seen those videos where people go out in suits with a camera and microphone and are immediately treated as trustworthy figures (well, maybe not in 2022)? I suspect aesthetics work in a similar way for visualizations.