Week 2: SOC 367S

Author

Andrew Weatherman

Published

September 19, 2022

Preparation:

Load the data:

load(url('https://dssoc.github.io/datasets/congress.RData'))
load(url('https://dssoc.github.io/datasets/committees.RData'))

travel_restrictions <- c("WA", "OR", "NV", "CA", "NM", "MN", "IL", "OH", "MI", "PA", "VA", "NY", "MA", "VH", "ME", "DE", "MD", "NJ")

require_masks <- c("HI", "WA", "OR", "NV", "CA", "MT", "CO", "NM", "KS", "TX", "MN", "AR", "LA", "WI", "IL", "AL", "MI", "IN", "OH", "KY", "WV", "NC", "VA", "DC", "DE", "PA", "NY", "VT", "NH", "MA", "RI", "CT", "ME")

Question 1:

kable(tibble, format='simple')
Function        Package          Usage
--------------  ---------------  ------------------------------------------------------
filter          dplyr            Apply a logical query to a data frame
select          dplyr            Extract columns from a data frame
mutate          dplyr            Add a new column to a data frame or modify an existing one
count           dplyr            Count the rows in each group defined by one or more variables
arrange         dplyr            Sort the rows of a data frame by one or more columns
gather          tidyr            Superseded: transform data to long format
pivot_longer    tidyr            Transform data to long format
pipe operator   magrittr / base  Carry data through a chained expression

Question 2:

To spice things up, I want to do this problem set using data.table. I hope you trust that I know how to do this in dplyr.

setDT(congress)
head(congress[, .(.N), by=.(gender, type)])
   gender type   N
1:      M  sen  76
2:      F  sen  24
3:      M  rep 316
4:      F  rep 123
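
For comparison, the same grouped count could be written in dplyr roughly as follows. This is a minimal sketch on a made-up five-row table: `toy_congress` and its values are hypothetical stand-ins, not the course data, and only the `gender` and `type` column names are taken from the output above.

```r
library(dplyr)

# Hypothetical toy data; the real table is `congress` from congress.RData
toy_congress <- data.frame(
  gender = c("M", "F", "M", "M", "F"),
  type   = c("sen", "sen", "rep", "rep", "rep")
)

# count() groups by the listed columns and tallies the rows in each group,
# which mirrors congress[, .(.N), by = .(gender, type)] in data.table
toy_counts <- toy_congress |>
  count(gender, type, name = "N")

toy_counts
```

Unlike the data.table version, `count()` returns the groups sorted by the grouping columns rather than in order of first appearance.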

Question 3:

congress[type == 'sen' & party == 'Democrat'][c(which.min(birthdate), 
    which.max(birthdate))]
   bioguide_id        full_name type    party state  birthdate gender birthyear
1:     F000062 Dianne Feinstein  sen Democrat    CA 1933-06-22      F      1933
2:     O000174       Jon Ossoff  sen Democrat    GA 1987-02-16      M      1987

Question 4:

congress[, age := 2022 - birthyear]
congress[age > 60 & type == 'sen' & party == 'Democrat', .N]
[1] 32

Thirty-two Democratic senators are older than 60, taking age as 2022 minus birth year.
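
Subtracting the birth year treats everyone as if their 2022 birthday has already passed. If exact ages were wanted, one possible approach is to compare month and day against a reference date. The sketch below is an assumption-laden illustration: `age_on` is a hypothetical helper, and the default reference date is simply this document's publication date.

```r
# Hypothetical helper: age in completed years on a reference date.
# Assumes `birthdate` is a Date or a "YYYY-MM-DD" string.
age_on <- function(birthdate, ref_date = as.Date("2022-09-19")) {
  b <- as.POSIXlt(as.Date(birthdate))
  r <- as.POSIXlt(as.Date(ref_date))
  # TRUE when the birthday has already occurred in the reference year
  had_birthday <- (r$mon > b$mon) | (r$mon == b$mon & r$mday >= b$mday)
  (r$year - b$year) - ifelse(had_birthday, 0L, 1L)
}

# The two birthdates shown in Question 3
age_on(c("1933-06-22", "1987-02-16"))
# [1] 89 35
```

With the data.table workflow used here, this could be applied as `congress[, age := age_on(birthdate)]`, assuming the `birthdate` column parses with `as.Date`.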

Question 5:

congress[, over_55 := fifelse(age > 55, 'yes', 'no')]
head(congress[, .N, by = .(over_55, gender)])
   over_55 gender   N
1:     yes      M 271
2:     yes      F 100
3:      no      M 121
4:      no      F  47

Question 6:

I was a bit confused about what you actually want us to return, so I kept only the ID and social media columns and removed the ones that did not seem relevant.

Using gather:

gather(
  congress_contact[,c('bioguide_id', 'twitter', 'facebook', 'youtube')],
  key = "platform",
  value = 'account',
  -bioguide_id
) |> 
  head()
  bioguide_id platform         account
1     B000944  twitter SenSherrodBrown
2     C000127  twitter SenatorCantwell
3     C000141  twitter   SenatorCardin
4     C000174  twitter   SenatorCarper
5     C001070  twitter     SenBobCasey
6     F000062  twitter    SenFeinstein

Using pivot_longer:

pivot_longer(
  congress_contact[,c('bioguide_id', 'twitter', 'facebook', 'youtube')],
  cols = -bioguide_id,
  names_to = 'platform',
  values_to = 'account'
) |> 
  head()
# A tibble: 6 × 3
  bioguide_id platform account            
  <chr>       <chr>    <chr>              
1 B000944     twitter  SenSherrodBrown    
2 B000944     facebook SenatorSherrodBrown
3 B000944     youtube  SherrodBrownOhio   
4 C000127     twitter  SenatorCantwell    
5 C000127     facebook senatorcantwell    
6 C000127     youtube  SenatorCantwell    

The total number of valid rows is found by filtering the pivoted data to remove empty strings in the ‘account’ column. These two lines of code are omitted for brevity.

[1] 1159
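
The filtering step described above might look something like the following. This is a sketch on made-up data, not the omitted code itself: `toy_contact` and its account names are hypothetical, and treating `NA` as invalid alongside empty strings is an added assumption.

```r
library(tidyr)

# Hypothetical stand-in for congress_contact with some empty accounts
toy_contact <- data.frame(
  bioguide_id = c("X000001", "X000002", "X000003"),
  twitter     = c("acct_a", "", "acct_c"),
  facebook    = c("acct_a_fb", "acct_b_fb", ""),
  youtube     = c("", "", "acct_c_yt"),
  stringsAsFactors = FALSE
)

toy_long <- pivot_longer(
  toy_contact,
  cols = -bioguide_id,
  names_to = "platform",
  values_to = "account"
)

# Keep rows whose account is neither missing nor an empty string
toy_valid <- toy_long[!is.na(toy_long$account) & toy_long$account != "", ]
n_valid <- nrow(toy_valid)
n_valid
```

Here five of the nine pivoted rows survive the filter; the same two steps applied to the real pivoted table would give the count reported above.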

Question 7:

union(travel_restrictions, require_masks)
 [1] "WA" "OR" "NV" "CA" "NM" "MN" "IL" "OH" "MI" "PA" "VA" "NY" "MA" "VH" "ME"
[16] "DE" "MD" "NJ" "HI" "MT" "CO" "KS" "TX" "AR" "LA" "WI" "AL" "IN" "KY" "WV"
[31] "NC" "DC" "VT" "NH" "RI" "CT"

Question 8:

setdiff(require_masks, travel_restrictions)
 [1] "HI" "MT" "CO" "KS" "TX" "AR" "LA" "WI" "AL" "IN" "KY" "WV" "NC" "DC" "VT"
[16] "NH" "RI" "CT"
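
As a quick sanity check on Questions 7 and 8, the sizes of the sets should add up: the union is the travel-restriction states plus the mask-only states, and the mask states split into those shared with the travel list and those that are not. The sketch below repeats the two vectors from the preparation step so that it runs on its own.

```r
travel_restrictions <- c("WA", "OR", "NV", "CA", "NM", "MN", "IL", "OH", "MI",
                         "PA", "VA", "NY", "MA", "VH", "ME", "DE", "MD", "NJ")
require_masks <- c("HI", "WA", "OR", "NV", "CA", "MT", "CO", "NM", "KS", "TX",
                   "MN", "AR", "LA", "WI", "IL", "AL", "MI", "IN", "OH", "KY",
                   "WV", "NC", "VA", "DC", "DE", "PA", "NY", "VT", "NH", "MA",
                   "RI", "CT", "ME")

either    <- union(travel_restrictions, require_masks)
mask_only <- setdiff(require_masks, travel_restrictions)
both      <- intersect(travel_restrictions, require_masks)

# |A union B| = |A| + |B minus A|
length(either) == length(travel_restrictions) + length(mask_only)

# |B| = |A intersect B| + |B minus A|
length(require_masks) == length(both) + length(mask_only)
```

Both comparisons evaluate to `TRUE` for these vectors (36 = 18 + 18 and 33 = 15 + 18).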

Question 9:

I’m not really sure what I want to research. It would be cool to look into how susceptibility to believing misinformation might affect the polarization research that we have discussed. Exposure to right-wing propaganda might do more to sway people on the opposing side than much of the left-wing content found on social media does. This would be really challenging to quantify with data, however.

Oh, one thing that I’ve always thought would be a really cool experiment, and it has probably been done, is evaluating the extent to which aesthetics affect believability (is this a word?) in visualizations. Ever seen those videos where people go out in suits with a camera and microphone and are immediately seen as trustworthy figures (well, maybe not in 2022)? I think aesthetics might work in a similar way in visualizations.