As before, we may need to load the data again if it is not already in our workspace.

corona_survey <- readRDS("./data/corona_survey.rds")

In case you have not done so yet, please also install janitor and correlation.

if (!require(janitor)) install.packages("janitor")
if (!require(correlation)) install.packages("correlation")

1

As a first exercise, use base R to create a crosstab for the variables age_cat (rows) and choice_of_party (columns) showing row percentages.
We need to combine round(), table(), and prop.table() here: pass 1 as the second argument (margin) to prop.table() to get proportions within each row, multiply by 100 to turn them into percentages, and round the result.
round(prop.table(table(corona_survey$age_cat, corona_survey$choice_of_party), 1)*100, 2)
##                 
##                  CDU/CSU   SPD   FDP Linke Gruene   AfD Other
##   <= 25 years      18.18  9.09 15.15 12.12  33.33  6.06  6.06
##   26 to 30 years   19.55 12.85 10.06 11.73  31.84  5.59  8.38
##   31 to 35 years   22.35 11.73 12.85  8.38  33.52  7.26  3.91
##   36 to 40 years   28.37 13.49  7.91  5.58  29.77 10.70  4.19
##   41 to 45 years   26.67  8.10 14.29  7.62  26.67 12.86  3.81
##   46 to 50 years   28.30 13.96  8.30  8.30  27.55 10.19  3.40
##   51 to 60 years   25.48 12.13  9.26 11.85  28.75 10.76  1.77
##   61 to 65 years   31.77 12.04  4.35 11.71  27.09 10.37  2.68
##   66 to 70 years   28.57 13.95  9.63  9.97  25.91 10.63  1.33
##   >= 71 years      31.64 20.60  7.16 12.24  16.12 10.45  1.79
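To see what the margin argument of prop.table() does, here is a minimal sketch on made-up data. The vectors group and answer are hypothetical and not part of corona_survey.

```r
# Toy data (hypothetical): two groups and a yes/no answer
group  <- c("a", "a", "a", "b", "b", "b", "b", "b")
answer <- c("yes", "no", "no", "yes", "yes", "yes", "no", "yes")

counts <- table(group, answer)

# margin = 1: divide each cell by its row total
row_pct <- round(prop.table(counts, margin = 1) * 100, 2)

# margin = 2: divide each cell by its column total
col_pct <- round(prop.table(counts, margin = 2) * 100, 2)

row_pct
rowSums(row_pct)  # each row adds up to 100, apart from rounding
```

With margin = 1 the percentages add up to 100 within each row, which is what we want when age_cat defines the rows.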

2

Now, let’s use the janitor package to get the same results.
We want to create a tabyl() object and pipe it through some additional janitor functions to get the row percentages. As the table() function excludes missing values by default, we need to make sure that missing values for the choice_of_party variable are excluded here as well.
library(dplyr)    # for filter(), select(), and group_by(); can be skipped if already loaded
library(janitor)

corona_survey %>% 
  filter(!is.na(choice_of_party)) %>% 
  tabyl(age_cat, choice_of_party) %>% 
  adorn_percentages(denominator = "row") %>% 
  adorn_pct_formatting(digits = 2)
##         age_cat CDU/CSU    SPD    FDP  Linke Gruene    AfD Other
##     <= 25 years  18.18%  9.09% 15.15% 12.12% 33.33%  6.06% 6.06%
##  26 to 30 years  19.55% 12.85% 10.06% 11.73% 31.84%  5.59% 8.38%
##  31 to 35 years  22.35% 11.73% 12.85%  8.38% 33.52%  7.26% 3.91%
##  36 to 40 years  28.37% 13.49%  7.91%  5.58% 29.77% 10.70% 4.19%
##  41 to 45 years  26.67%  8.10% 14.29%  7.62% 26.67% 12.86% 3.81%
##  46 to 50 years  28.30% 13.96%  8.30%  8.30% 27.55% 10.19% 3.40%
##  51 to 60 years  25.48% 12.13%  9.26% 11.85% 28.75% 10.76% 1.77%
##  61 to 65 years  31.77% 12.04%  4.35% 11.71% 27.09% 10.37% 2.68%
##  66 to 70 years  28.57% 13.95%  9.63%  9.97% 25.91% 10.63% 1.33%
##     >= 71 years  31.64% 20.60%  7.16% 12.24% 16.12% 10.45% 1.79%
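The reason for the filter() step is easier to see on a small example. This sketch uses a made-up vector party (hypothetical, not from the survey data) to show how table() treats missing values by default and how the useNA argument changes that.

```r
# Toy vector (hypothetical) with two missing values
party <- c("A", "B", NA, "A", NA, "B", "A")

table(party)                   # NA values are dropped silently
table(party, useNA = "ifany")  # NA gets its own category

n_default <- sum(table(party))
n_with_na <- sum(table(party, useNA = "ifany"))

c(n_default, n_with_na)
```

Percentages depend on which cases are counted in the denominator, so two tables can only match if they handle missing values in the same way.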

3

As a final exercise on crosstabs, compute a chi-square test for the tabyl we have created before.
The test needs the raw counts, so we leave out the adorn_*() steps that add row percentages and percent signs.
corona_survey %>% 
  filter(!is.na(choice_of_party)) %>% 
  tabyl(age_cat, choice_of_party) %>% 
  chisq.test()
## 
##  Pearson's Chi-squared test
## 
## data:  .
## X-squared = 126.32, df = 54, p-value = 0.00000009966
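To make the output easier to read, here is a small base R sketch on a made-up 2 x 2 table of counts (the matrix counts is hypothetical). It shows the expected counts under independence and where the degrees of freedom come from.

```r
# Toy 2 x 2 table of counts (hypothetical)
counts <- matrix(c(30, 20,
                   10, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group  = c("a", "b"),
                                 answer = c("no", "yes")))

# correct = FALSE switches off the continuity correction
# that chisq.test() applies to 2 x 2 tables by default
test <- chisq.test(counts, correct = FALSE)

test$expected   # counts we would expect if rows and columns were independent
test$statistic  # sum of (observed - expected)^2 / expected
test$parameter  # df = (rows - 1) * (columns - 1)
```

The same rule explains the df = 54 in the output above: 10 age categories and 7 party categories give (10 - 1) * (7 - 1) = 54.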

4

Let’s turn to correlations: use the correlation package to calculate and print correlations between the following variables: risk_self, risk_surroundings, sum_measures, sum_sources.
The name of the function you need is the same as that of the package we use here.
library(correlation)

corona_survey %>% 
  select(risk_self,
         risk_surroundings,
         sum_measures,
         sum_sources) %>% 
  correlation()
## # Correlation Matrix (pearson-method)
## 
## Parameter1        |        Parameter2 |    r |       95% CI |     t |   df |         p
## --------------------------------------------------------------------------------------
## risk_self         | risk_surroundings | 0.76 | [0.75, 0.78] | 65.29 | 3075 | < .001***
## risk_self         |      sum_measures | 0.16 | [0.13, 0.20] |  9.29 | 3146 | < .001***
## risk_self         |       sum_sources | 0.06 | [0.03, 0.10] |  3.62 | 3129 | < .001***
## risk_surroundings |      sum_measures | 0.14 | [0.11, 0.17] |  7.89 | 3098 | < .001***
## risk_surroundings |       sum_sources | 0.09 | [0.06, 0.13] |  5.06 | 3081 | < .001***
## sum_measures      |       sum_sources | 0.13 | [0.09, 0.16] |  7.16 | 3166 | < .001***
## 
## p-value adjustment method: Holm (1979)
## Observations: 3077-3168
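Two details of this output are worth a closer look. The df column differs from pair to pair, which suggests that each correlation is based on all cases that are complete for that particular pair. The footer also tells us that the p-values were adjusted with the Holm method. The following base R sketch illustrates both ideas on made-up vectors (x, y, and p_raw are hypothetical). It is not meant to reproduce the internals of correlation().

```r
# Toy data (hypothetical) with a few missing values
x <- c(1, 2, 3, 4, 5, 6, 7, 8, NA, 10)
y <- c(2, 1, 4, 3, 6, 5, 8, NA, 9, 10)

res <- cor.test(x, y)  # pairs with a missing value are dropped

res$estimate   # Pearson's r
res$parameter  # df = number of complete pairs - 2

# Holm adjustment for a set of raw p-values
p_raw  <- c(0.001, 0.02, 0.03, 0.04)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

The Holm method multiplies the smallest p-value by the number of tests, the second smallest by one less, and so on, and ensures that the adjusted values never decrease along that ordering.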

5

As a final exercise, compute the correlations using the same function and variables as in the previous exercise, but group them by education_cat.
You need to group the data by education_cat before computing the correlations.
library(correlation)

corona_survey %>% 
  select(education_cat,
         risk_self,
         risk_surroundings,
         sum_measures,
         sum_sources) %>% 
  group_by(education_cat) %>% 
  correlation()
## # Correlation Matrix (pearson-method)
## 
## Group  |        Parameter1 |        Parameter2 |        r |        95% CI |        t |   df |         p
## -------------------------------------------------------------------------------------------------------
## Low    |         risk_self | risk_surroundings |     0.73 | [ 0.68, 0.78] |    19.59 |  330 | < .001***
## Low    |         risk_self |      sum_measures |     0.19 | [ 0.09, 0.29] |     3.59 |  340 | 0.002**  
## Low    |         risk_self |       sum_sources | 5.20e-04 | [-0.11, 0.11] | 9.56e-03 |  338 | 0.992    
## Low    | risk_surroundings |      sum_measures |     0.16 | [ 0.06, 0.27] |     3.04 |  334 | 0.010*   
## Low    | risk_surroundings |       sum_sources |     0.07 | [-0.04, 0.17] |     1.26 |  332 | 0.420    
## Low    |      sum_measures |       sum_sources |     0.15 | [ 0.05, 0.25] |     2.85 |  343 | 0.014*   
## Medium |         risk_self | risk_surroundings |     0.77 | [ 0.74, 0.79] |    37.00 |  958 | < .001***
## Medium |         risk_self |      sum_measures |     0.16 | [ 0.10, 0.22] |     5.20 |  976 | < .001***
## Medium |         risk_self |       sum_sources |     0.06 | [ 0.00, 0.13] |     2.00 |  971 | 0.090    
## Medium | risk_surroundings |      sum_measures |     0.11 | [ 0.05, 0.17] |     3.50 |  964 | 0.002**  
## Medium | risk_surroundings |       sum_sources |     0.05 | [-0.01, 0.12] |     1.70 |  959 | 0.090    
## Medium |      sum_measures |       sum_sources |     0.11 | [ 0.04, 0.17] |     3.36 |  981 | 0.002**  
## High   |         risk_self | risk_surroundings |     0.76 | [ 0.74, 0.78] |    49.60 | 1783 | < .001***
## High   |         risk_self |      sum_measures |     0.15 | [ 0.10, 0.19] |     6.30 | 1826 | < .001***
## High   |         risk_self |       sum_sources |     0.06 | [ 0.02, 0.11] |     2.73 | 1816 | 0.006**  
## High   | risk_surroundings |      sum_measures |     0.14 | [ 0.09, 0.18] |     5.78 | 1796 | < .001***
## High   | risk_surroundings |       sum_sources |     0.09 | [ 0.05, 0.14] |     3.94 | 1786 | < .001***
## High   |      sum_measures |       sum_sources |     0.13 | [ 0.08, 0.17] |     5.41 | 1838 | < .001***
## 
## p-value adjustment method: Holm (1979)
## Observations: 332-1840
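The idea of grouped correlations can also be sketched in base R. The example below uses the built-in iris data instead of corona_survey, so it runs without any additional files or packages: the data are split by group, and the correlation is then computed within each group.

```r
# Split the built-in iris data by species
by_species <- split(iris[, c("Sepal.Length", "Petal.Length")], iris$Species)

# Compute Pearson's r within each group
r_by_group <- sapply(by_species, function(d) {
  cor(d$Sepal.Length, d$Petal.Length, use = "pairwise.complete.obs")
})

round(r_by_group, 2)

# For comparison: the correlation across all groups combined
round(cor(iris$Sepal.Length, iris$Petal.Length), 2)
```

Within-group correlations can differ noticeably from the correlation in the full sample, which is one reason to look at them separately.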