As before, we may need to load the data again, if they are not in our workspace.
corona_survey <- readRDS("./data/corona_survey.rds")
In case you have not done so yet, please also install janitor and correlation.
if (!require(janitor)) install.packages("janitor")
if (!require(correlation)) install.packages("correlation")
Use base R to create a crosstab for the variables age_cat (rows) and choice_of_party (columns) showing row percentages.
You can use round(), table(), and prop.table() here. Add the margin argument to prop.table() to compute proportions within rows, and multiply the result by 100 to get percentages.
round(prop.table(table(corona_survey$age_cat, corona_survey$choice_of_party), 1)*100, 2)
##
##                   CDU/CSU   SPD   FDP Linke Gruene   AfD Other
##   <= 25 years       18.18  9.09 15.15 12.12  33.33  6.06  6.06
##   26 to 30 years    19.55 12.85 10.06 11.73  31.84  5.59  8.38
##   31 to 35 years    22.35 11.73 12.85  8.38  33.52  7.26  3.91
##   36 to 40 years    28.37 13.49  7.91  5.58  29.77 10.70  4.19
##   41 to 45 years    26.67  8.10 14.29  7.62  26.67 12.86  3.81
##   46 to 50 years    28.30 13.96  8.30  8.30  27.55 10.19  3.40
##   51 to 60 years    25.48 12.13  9.26 11.85  28.75 10.76  1.77
##   61 to 65 years    31.77 12.04  4.35 11.71  27.09 10.37  2.68
##   66 to 70 years    28.57 13.95  9.63  9.97  25.91 10.63  1.33
##   >= 71 years       31.64 20.60  7.16 12.24  16.12 10.45  1.79
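To see what the second argument of prop.table() does, here is a minimal sketch on a small made-up table. The object toy and its counts are assumptions for illustration only and are not taken from corona_survey. With margin = 1, each cell is divided by its row total, so every row should add up to roughly 100 after the conversion to percentages.

```r
# Toy 2 x 2 table of counts (made-up values, not from corona_survey)
toy <- as.table(matrix(c(10, 30, 20, 40), nrow = 2,
                       dimnames = list(group = c("a", "b"),
                                       answer = c("yes", "no"))))

# margin = 1 divides each cell by its row total (row proportions)
row_pct <- round(prop.table(toy, margin = 1) * 100, 2)
row_pct

# Each row should sum to about 100 (small deviations come from rounding)
rowSums(row_pct)
```

Using margin = 2 instead would give column percentages, and leaving the argument out would give percentages of the grand total.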
Use the janitor package to get the same results.
You need to create a tabyl object and add some adorn_*() functions to get the row percentages. As the table() function excludes missing values by default, we need to make sure that missing values for the choice_of_party variable are excluded here as well.
library(dplyr) # provides %>%, filter(), select(), and group_by()
library(janitor)
corona_survey %>%
  filter(!is.na(choice_of_party)) %>%
  tabyl(age_cat, choice_of_party) %>%
  adorn_percentages(denominator = "row") %>%
  adorn_pct_formatting(digits = 2)
##         age_cat CDU/CSU    SPD    FDP  Linke Gruene    AfD Other
##     <= 25 years  18.18%  9.09% 15.15% 12.12% 33.33%  6.06% 6.06%
##  26 to 30 years  19.55% 12.85% 10.06% 11.73% 31.84%  5.59% 8.38%
##  31 to 35 years  22.35% 11.73% 12.85%  8.38% 33.52%  7.26% 3.91%
##  36 to 40 years  28.37% 13.49%  7.91%  5.58% 29.77% 10.70% 4.19%
##  41 to 45 years  26.67%  8.10% 14.29%  7.62% 26.67% 12.86% 3.81%
##  46 to 50 years  28.30% 13.96%  8.30%  8.30% 27.55% 10.19% 3.40%
##  51 to 60 years  25.48% 12.13%  9.26% 11.85% 28.75% 10.76% 1.77%
##  61 to 65 years  31.77% 12.04%  4.35% 11.71% 27.09% 10.37% 2.68%
##  66 to 70 years  28.57% 13.95%  9.63%  9.97% 25.91% 10.63% 1.33%
##     >= 71 years  31.64% 20.60%  7.16% 12.24% 16.12% 10.45% 1.79%
Run a chi-squared test on the tabyl we have created before.
corona_survey %>%
  filter(!is.na(choice_of_party)) %>%
  tabyl(age_cat, choice_of_party) %>%
  chisq.test()
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 126.32, df = 54, p-value = 0.00000009966
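The result of chisq.test() can also be stored and inspected, for example to look at the counts expected under independence. The following is a minimal base R sketch on a made-up 2 x 2 table. The object toy and its counts are assumptions for illustration and not survey data.

```r
# Toy contingency table (made-up counts, not from corona_survey)
toy <- as.table(matrix(c(20, 30, 25, 25), nrow = 2,
                       dimnames = list(group = c("a", "b"),
                                       choice = c("x", "y"))))

# For 2 x 2 tables, chisq.test() applies Yates' continuity correction by default
test <- chisq.test(toy)

test$expected  # counts expected if rows and columns were independent
test$p.value   # p-value of the test
```

Small expected counts are a common reason to treat the chi-squared approximation with caution, so it can be worth checking them before interpreting the p-value.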
Use the correlation package to calculate and print correlations between the following variables: risk_self, risk_surroundings, sum_measures, sum_sources
library(correlation)
corona_survey %>%
  select(risk_self,
         risk_surroundings,
         sum_measures,
         sum_sources) %>%
  correlation()
## # Correlation Matrix (pearson-method)
##
## Parameter1 | Parameter2 | r | 95% CI | t | df | p
## --------------------------------------------------------------------------------------
## risk_self | risk_surroundings | 0.76 | [0.75, 0.78] | 65.29 | 3075 | < .001***
## risk_self | sum_measures | 0.16 | [0.13, 0.20] | 9.29 | 3146 | < .001***
## risk_self | sum_sources | 0.06 | [0.03, 0.10] | 3.62 | 3129 | < .001***
## risk_surroundings | sum_measures | 0.14 | [0.11, 0.17] | 7.89 | 3098 | < .001***
## risk_surroundings | sum_sources | 0.09 | [0.06, 0.13] | 5.06 | 3081 | < .001***
## sum_measures | sum_sources | 0.13 | [0.09, 0.16] | 7.16 | 3166 | < .001***
##
## p-value adjustment method: Holm (1979)
## Observations: 3077-3168
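The footer of the output notes that the p-values were adjusted with the Holm method. As a rough illustration of what such an adjustment does, here is a minimal sketch using p.adjust() from base R. The three raw p-values are made up for this example and do not come from the survey.

```r
# Hypothetical unadjusted p-values for three tests (made-up numbers)
p_raw <- c(0.010, 0.020, 0.030)

# Holm: the smallest p-value is multiplied by 3, the next by 2, the largest by 1,
# and adjusted values are never allowed to fall below an earlier (smaller) one
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

Adjusted p-values are never smaller than the raw ones, which is why some correlations can lose their significance stars once several tests are run together.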
Compute the same correlations separately for each level of education_cat.
You need to group the data by education_cat before computing the correlations.
library(correlation)
corona_survey %>%
  select(education_cat,
         risk_self,
         risk_surroundings,
         sum_measures,
         sum_sources) %>%
  group_by(education_cat) %>%
  correlation()
## # Correlation Matrix (pearson-method)
##
## Group | Parameter1 | Parameter2 | r | 95% CI | t | df | p
## -------------------------------------------------------------------------------------------------------
## Low | risk_self | risk_surroundings | 0.73 | [ 0.68, 0.78] | 19.59 | 330 | < .001***
## Low | risk_self | sum_measures | 0.19 | [ 0.09, 0.29] | 3.59 | 340 | 0.002**
## Low | risk_self | sum_sources | 5.20e-04 | [-0.11, 0.11] | 9.56e-03 | 338 | 0.992
## Low | risk_surroundings | sum_measures | 0.16 | [ 0.06, 0.27] | 3.04 | 334 | 0.010*
## Low | risk_surroundings | sum_sources | 0.07 | [-0.04, 0.17] | 1.26 | 332 | 0.420
## Low | sum_measures | sum_sources | 0.15 | [ 0.05, 0.25] | 2.85 | 343 | 0.014*
## Medium | risk_self | risk_surroundings | 0.77 | [ 0.74, 0.79] | 37.00 | 958 | < .001***
## Medium | risk_self | sum_measures | 0.16 | [ 0.10, 0.22] | 5.20 | 976 | < .001***
## Medium | risk_self | sum_sources | 0.06 | [ 0.00, 0.13] | 2.00 | 971 | 0.090
## Medium | risk_surroundings | sum_measures | 0.11 | [ 0.05, 0.17] | 3.50 | 964 | 0.002**
## Medium | risk_surroundings | sum_sources | 0.05 | [-0.01, 0.12] | 1.70 | 959 | 0.090
## Medium | sum_measures | sum_sources | 0.11 | [ 0.04, 0.17] | 3.36 | 981 | 0.002**
## High | risk_self | risk_surroundings | 0.76 | [ 0.74, 0.78] | 49.60 | 1783 | < .001***
## High | risk_self | sum_measures | 0.15 | [ 0.10, 0.19] | 6.30 | 1826 | < .001***
## High | risk_self | sum_sources | 0.06 | [ 0.02, 0.11] | 2.73 | 1816 | 0.006**
## High | risk_surroundings | sum_measures | 0.14 | [ 0.09, 0.18] | 5.78 | 1796 | < .001***
## High | risk_surroundings | sum_sources | 0.09 | [ 0.05, 0.14] | 3.94 | 1786 | < .001***
## High | sum_measures | sum_sources | 0.13 | [ 0.08, 0.17] | 5.41 | 1838 | < .001***
##
## p-value adjustment method: Holm (1979)
## Observations: 332-1840
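To make the idea of grouped correlations more concrete, here is a minimal base R sketch that splits a data set by a grouping variable and computes one Pearson correlation per group. It uses the built-in mtcars data purely for illustration, so the variables mpg, wt, and cyl are stand-ins and not part of corona_survey.

```r
# Built-in example data, used here only for illustration (not survey data)
by_cyl <- split(mtcars, mtcars$cyl)  # one data frame per number of cylinders

# Pearson correlation between mpg and wt within each group
r_by_cyl <- sapply(by_cyl, function(d) cor(d$mpg, d$wt))
r_by_cyl
```

Unlike correlation(), this sketch returns only the coefficients, without confidence intervals or adjusted p-values.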