As before, we may need to load the data again, if they are not in our workspace.
corona_survey <- readRDS("./data/corona_survey.rds")
In case you have not done so yet, please also install janitor and correlation.
if (!require(janitor)) install.packages("janitor")
if (!require(correlation)) install.packages("correlation")
Use base R to create a crosstab for the variables age_cat (rows) and choice_of_party (columns) showing row percentages.
You can combine round(), table(), and prop.table() here: add an argument to prop.table() to get row proportions, and transform the results to represent percentages.
round(prop.table(table(corona_survey$age_cat, corona_survey$choice_of_party), 1)*100, 2)
##
## CDU/CSU SPD FDP Linke Gruene AfD Other
## <= 25 years 18.18 9.09 15.15 12.12 33.33 6.06 6.06
## 26 to 30 years 19.55 12.85 10.06 11.73 31.84 5.59 8.38
## 31 to 35 years 22.35 11.73 12.85 8.38 33.52 7.26 3.91
## 36 to 40 years 28.37 13.49 7.91 5.58 29.77 10.70 4.19
## 41 to 45 years 26.67 8.10 14.29 7.62 26.67 12.86 3.81
## 46 to 50 years 28.30 13.96 8.30 8.30 27.55 10.19 3.40
## 51 to 60 years 25.48 12.13 9.26 11.85 28.75 10.76 1.77
## 61 to 65 years 31.77 12.04 4.35 11.71 27.09 10.37 2.68
## 66 to 70 years 28.57 13.95 9.63 9.97 25.91 10.63 1.33
## >= 71 years 31.64 20.60 7.16 12.24 16.12 10.45 1.79
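To see what the second argument of prop.table() does, here is a minimal sketch on a small made-up data frame. The object `toy` and its columns `group` and `answer` are hypothetical and not part of corona_survey. With margin = 1 the proportions are computed within rows, with margin = 2 within columns.

```r
# Toy data (hypothetical, not from corona_survey)
toy <- data.frame(
  group  = c("a", "a", "a", "b", "b", "b", "b", "b"),
  answer = c("yes", "no", "no", "yes", "yes", "yes", "no", "no")
)

counts <- table(toy$group, toy$answer)

# margin = 1: proportions within each row, converted to percentages
row_pct <- round(prop.table(counts, margin = 1) * 100, 2)

# margin = 2: proportions within each column, converted to percentages
col_pct <- round(prop.table(counts, margin = 2) * 100, 2)

row_pct
rowSums(row_pct)  # each row adds up to (approximately) 100
colSums(col_pct)  # each column adds up to (approximately) 100
```

Leaving out the margin argument would instead divide every cell by the grand total, which gives cell percentages rather than row percentages.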
Use functions from the janitor package to get the same results.
Create a tabyl() object and add some additional functions to get the row percentages. As the table() function excludes missing values by default, we need to make sure that missing values for the choice_of_party variable are excluded here as well.
library(dplyr)    # provides filter(); not needed if the tidyverse is already loaded
library(janitor)
corona_survey %>%
  filter(!is.na(choice_of_party)) %>%
  tabyl(age_cat, choice_of_party) %>%
  adorn_percentages(denominator = "row") %>%
  adorn_pct_formatting(digits = 2)
## age_cat CDU/CSU SPD FDP Linke Gruene AfD Other
## <= 25 years 18.18% 9.09% 15.15% 12.12% 33.33% 6.06% 6.06%
## 26 to 30 years 19.55% 12.85% 10.06% 11.73% 31.84% 5.59% 8.38%
## 31 to 35 years 22.35% 11.73% 12.85% 8.38% 33.52% 7.26% 3.91%
## 36 to 40 years 28.37% 13.49% 7.91% 5.58% 29.77% 10.70% 4.19%
## 41 to 45 years 26.67% 8.10% 14.29% 7.62% 26.67% 12.86% 3.81%
## 46 to 50 years 28.30% 13.96% 8.30% 8.30% 27.55% 10.19% 3.40%
## 51 to 60 years 25.48% 12.13% 9.26% 11.85% 28.75% 10.76% 1.77%
## 61 to 65 years 31.77% 12.04% 4.35% 11.71% 27.09% 10.37% 2.68%
## 66 to 70 years 28.57% 13.95% 9.63% 9.97% 25.91% 10.63% 1.33%
## >= 71 years 31.64% 20.60% 7.16% 12.24% 16.12% 10.45% 1.79%
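The filter() step matters because table() and tabyl() treat missing values differently by default. The following sketch uses a made-up vector `party` (hypothetical, not from the survey) to show the base R behaviour: table() drops NA unless told otherwise.

```r
# Toy vector with one missing value (hypothetical data)
party <- c("SPD", "FDP", NA, "SPD")

# Default: NA is silently dropped, so only 3 observations are counted
table(party)
sum(table(party))

# With useNA = "ifany", NA gets its own cell and all 4 observations are counted
table(party, useNA = "ifany")
sum(table(party, useNA = "ifany"))
```

As far as I know, tabyl() keeps missing values by default (its show_na argument defaults to TRUE), so setting show_na = FALSE should be an alternative to filtering beforehand.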
Perform a chi-squared test on the tabyl we have created before.
corona_survey %>%
  filter(!is.na(choice_of_party)) %>%
  tabyl(age_cat, choice_of_party) %>%
  chisq.test()
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 126.32, df = 54, p-value = 0.00000009966
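The degrees of freedom in this output follow from the table dimensions: (rows - 1) * (columns - 1). A minimal sketch with a made-up 3 x 2 table of counts (the matrix `counts` is hypothetical) shows the rule using the base R chisq.test() function.

```r
# Hypothetical 3 x 2 table of counts (not from corona_survey)
counts <- matrix(c(20, 30,
                   25, 25,
                   35, 15),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(group  = c("a", "b", "c"),
                                 answer = c("no", "yes")))

res <- chisq.test(counts)

# Degrees of freedom reported by the test
res$parameter

# Same value computed by hand: (3 - 1) * (2 - 1) = 2
(nrow(counts) - 1) * (ncol(counts) - 1)

# The same rule applied to 10 age groups and 7 parties gives 54
(10 - 1) * (7 - 1)
```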
Use the correlation package to calculate and print correlations between the following variables: risk_self, risk_surroundings, sum_measures, sum_sources.
library(correlation)
corona_survey %>%
  select(risk_self,
         risk_surroundings,
         sum_measures,
         sum_sources) %>%
  correlation()
## # Correlation Matrix (pearson-method)
##
## Parameter1 | Parameter2 | r | 95% CI | t | df | p
## --------------------------------------------------------------------------------------
## risk_self | risk_surroundings | 0.76 | [0.75, 0.78] | 65.29 | 3075 | < .001***
## risk_self | sum_measures | 0.16 | [0.13, 0.20] | 9.29 | 3146 | < .001***
## risk_self | sum_sources | 0.06 | [0.03, 0.10] | 3.62 | 3129 | < .001***
## risk_surroundings | sum_measures | 0.14 | [0.11, 0.17] | 7.89 | 3098 | < .001***
## risk_surroundings | sum_sources | 0.09 | [0.06, 0.13] | 5.06 | 3081 | < .001***
## sum_measures | sum_sources | 0.13 | [0.09, 0.16] | 7.16 | 3166 | < .001***
##
## p-value adjustment method: Holm (1979)
## Observations: 3077-3168
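The output notes that the p-values were adjusted with the Holm method. To illustrate what that adjustment does, here is a small sketch using the base R function p.adjust() on three made-up p-values (the vector `p_raw` is hypothetical and unrelated to the survey results).

```r
# Three hypothetical unadjusted p-values
p_raw <- c(0.01, 0.04, 0.03)

# Holm: sort ascending, multiply the smallest by 3, the next by 2, the largest by 1,
# then make sure the adjusted values never decrease along the sorted order
p_holm <- p.adjust(p_raw, method = "holm")

p_holm  # 0.03 0.06 0.06, returned in the original order of p_raw
```

Because every adjusted value is at least as large as the raw one, some correlations that look significant without adjustment may no longer be so afterwards.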
Compute the same correlations again, this time separately for each level of education_cat.
Group the data by education_cat before computing the correlations.
library(correlation)
corona_survey %>%
  select(education_cat,
         risk_self,
         risk_surroundings,
         sum_measures,
         sum_sources) %>%
  group_by(education_cat) %>%
  correlation()
## # Correlation Matrix (pearson-method)
##
## Group | Parameter1 | Parameter2 | r | 95% CI | t | df | p
## -------------------------------------------------------------------------------------------------------
## Low | risk_self | risk_surroundings | 0.73 | [ 0.68, 0.78] | 19.59 | 330 | < .001***
## Low | risk_self | sum_measures | 0.19 | [ 0.09, 0.29] | 3.59 | 340 | 0.002**
## Low | risk_self | sum_sources | 5.20e-04 | [-0.11, 0.11] | 9.56e-03 | 338 | 0.992
## Low | risk_surroundings | sum_measures | 0.16 | [ 0.06, 0.27] | 3.04 | 334 | 0.010*
## Low | risk_surroundings | sum_sources | 0.07 | [-0.04, 0.17] | 1.26 | 332 | 0.420
## Low | sum_measures | sum_sources | 0.15 | [ 0.05, 0.25] | 2.85 | 343 | 0.014*
## Medium | risk_self | risk_surroundings | 0.77 | [ 0.74, 0.79] | 37.00 | 958 | < .001***
## Medium | risk_self | sum_measures | 0.16 | [ 0.10, 0.22] | 5.20 | 976 | < .001***
## Medium | risk_self | sum_sources | 0.06 | [ 0.00, 0.13] | 2.00 | 971 | 0.090
## Medium | risk_surroundings | sum_measures | 0.11 | [ 0.05, 0.17] | 3.50 | 964 | 0.002**
## Medium | risk_surroundings | sum_sources | 0.05 | [-0.01, 0.12] | 1.70 | 959 | 0.090
## Medium | sum_measures | sum_sources | 0.11 | [ 0.04, 0.17] | 3.36 | 981 | 0.002**
## High | risk_self | risk_surroundings | 0.76 | [ 0.74, 0.78] | 49.60 | 1783 | < .001***
## High | risk_self | sum_measures | 0.15 | [ 0.10, 0.19] | 6.30 | 1826 | < .001***
## High | risk_self | sum_sources | 0.06 | [ 0.02, 0.11] | 2.73 | 1816 | 0.006**
## High | risk_surroundings | sum_measures | 0.14 | [ 0.09, 0.18] | 5.78 | 1796 | < .001***
## High | risk_surroundings | sum_sources | 0.09 | [ 0.05, 0.14] | 3.94 | 1786 | < .001***
## High | sum_measures | sum_sources | 0.13 | [ 0.08, 0.17] | 5.41 | 1838 | < .001***
##
## p-value adjustment method: Holm (1979)
## Observations: 332-1840
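Conceptually, grouping before calling correlation() means that the correlations are computed once per subgroup. A rough base R sketch of the same idea, using simulated data (the data frame `toy` and its columns `group`, `x`, and `y` are hypothetical), splits the data by group and computes one Pearson correlation per piece.

```r
set.seed(1)  # only so that the simulated toy data are reproducible

# Simulated data with two groups (hypothetical, not from corona_survey)
toy <- data.frame(
  group = rep(c("low", "high"), each = 50),
  x     = rnorm(100)
)
toy$y <- toy$x + rnorm(100)

# Split the data by group and compute one Pearson correlation per subset
by_group <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

by_group  # named vector with one correlation coefficient per group
```

This sketch only returns the coefficients; unlike correlation(), it does not provide confidence intervals, test statistics, or adjusted p-values.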