Exercise 3_1_1: Summary statistics

For this exercise, we will use the same subset of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany data as in the lecture. If you have stored that data set as an .rds file as shown in the slides, you can simply load it with the following command:

corona_survey <- readRDS("./data/corona_survey.rds")

If you have not saved the wrangled data as an .rds file yet, you need to go through the data wrangling pipeline shown in the EDA slides (again).

Also, in case you have not done so yet, please install summarytools and psych as we will need them for the exercises (in addition to base R and the tidyverse packages). The following code chunk will check if you have these packages installed and install them, if that is not the case.

if (!require(summarytools)) install.packages("summarytools")
if (!require(psych)) install.packages("psych")

1

Using a base R function, print some basic summary statistics for the variables sum_sources and sum_measures.

Clues

We can use the dplyr function for selecting variables and pipe the result into the required function.

2

Using a function from the psych package, print the summary statistics for all variables that assess how much people trust specific people or institutions in dealing with the Corona virus. The summary statistics should include IQR but no measures of skew (and kurtosis).

Clues

All names of the variables we are interested in here start with “trust”. You can find information about the arguments of the describe() function in its help file (?describe).

3

Use a function from the summarytools package to get summary statistics for the following variables in your dataset: left_right, sum_measures, mean_trust. Unlike in the lecture, however, we now want all stats (not just the “common” ones).

Clues

You can check the arguments for the function we need via ?descr.

4

Now, let’s use functions from dplyr to create grouped summary statistics. Compute separate means for the variables risk_self and risk_surroundings for the different age groups in the data set. The resulting summary variables should be called risk_self_mean and risk_surroundings_mean. You should exclude respondents with missing values for the variables of interest.

Clues

You need to group and summarize the data. There are (at least) two different ways of doing this.

Exercise 3_1_1: Summary statistics

Johannes Breuer, Stefan Jünger

Introduction to R for Data Analysis

1

Clues

2

Clues

3

Clues

4

Clues