Exercise 4_2_1: Plotting Diagnostics

In the following exercises, you will work on your own ‘research question’ using the GESIS Panel data. If you have not already done so, you can load the data first using the following code:

library(haven)
library(dplyr)
library(sjlabelled)

gp_covid <- 
  read_sav(
    "./data/ZA5667_v1-1-0.sav"
  ) %>% 
  set_na(na = c(-1:-99, 97, 98)) %>% 
  rowwise() %>%
  mutate(
    mean_trust = 
      mean(
        c_across(hzcy044a:hzcy052a),
        na.rm = TRUE
      )
  ) %>% 
  ungroup() %>% 
  remove_all_labels() %>% 
  mutate(
    pol_leaning_cat = 
      case_when(
        between(political_orientation, 0, 3) ~ "left",
        between(political_orientation, 4, 7) ~ "center",
        political_orientation > 7 ~ "right"
      ) %>% 
      as.factor()
  ) %>% 
  filter(!is.na(pol_leaning_cat))

1

Take a few minutes to choose a dependent variable (DV) and an independent variable (IV) from the GESIS Panel codebook. Don’t overthink your choices!

Clues

If you’re really struggling to find something you like, what about the following variables:

hzcy005a (risk of infecting others) as DV and hzcy015a (wearing a mask) as IV
hzcy026a (obeying curfew) as DV and age_cat as IV
hzcy072a (staying home for childcare) as DV and sex as IV

Be aware that you may have to do some recoding, and that your sample is likely reduced due to filter questions.
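As a rough illustration of the kind of recoding and sample-size check meant here, the following sketch uses R's built-in airquality data as a stand-in, because it contains missing values. The data set, the cut-off of 80, and the names temp_cat and sample_check are assumptions for illustration only and have nothing to do with the GESIS Panel variables.

```r
library(dplyr)

# Stand-in data: airquality replaces gp_covid (assumption for illustration)
recoded <-
  airquality %>%
  mutate(
    # Recode a numeric variable into a two-level factor (hypothetical cut-off)
    temp_cat =
      if_else(Temp > 80, "hot", "mild") %>%
      as.factor()
  )

# Compare the full sample with the cases that have both a DV and an IV value
sample_check <-
  recoded %>%
  summarise(
    n_total = n(),
    n_complete = sum(!is.na(Ozone) & !is.na(Solar.R))
  )

sample_check
```

With your own variables, replace Ozone and Solar.R with your DV and IV to see how many cases remain for the regression.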

2

Run a linear regression model with your variables and education_cat as a covariate. If education_cat is already your independent variable (IV), choose a different covariate. Then check visually whether the residuals are normally distributed.

Clues

You need the performance and see packages for this task (and dplyr for the preparatory wrangling part).
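One possible way to combine these packages is sketched below. It uses the built-in mtcars data as a stand-in so that it runs without the GESIS Panel file; the variables mpg, wt, and cyl and the object names cars, model, and normality are assumptions for illustration, with cyl playing the role of a categorical covariate such as education_cat.

```r
library(dplyr)
library(performance)
library(see)

# Stand-in data: mtcars replaces gp_covid (assumption for illustration)
cars <-
  mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%  # categorical covariate, analogous to education_cat
  select(mpg, wt, cyl) %>%
  na.omit()                         # keep complete cases only

# Linear regression: DV ~ IV + covariate
model <- lm(mpg ~ wt + cyl, data = cars)

# Test the residuals for normality, then plot the result as a Q-Q plot
normality <- check_normality(model)
plot(normality, type = "qq")        # the plot method comes from the see package
```

In the Q-Q plot, points that follow the reference line closely suggest approximately normal residuals. Setting type to "density" instead shows the residual distribution against a normal curve.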

3

Now, do the full range of model checks using a function from the performance package.
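A minimal sketch of such a check, again on the built-in mtcars data as a stand-in (the model formula is an assumption for illustration, not the solution for your own variables):

```r
library(performance)
library(see)

# Stand-in model: replace with the model you fitted on gp_covid
model <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Draws a panel of diagnostic plots, e.g. linearity, homogeneity of variance,
# influential observations, collinearity, and normality of residuals
check_model(model)
```

The panel is produced with the help of the see package, so both packages need to be installed.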

Johannes Breuer, Stefan Jünger

Introduction to R for Data Analysis
