Exercise 4_1_3: Regression reporting

After running a few regression analyses in the previous exercises, we will now explore some options for reporting their results.

If necessary, load the data first.

corona_survey <- readRDS("./data/corona_survey.rds")

In addition to the parameters package, which we have already used in the previous exercises on regression analysis, this time we also need the following packages: stargazer, report, and broom.

if (!require(stargazer)) install.packages("stargazer")
if (!require(report)) install.packages("report")
if (!require(broom)) install.packages("broom")

Before we can report anything, we, of course, first need to run a regression analysis (again)…

1

Run a linear regression model with the sum of prevention measures as the outcome and the following predictors: sex, the risk of getting infected with the coronavirus oneself, the risk of someone in one’s immediate social surroundings getting infected, trust in the government, and trust in scientists. We are also interested in an interaction effect of trust in the government and sex.

Clues

Remember that you can include interaction effects in a formula in R using *. If you want to, you can have a look at the results via summary().
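To see what * does inside a formula, it can help to look at the terms R builds from it. The following is a minimal sketch using the built-in mtcars data as a stand-in for the survey data; the variables wt and am are only placeholders and not part of this exercise.

```r
# a * b is shorthand for a + b + a:b (both main effects plus their interaction)
attr(terms(y ~ a * b), "term.labels")

# Illustration with built-in data (mtcars is a stand-in for the survey data)
m_star <- lm(mpg ~ wt * factor(am), data = mtcars)
m_long <- lm(mpg ~ wt + factor(am) + wt:factor(am), data = mtcars)

# Both ways of writing the formula yield the same coefficients
names(coef(m_star))
all.equal(coef(m_star), coef(m_long))
```

If you only want the interaction term without the main effects, you would use : instead of *, but that is rarely what you want in practice.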

solution

reg_model <- lm(sum_measures ~ risk_self + risk_surroundings + sex*trust_government + trust_scientists,
                data = corona_survey)

summary(reg_model)

## 
## Call:
## lm(formula = sum_measures ~ risk_self + risk_surroundings + sex * 
##     trust_government + trust_scientists, data = corona_survey)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1731 -0.5057  0.1345  0.7612  2.8504 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.833383   0.149495  12.264  < 2e-16 ***
## risk_self                  0.099780   0.024857   4.014 6.12e-05 ***
## risk_surroundings          0.050251   0.022465   2.237   0.0254 *  
## sexFemale                  0.227753   0.154751   1.472   0.1412    
## trust_government           0.156388   0.028757   5.438 5.81e-08 ***
## trust_scientists           0.139924   0.028874   4.846 1.32e-06 ***
## sexFemale:trust_government 0.009148   0.040658   0.225   0.8220    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.11 on 2976 degrees of freedom
##   (782 observations deleted due to missingness)
## Multiple R-squared:  0.08379,    Adjusted R-squared:  0.08194 
## F-statistic: 45.36 on 6 and 2976 DF,  p-value: < 2.2e-16

2

Using base R, print only the coefficients from the model.

Clues

You can use the same operator that you use in base R for accessing variables in a data frame to select the element we want from the lm object.

solution

reg_model$coefficients

##                (Intercept)                  risk_self          risk_surroundings                  sexFemale 
##                1.833383213                0.099779664                0.050251340                0.227752905 
##           trust_government           trust_scientists sexFemale:trust_government 
##                0.156388457                0.139923783                0.009148164
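Besides the $ operator, base R also offers the extractor function coef(), which works for the model object as well as for its summary. The sketch below uses the built-in mtcars data as a stand-in, so the model m and its variables are only illustrative.

```r
# Stand-in model on built-in data (not the survey model from this exercise)
m <- lm(mpg ~ wt + hp, data = mtcars)

# coef() returns the same named vector as m$coefficients
coef(m)

# Select a single coefficient by name
coef(m)["wt"]

# Applied to the summary, coef() returns a matrix that also holds
# standard errors, t values, and p-values
coef(summary(m))
```

Using coef() instead of $ has the advantage that it also works for many other model classes that store their estimates differently.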

3

Using a function from a package that allows us to view model parameters, print some more interesting information on the results of our model (including confidence intervals and p-values) in a nice tabular format.

Clues

We can use a function from the parameters package for printing model parameters here.

solution

library(parameters)

model_parameters(reg_model)

## Parameter                       | Coefficient |   SE |        95% CI | t(2976) |      p
## ---------------------------------------------------------------------------------------
## (Intercept)                     |        1.83 | 0.15 | [ 1.54, 2.13] |   12.26 | < .001
## risk_self                       |        0.10 | 0.02 | [ 0.05, 0.15] |    4.01 | < .001
## risk_surroundings               |        0.05 | 0.02 | [ 0.01, 0.09] |    2.24 | 0.025 
## sex [Female]                    |        0.23 | 0.15 | [-0.08, 0.53] |    1.47 | 0.141 
## trust_government                |        0.16 | 0.03 | [ 0.10, 0.21] |    5.44 | < .001
## trust_scientists                |        0.14 | 0.03 | [ 0.08, 0.20] |    4.85 | < .001
## sex [Female] * trust_government |    9.15e-03 | 0.04 | [-0.07, 0.09] |    0.23 | 0.822
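The confidence intervals in this table appear to be the usual t-based intervals (estimate plus or minus the critical t value times the standard error). As a rough check of that logic, here is a minimal base R sketch on the built-in mtcars data; the model m is a stand-in, and the comparison is with confint() rather than with the parameters package itself.

```r
# Stand-in model on built-in data (not the survey model from this exercise)
m <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(summary(m))[, "Estimate"]
se  <- coef(summary(m))[, "Std. Error"]

# Critical value of the t distribution for a 95% interval
crit <- qt(0.975, df = df.residual(m))

# Interval computed by hand
ci_manual <- cbind(lower = est - crit * se, upper = est + crit * se)
ci_manual

# Base R's confint() gives the same bounds for lm objects
confint(m)
```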

4

For further use in a publication, we also want to create a typical regression table using the stargazer package. We want the output to be in plain text format.

Clues

If you want to, you can also specify labels for the variables in your models as arguments in the stargazer function.

solution

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

stargazer(reg_model,
          type = "text",
          dep.var.labels=c("Sum of prevention measures"),
          covariate.labels=c("Personal infection risk", "Infection risk social surroundings", "Sex (Female)",
                             "Trust in the government", "Trust in scientists"))

## 
## ==============================================================
##                                        Dependent variable:    
##                                    ---------------------------
##                                    Sum of prevention measures 
## --------------------------------------------------------------
## Personal infection risk                     0.100***          
##                                              (0.025)          
##                                                               
## Infection risk social surroundings           0.050**          
##                                              (0.022)          
##                                                               
## Sex (Female)                                  0.228           
##                                              (0.155)          
##                                                               
## Trust in the government                     0.156***          
##                                              (0.029)          
##                                                               
## Trust in scientists                         0.140***          
##                                              (0.029)          
##                                                               
## sexFemale:trust_government                    0.009           
##                                              (0.041)          
##                                                               
## Constant                                    1.833***          
##                                              (0.149)          
##                                                               
## --------------------------------------------------------------
## Observations                                  2,983           
## R2                                            0.084           
## Adjusted R2                                   0.082           
## Residual Std. Error                     1.110 (df = 2976)     
## F Statistic                         45.361*** (df = 6; 2976)  
## ==============================================================
## Note:                              *p<0.1; **p<0.05; ***p<0.01
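Note that the significance stars in this table do not mean the same thing as in the summary() output: the note line shows that stargazer uses the cutoffs 0.1, 0.05, and 0.01 by default, which is why risk_surroundings gets two stars here but only one in summary(). If you prefer the more common cutoffs, the star.cutoffs argument can be used. The following is a minimal sketch with the built-in mtcars data as a stand-in; the same argument should work with reg_model.

```r
library(stargazer)

# Stand-in model on built-in data (not the survey model from this exercise)
m <- lm(mpg ~ wt + hp, data = mtcars)

# Use the cutoffs 0.05, 0.01, and 0.001 for one, two, and three stars
stargazer(m,
          type = "text",
          star.cutoffs = c(0.05, 0.01, 0.001))
```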

5

To produce custom tables and plots, we also want to store the key parameters of our model in a tidy tibble.

Clues

There’s a function in the broom package for that.

solution

library(broom)

tidy(reg_model)

## # A tibble: 7 x 5
##   term                       estimate std.error statistic  p.value
##   <chr>                         <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)                 1.83       0.149     12.3   9.12e-34
## 2 risk_self                   0.0998     0.0249     4.01  6.12e- 5
## 3 risk_surroundings           0.0503     0.0225     2.24  2.54e- 2
## 4 sexFemale                   0.228      0.155      1.47  1.41e- 1
## 5 trust_government            0.156      0.0288     5.44  5.81e- 8
## 6 trust_scientists            0.140      0.0289     4.85  1.32e- 6
## 7 sexFemale:trust_government  0.00915    0.0407     0.225 8.22e- 1
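For custom tables and plots, two additions are often handy: confidence intervals for the coefficients and a one-row summary of the model fit. The sketch below shows both using the built-in mtcars data as a stand-in; the object names coefs, coefs_no_intercept, and fit_stats are only illustrative.

```r
library(broom)

# Stand-in model on built-in data (not the survey model from this exercise)
m <- lm(mpg ~ wt + hp, data = mtcars)

# conf.int = TRUE adds the columns conf.low and conf.high
coefs <- tidy(m, conf.int = TRUE)

# Drop the intercept, e.g., before plotting the coefficients
coefs_no_intercept <- coefs[coefs$term != "(Intercept)", ]
coefs_no_intercept

# glance() returns model-level statistics as a one-row tibble
fit_stats <- glance(m)
fit_stats[, c("r.squared", "adj.r.squared", "nobs")]
```

Because both results are regular tibbles, they can be passed on directly to functions for plotting or table creation.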

6

Of course, we can’t write a paper that just consists of tables and plots (though some would surely very much appreciate that). We also need to produce some actual text. We all know the “You should be writing” memes, so we’re in luck that R can help us out here as well. Let’s use a function that produces some model language describing the results of our regression model.

Clues

The package we can use to save some time that we would otherwise spend typing or copying and pasting is called report.

solution

library(report)

report(reg_model)

## We fitted a linear model (estimated using OLS) to predict sum_measures with risk_self, risk_surroundings, sex, trust_government and trust_scientists (formula: sum_measures ~ risk_self + risk_surroundings + sex * trust_government + trust_scientists). The model explains a statistically significant and weak proportion of variance (R2 = 0.08, F(6, 2976) = 45.36, p < .001, adj. R2 = 0.08). The model's intercept, corresponding to risk_self = 0, risk_surroundings = 0, sex = Male, trust_government = 0 and trust_scientists = 0, is at 1.83 (95% CI [1.54, 2.13], t(2976) = 12.26, p < .001). Within this model:
## 
##   - The effect of risk_self is statistically significant and positive (beta = 0.10, 95% CI [0.05, 0.15], t(2976) = 4.01, p < .001; Std. beta = 0.11, 95% CI [0.06, 0.16])
##   - The effect of risk_surroundings is statistically significant and positive (beta = 0.05, 95% CI [6.20e-03, 0.09], t(2976) = 2.24, p = 0.025; Std. beta = 0.06, 95% CI [7.49e-03, 0.11])
##   - The effect of sex [Female] is statistically non-significant and positive (beta = 0.23, 95% CI [-0.08, 0.53], t(2976) = 1.47, p = 0.141; Std. beta = 0.23, 95% CI [0.16, 0.29])
##   - The effect of trust_government is statistically significant and positive (beta = 0.16, 95% CI [0.10, 0.21], t(2976) = 5.44, p < .001; Std. beta = 0.14, 95% CI [0.09, 0.19])
##   - The effect of trust_scientists is statistically significant and positive (beta = 0.14, 95% CI [0.08, 0.20], t(2976) = 4.85, p < .001; Std. beta = 0.10, 95% CI [0.06, 0.13])
##   - The interaction effect of trust_government on sex [Female] is statistically non-significant and positive (beta = 9.15e-03, 95% CI [-0.07, 0.09], t(2976) = 0.23, p = 0.822; Std. beta = 7.96e-03, 95% CI [-0.06, 0.08])
## 
## Standardized parameters were obtained by fitting the model on a standardized version of the dataset.

Johannes Breuer, Stefan Jünger

Introduction to R for Data Analysis
