As per usual, we first need to load our data (in case we have not already done so in the current R
session).
corona_survey <- readRDS("./data/corona_survey.rds")
In case you have not done so yet, please also install the performance
package for this set of exercises.
if (!require(summaryrtools)) install.packages("performance")
In the following exercise, we will cover/repeat some of the basics of regression analysis in R
.
summary()
again.
reg_linear <- lm(trust_government ~ choice_of_party + left_right,
data = corona_survey)
summary(reg_linear)
##
## Call:
## lm(formula = trust_government ~ choice_of_party + left_right,
## data = corona_survey)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1453 -0.6266 0.1119 0.6368 2.2467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.14528 0.08030 51.623 < 2e-16 ***
## choice_of_partySPD -0.17212 0.06746 -2.551 0.010793 *
## choice_of_partyFDP -0.36978 0.07355 -5.028 0.0000005330994403 ***
## choice_of_partyLinke -0.71825 0.07952 -9.032 < 2e-16 ***
## choice_of_partyGruene -0.20025 0.05711 -3.506 0.000463 ***
## choice_of_partyAfD -1.17933 0.07266 -16.230 < 2e-16 ***
## choice_of_partyOther -0.90520 0.11813 -7.663 0.0000000000000263 ***
## left_right -0.02127 0.01278 -1.664 0.096306 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9326 on 2384 degrees of freedom
## (1373 Beobachtungen als fehlend gelöscht)
## Multiple R-squared: 0.1354, Adjusted R-squared: 0.1329
## F-statistic: 53.34 on 7 and 2384 DF, p-value: < 2.2e-16
glm()
function in which you need to specify a link function. The name of the outcome variable is info_fb
.
reg_logistic <- glm(info_fb ~ trust_scientists + trust_government + choice_of_party,
family = binomial(link = "logit"),
data = corona_survey)
summary(reg_logistic)
##
## Call:
## glm(formula = info_fb ~ trust_scientists + trust_government +
## choice_of_party, family = binomial(link = "logit"), data = corona_survey)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8835 -0.6514 -0.5690 -0.5024 2.1189
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.94867 0.33188 -2.858 0.00426 **
## trust_scientists -0.05662 0.07607 -0.744 0.45673
## trust_government -0.12682 0.06224 -2.038 0.04158 *
## choice_of_partySPD 0.29951 0.17940 1.670 0.09502 .
## choice_of_partyFDP 0.12930 0.21207 0.610 0.54206
## choice_of_partyLinke 0.28827 0.19415 1.485 0.13761
## choice_of_partyGruene -0.26705 0.16234 -1.645 0.09997 .
## choice_of_partyAfD 0.39280 0.20172 1.947 0.05151 .
## choice_of_partyOther 0.19963 0.32389 0.616 0.53767
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2162.5 on 2364 degrees of freedom
## Residual deviance: 2130.2 on 2356 degrees of freedom
## (1400 Beobachtungen als fehlend gelöscht)
## AIC: 2148.2
##
## Number of Fisher Scoring iterations: 4
?family
to see how you can include a cauchit link.
reg_cauchit <- glm(info_fb ~ trust_scientists + trust_government + choice_of_party,
family = binomial(link = "cauchit"),
data = corona_survey)
summary(reg_cauchit)
##
## Call:
## glm(formula = info_fb ~ trust_scientists + trust_government +
## choice_of_party, family = binomial(link = "cauchit"), data = corona_survey)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9389 -0.6451 -0.5687 -0.5069 2.0873
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.76499 0.46966 -1.629 0.1034
## trust_scientists -0.09832 0.10533 -0.933 0.3506
## trust_government -0.17913 0.08998 -1.991 0.0465 *
## choice_of_partySPD 0.51237 0.28604 1.791 0.0732 .
## choice_of_partyFDP 0.25354 0.35044 0.724 0.4694
## choice_of_partyLinke 0.48001 0.30017 1.599 0.1098
## choice_of_partyGruene -0.53936 0.34371 -1.569 0.1166
## choice_of_partyAfD 0.55828 0.29953 1.864 0.0623 .
## choice_of_partyOther 0.34090 0.47707 0.715 0.4749
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2162.5 on 2364 degrees of freedom
## Residual deviance: 2130.5 on 2356 degrees of freedom
## (1400 Beobachtungen als fehlend gelöscht)
## AIC: 2148.5
##
## Number of Fisher Scoring iterations: 5
test = "LRT"
in the function we need for this to perform a likelihood ratio test. What’s your interpretation?
anova(reg_logistic,
reg_cauchit,
test = "LRT")
## Analysis of Deviance Table
##
## Model 1: info_fb ~ trust_scientists + trust_government + choice_of_party
## Model 2: info_fb ~ trust_scientists + trust_government + choice_of_party
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2356 2130.2
## 2 2356 2130.5 0 -0.28277
# Interpretation of difference:
# It seems there is no big difference. We can make the reviewer happy with the
# cauchit models and still keep our main findings.
performance
package provides a function for comparing the performance of different models in terms of their fit.
compare_performance(reg_logistic,
reg_cauchit,
metrics = c("AIC", "BIC", "R2", "RMSE"))
## # Comparison of Model Performance Indices
##
## Name | Model | AIC | BIC | RMSE | Tjur's R2 | Nagelkerke's R2
## --------------------------------------------------------------------------------
## reg_logistic | glm | 2148.200 | 2200.117 | 0.374 | 0.014 |
## reg_cauchit | glm | 2148.482 | 2200.399 | 0.374 | | 0.022