The structure of this final exercise for the course is a bit different. We want you to create and work with R Markdown
documents (creating HTML
output) to go through some of the things we covered and did in the previous sessions. There are no coding tasks in this document.
We have created an R Markdown
document (which we have also already knitted to create an HTML
output file) that demonstrates some of the things you can do/create with R Markdown
and repeats a few of the topics and steps we went through in the sessions before this one.
This document uses Gapminder data and you can find it in the exercise
folder: It is called explore_gapminder.Rmd
.
explore_gapminder.Rmd
file in RStudio and explore it a bit to see what it contains. You can also open explore_gapminder.html
(in your browser) to see the .Rmd
and the resulting output document side-by-side.
.Rmd
file via the File
tab or the menu (File
-> Open File
) in RStudio.
You might notice that there are quite a few things specified in the YAML
header. Let’s briefly go through them:
toc: true -> The document will contain a table of contents (ToC)
toc_depth: 3 -> The ToC will contain header levels 1 to 3
number_sections: true -> The sections divided by headers will be numbered
toc_float: true -> The ToC is floating, meaning that it moves when you scroll
code_folding: hide -> By default, the code chunks are hidden, but you display them by clicking the Code
buttons in the HTML
document
theme: flatly -> The Bootswatch them flatly is used to style the document
highlight: tango -> The document uses the Pandoc code highlighting style tango
code_download: true -> The document includes a button allowing you to download the full code
df_print: paged -> When data frames are printed in the document they are printed in paged tables
NB: To knit the document you need to have the packages it uses installed. These are the following ones: rmarkdown
, knitr
, tidyverse
, visdat
, janitor
, pander
, patchwork
, correlation
, GGally
, broom
, sjPlot
, scales
.
An easy option for checking whether you have these packages installed, doing so if that is not the case in one go, and loading them is the packages()
function from the easypackages
package. To use it for this purpose, you can run the following code:
if (!require(easypackages)) install.packages("easypackages")
library(easypackages)
packages("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander", "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales", prompt = F)
Feel free to play around a bit with the explore_gapminder.Rmd
and its output (you can also change parts of the YAML
header to see how that influences the output).
The big task we have for you for this exercise is to create a similar R Markdown
document for the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany data. Using the explore_gapminder.Rmd
as a starting point and guidance, we want you to do the following in this document:
Load and wrangle the data the same way as before to create the corona_survey
subset, but with one difference: Also include the variable hzcy026a
, rename it to obey_curfew
and code the value 4 for it as NA
In addition to that, define a function called inverter
as follows and use it to create a new variable called distrust_gov
based on trust_government
: inverter2 <- function (var) {max(var, na.rm = TRUE) - var + 1}
Note: You can do both of these first things in a code chunk that is not displayed in the output document.
Get an overview of the missing data using the vis_miss()
function from the visdat
package.
Look at the relative frequencies for the variables sex
, age_cat
, education_cat
, and choice_of_party
using a function from the janitor
package.
Create bar plots with ggplot2
to visualize the relative frequencies (percentages) for the variables education_cat
and choice_of_party
.
Using the pander
package, include a table with the output of the base R
function summary()
for the variables on trust in different people and institutions.
Create ggplot2
bar plots to visualize the distribution of the variables risk_self
and left_right
.
Create a ggplot2
boxplot to show differences in trust in the government between supporters of different parties; also showing (jittered) individual data points.
Calculate correlations between left_right
, sum_measures
, sum_sources
, and the trust variables using the correlation
package and display them in a table using the kable()
from the knitr
package.
Create a plot with the GGally
package to visualize these correlations.
Calculate a logistic regression model with obey_curfew
as the dependent variable and mean_trust
and risk_self
as predictors (also include an intercept).
Turn the output of this model into a table with the broom
package and display it with knitr::kable()
.
Create one regression plot with the coefficients and another one with the predictions using the sjPlot
package.
explore_gapminder.Rmd
(and the rest in the slides for the previous sessions). To get you started we have created an almost empty template .Rmd
called explore_gesis_panel_corona_stub.Rmd
in the exercises
folder.
explore_gesis_panel_corona.Rmd
in the solutions
folder.