Exercise 5_1_1: R Markdown

The structure of this final exercise for the course is a bit different. We want you to create and work with R Markdown documents (creating HTML output) to go through some of the things we covered and did in the previous sessions. There are no coding tasks in this document.

We have created an R Markdown document (which we have also already knitted to create an HTML output file) that demonstrates some of the things you can do/create with R Markdown and repeats a few of the topics and steps we went through in the sessions before this one.

This document uses Gapminder data. You can find it in the exercises folder under the name explore_gapminder.Rmd.

1

The first thing we want you to do is to open the explore_gapminder.Rmd file in RStudio and explore it a bit to see what it contains. You can also open explore_gapminder.html in your browser to see the .Rmd and the resulting output document side by side.

Clues

You can open the .Rmd file via the Files tab or the menu (File -> Open File) in RStudio.

You might notice that there are quite a few things specified in the YAML header. Let’s briefly go through them:

toc: true -> The document will contain a table of contents (ToC)

toc_depth: 3 -> The ToC will contain header levels 1 to 3

number_sections: true -> The sections divided by headers will be numbered

toc_float: true -> The ToC is floating, meaning that it stays visible next to the content while you scroll

code_folding: hide -> By default, the code chunks are hidden, but you can display them by clicking the Code buttons in the HTML document

theme: flatly -> The Bootswatch theme flatly is used to style the document

highlight: tango -> The document uses the Pandoc code highlighting style tango

code_download: true -> The document includes a button allowing you to download the full code

df_print: paged -> When data frames are printed in the document they are printed in paged tables
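
Put together, the options above could appear in a YAML header roughly like the following. This is only a sketch: it assumes the output format is html_document (the exercise produces HTML output), and the title is a placeholder. The actual header in explore_gapminder.Rmd may contain further fields or a different order.

```yaml
---
title: "Exploring Gapminder data"   # placeholder title, not taken from the actual file
output:
  html_document:                    # assumed output format
    toc: true
    toc_depth: 3
    number_sections: true
    toc_float: true
    code_folding: hide
    theme: flatly
    highlight: tango
    code_download: true
    df_print: paged
---
```

Note that indentation matters in YAML: the options have to be nested under the output format they belong to.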

NB: To knit the document you need to have the packages it uses installed. These are the following ones: rmarkdown, knitr, tidyverse, visdat, janitor, pander, patchwork, correlation, GGally, broom, sjPlot, scales.

An easy way to check whether these packages are installed, install any that are missing, and load them all in one go is the packages() function from the easypackages package. To use it for this purpose, you can run the following code:

if (!require(easypackages)) install.packages("easypackages")
library(easypackages)

packages("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander", "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales", prompt = FALSE)
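
If you only want to see which of the packages are still missing before installing anything, a base R check along the following lines should work. This is a minimal sketch and not part of the course materials; the object names needed and missing_pkgs are made up for illustration.

```r
# Packages used by explore_gapminder.Rmd (list taken from the note above)
needed <- c("rmarkdown", "knitr", "tidyverse", "visdat", "janitor", "pander",
            "patchwork", "correlation", "GGally", "broom", "sjPlot", "scales")

# Compare against the packages that are currently installed
missing_pkgs <- setdiff(needed, rownames(installed.packages()))

# Character vector of packages that still need to be installed
# (character(0) if nothing is missing)
missing_pkgs

# Uncomment to install only the missing ones:
# install.packages(missing_pkgs)
```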

Feel free to play around a bit with the explore_gapminder.Rmd and its output (you can also change parts of the YAML header to see how that influences the output).

2

The big task we have for you for this exercise is to create a similar R Markdown document for the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany data. Using the explore_gapminder.Rmd as a starting point and guidance, we want you to do the following in this document:

Load and wrangle the data the same way as before to create the corona_survey subset, but with one difference: Also include the variable hzcy026a, rename it to obey_curfew, and code the value 4 for it as NA.
In addition to that, define a function called inverter as follows and use it to create a new variable called distrust_gov based on trust_government: inverter <- function(var) {max(var, na.rm = TRUE) - var + 1}

Note: You can do both of these first things in a code chunk that is not displayed in the output document.
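
To see what these two steps do, here is a small sketch on invented toy data (the data frame toy_survey and its values are made up and are not the real survey data). It uses base R for brevity; with the real data you would typically do the same inside your wrangling pipeline, in a chunk with the option include = FALSE so that neither the code nor its output appears in the knitted document.

```r
# Helper from the task: reverses a scale so that high values become low.
# The result only mirrors the full scale if the scale starts at 1 and its
# highest value actually occurs in the data.
inverter <- function(var) {
  max(var, na.rm = TRUE) - var + 1
}

# Toy data (invented values, NOT the real survey data)
toy_survey <- data.frame(
  trust_government = c(1, 2, 5, NA, 3),
  obey_curfew      = c(1, 2, 4, 1, 2)
)

# Code the value 4 as missing
toy_survey$obey_curfew[toy_survey$obey_curfew == 4] <- NA

# Invert the trust scale to get a distrust measure
toy_survey$distrust_gov <- inverter(toy_survey$trust_government)

toy_survey
```

Missing values stay missing after the inversion, because the subtraction returns NA wherever the input is NA.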

Get an overview of the missing data using the vis_miss() function from the visdat package.
Look at the relative frequencies for the variables sex, age_cat, education_cat, and choice_of_party using a function from the janitor package.
Create bar plots with ggplot2 to visualize the relative frequencies (percentages) for the variables education_cat and choice_of_party.
Using the pander package, include a table with the output of the base R function summary() for the variables on trust in different people and institutions.
Create ggplot2 bar plots to visualize the distribution of the variables risk_self and left_right.
Create a ggplot2 boxplot to show differences in trust in the government between supporters of different parties, and also show the (jittered) individual data points.
Calculate correlations between left_right, sum_measures, sum_sources, and the trust variables using the correlation package and display them in a table using the kable() function from the knitr package.
Create a plot with the GGally package to visualize these correlations.
Calculate a logistic regression model with obey_curfew as the dependent variable and mean_trust and risk_self as predictors (also include an intercept).
Turn the output of this model into a table with the broom package and display it with knitr::kable().
Create one regression plot with the coefficients and another one with the predictions using the sjPlot package.
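
As a rough illustration of the logistic regression and table steps in the list above, the following sketch uses the built-in mtcars data instead of the survey data, so the variable names (am, wt, hp) and the object names toy_model and model_table are stand-ins, not the ones you need for the exercise. Note that R model formulas include an intercept by default, so you do not have to add one explicitly.

```r
library(broom)
library(knitr)

# Built-in example data (mtcars), NOT the survey data:
# am is a 0/1 outcome, wt and hp are numeric predictors
toy_model <- glm(am ~ wt + hp,
                 data = mtcars,
                 family = binomial(link = "logit"))

# Turn the model output into a data frame with one row per term
model_table <- tidy(toy_model)

# Display it as a table in the knitted document
kable(model_table, digits = 3)
```

For the exercise itself, you would replace the formula and the data with obey_curfew, mean_trust, risk_self, and corona_survey.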

This is a lot, but you can find template code for most of this in the explore_gapminder.Rmd (and the rest in the slides for the previous sessions). To get you started we have created an almost empty template .Rmd called explore_gesis_panel_corona_stub.Rmd in the exercises folder.

Clues

If you’re stuck, you can find the solutions in the explore_gesis_panel_corona.Rmd in the solutions folder.

Johannes Breuer, Stefan Jünger

Introduction to R for Data Analysis
