Exercise 2_1_1: Selecting & renaming variables

As in the presentation, we will use data from the Public Use File (PUF) of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany for this exercise. You should have downloaded the data set in .csv format and saved it in a folder called data within the folder containing the materials for this workshop. Also remember that it is helpful to consult the codebook for the data set.

That being said, let’s get wrangling…

…but before we can do that, we need to load the tidyverse package(s) and import the data.

library(tidyverse)

gp_covid <- read_csv2("./data/ZA5667_v1-1-0.csv")

1

Before we apply any changes to our data, let’s first pipe them into a function to catch a glimpse (hint hint).

Clues

The clue for this task is already “hidden” in the text of the task ;-)
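As a minimal sketch of what piping a data frame into an inspection function looks like, the example below uses a small made-up data frame rather than the survey data. The object toy and its column names are hypothetical and only serve as an illustration.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(
  id = 1:3,
  answer = c("yes", "no", "yes"),
  score = c(2.5, 3.0, 4.5)
)

# Pipe the data frame into glimpse() to print one line per column,
# showing each column's type and its first few values
toy %>%
  glimpse()
```

The same pattern should work for any data frame or tibble, including one created with read_csv2().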

2

Using base R, create a new object called gp_covid_trust that contains all variables that assess how much people trust specific people or institutions in dealing with the Corona virus. To find the required variable names, you can check the codebook (search for “trust”) or have a look at the clue for this task.

Clues

The first variable we want to select for our subset is named hzcy044a, and the last one is hzcy052a. They appear consecutively in the data set. Remember that there are two options for selecting columns in base R: One is subsetting using [ ], the other is the subset() function.
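The two base R options mentioned in the clue can be sketched on a small made-up data frame. The object toy and the column names q1 to q3 are hypothetical and are not taken from the survey data.

```r
# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(
  id = 1:3,
  q1 = c(1, 2, 3),
  q2 = c(4, 5, 6),
  q3 = c(7, 8, 9),
  other = c("a", "b", "c")
)

# Option 1: subsetting with [ ] and a character vector of column names
toy_q_brackets <- toy[, c("q1", "q2", "q3")]

# Option 2: subset() accepts a range of consecutive columns in `select`
toy_q_subset <- subset(toy, select = q1:q3)

# Both objects contain the same three columns
names(toy_q_brackets)
names(toy_q_subset)
```

The range notation in subset() is convenient when the wanted columns sit next to each other, while [ ] with a vector of names also works for columns that are scattered across the data set.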

3

Use a function from the dplyr package to create a new object named gp_covid_info that only contains the (binary) variables that asked about the use of different sources of information about the Corona virus. Again, you can consult the codebook to find the right variable names (search for “media consumption”) or have a look at the clue for this task, instead.

Clues

The first variable we want to select for our subset is named hzcy084a, and the last one is hzcy095a. They appear consecutively in the data set.
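Selecting a block of consecutive columns with dplyr could look like the sketch below. It uses a made-up data frame, so toy and the src_ column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(
  id = 1:3,
  src_tv = c(1, 0, 1),
  src_radio = c(0, 0, 1),
  src_web = c(1, 1, 0),
  age = c(30, 41, 25)
)

# select() keeps every column from src_tv through src_web,
# in the order in which they appear in the data
toy_src <- toy %>%
  select(src_tv:src_web)

names(toy_src)
```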

4

Again, using a function from the tidyverse package dplyr, select only the character variables from the gp_covid data set and assign them to an object named gp_covid_char.

Clues

You need to use the selection helper where() for this task.
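How where() combines with a type-checking function can be sketched on a made-up data frame. The object toy and its columns are hypothetical, and the sketch assumes a dplyr version that provides where() (1.0.0 or later).

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  city = c("x", "y", "z"),
  score = c(1.5, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# where() applies is.character() to each column and
# select() keeps only the columns for which it returns TRUE
toy_char <- toy %>%
  select(where(is.character))

names(toy_char)
```

Other predicate functions, such as is.numeric(), can be passed to where() in the same way.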

5

After creating subsets of variables, let’s now rename those variables.

First, rename the variables hzcy084a to info_nat_pub_br, hzcy085a to info_nat_pr_br, and hzcy086a to info_nat_np using base R.

Then rename the variables hzcy087a, hzcy088a, hzcy089a, hzcy090a, hzcy091a, hzcy092a, hzcy093a, and hzcy095a to info_loc_pub_br, info_loc_pr_br, info_loc_np, info_fb, info_other_sm, info_personal, info_other, and info_none, respectively, using a function from dplyr.

When using the dplyr function for renaming the variables, assign the result to the same object name as before (i.e., overwrite the gp_covid_info object).

Clues

The base R function we need here is colnames(), and the dplyr function is rename(). Remember that the correct syntax for the rename() function is new_name = old_name.
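Both renaming approaches can be sketched on a made-up data frame. The object toy and the names v1, v2, v3, first, second, and third are hypothetical placeholders.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9)

# Base R: find the position of the old name and assign the new one
colnames(toy)[colnames(toy) == "v1"] <- "first"

# dplyr: rename() uses new_name = old_name and returns a new data frame,
# so the result is assigned back to the same object
toy <- toy %>%
  rename(
    second = v2,
    third = v3
  )

names(toy)
```

Matching on the old name in the base R version avoids relying on column positions, which can change if the data set is rearranged.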

6

As the final task in this set of exercises, repeat the previous selection and renaming for the gp_covid_info object using dplyr functions, but this time in one step.

Clues

You can also rename variables within the select() command.
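Renaming inside select() follows the same new_name = old_name pattern as rename(). The sketch below uses a made-up data frame, so toy and all column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration
toy <- data.frame(
  id = 1:3,
  v1 = c(1, 0, 1),
  v2 = c(0, 0, 1),
  extra = c("a", "b", "c")
)

# select() keeps only the listed columns and renames them in the same call
toy_small <- toy %>%
  select(
    first = v1,
    second = v2
  )

names(toy_small)
```

Unlike rename(), which keeps all columns, select() drops every column that is not listed.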

Johannes Breuer, Stefan Jünger

Introduction to R for Data Analysis
