In this final set of exercises on data wrangling basics, we will transform and recode variables and work with missing data. The exercises focus on data wrangling functions from the tidyverse.
Same procedure as before: load the tidyverse package(s), import the data, and have the codebook ready.
library(tidyverse)
gp_covid <- read_csv2("./data/ZA5667_v1-1-0.csv")
Use the dplyr function for creating and transforming variables to create a new variable representing political orientation named pol_orientation_new that ranges from 1 to 11 instead of from 0 to 10, as is the case for the original variable political_orientation.
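A minimal sketch of this shift on toy data, assuming the function meant here is mutate() (the exercise does not name it). The toy tibble and its values are made up and merely stand in for gp_covid.

```r
library(dplyr)

# Hypothetical stand-in for gp_covid; the values are invented for illustration
toy <- tibble(political_orientation = c(0, 5, 10, NA))

# Adding 1 moves the 0-10 scale to 1-11; NA stays NA
toy_shifted <- toy %>%
  mutate(pol_orientation_new = political_orientation + 1)

toy_shifted
```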
Using a function from the dplyr package, recode the values of the variable measuring trust in the federal government with regard to dealing with the coronavirus into a new variable named distrust_gov that captures distrust instead of trust.
The variable we need is hzcy048a. Disregarding missing values for the moment, its values range from 1 to 5. Remember that the correct syntax for recoding values with the corresponding dplyr function is old value (enclosed in backticks) = new value.
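The backtick syntax can be sketched as follows, assuming the function meant here is recode() used inside mutate(). The toy values are invented, and the new values are written as plain numbers so that they match the numeric type of the original column.

```r
library(dplyr)

# Hypothetical stand-in for the trust item; -99 mimics a missing-value code
toy <- tibble(hzcy048a = c(1, 2, 3, 4, 5, -99))

# Reverse the 1-5 scale: old value in backticks on the left, new value on the right
toy_recoded <- toy %>%
  mutate(distrust_gov = recode(hzcy048a,
                               `1` = 5,
                               `2` = 4,
                               `3` = 3,
                               `4` = 2,
                               `5` = 1))

toy_recoded
```

Values that are not listed (here -99) are left unchanged, so the missing-value codes still need separate treatment.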
Using another dplyr function, recode the following values as NA for the new distrust_gov variable: -99, -77, -33, and 98.
Combine mutate() with the dplyr function for recoding specific values as NA.
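One way this could look, assuming the function in question is na_if(), which replaces a single value per call. The toy column is invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the recoded distrust variable
toy <- tibble(distrust_gov = c(1, 5, -99, -77, -33, 98))

# na_if() handles one value at a time, so we chain one call per code
toy_na <- toy %>%
  mutate(distrust_gov = na_if(distrust_gov, -99),
         distrust_gov = na_if(distrust_gov, -77),
         distrust_gov = na_if(distrust_gov, -33),
         distrust_gov = na_if(distrust_gov, 98))

toy_na
```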
After recoding values as NA for one variable, let’s now do the same for the whole gp_covid data frame. This time, however, we do not want to recode 98 as NA, as it is a valid value for the id variable.
This time, we do not need the mutate() function.
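Piping a whole data frame into na_if() relies on the behavior of older dplyr releases. As far as we know, from dplyr 1.1.0 onwards this raises an error, and the documented replacement is to call na_if() inside mutate(across()). The sketch below takes that route on invented toy data, so it is an alternative under that assumption rather than the solution intended here.

```r
library(dplyr)

# Hypothetical stand-in for gp_covid; 98 is a valid id and must survive
toy <- tibble(id    = c(97, 98, 99),
              trust = c(3, -99, -77),
              age   = c(-33, 45, 98))

# Apply the three replacements to every numeric column
toy_clean <- toy %>%
  mutate(across(where(is.numeric),
                ~ .x %>% na_if(-99) %>% na_if(-77) %>% na_if(-33)))

toy_clean
```

Restricting across() to numeric columns avoids comparing character columns with a number.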
As na_if() only takes single values as its second argument (i.e., the value to replace with NA), let’s use a function from the sjlabelled package to achieve the same thing with fewer lines of code.
The function we need takes the values to be recoded as NA as its second (required) argument.
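A hedged sketch of the multi-value approach, assuming the function meant here is set_na() from sjlabelled (the text does not name it) and that the package is installed. The toy data are invented.

```r
library(sjlabelled)

# Hypothetical stand-in for gp_covid; 98 is a valid id and is not listed below
toy <- data.frame(id    = c(97, 98, 99),
                  trust = c(3, -99, -77),
                  age   = c(-33, 45, 98))

# One call covers all missing-value codes across all columns
toy_na <- set_na(toy, na = c(-99, -77, -33))

toy_na
```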
How many cases have a valid (non-missing) value for political_orientation? To answer this question, please use a function from the tidyr package that allows you to exclude cases with missing values. Do not assign the result to a new object.
You can use the base R function nrow() at the end of your pipe.
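The pipe could be sketched like this, assuming the tidyr function in question is drop_na(). The toy data are invented, and the result is printed rather than assigned.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for gp_covid with two missing values
toy <- tibble(id = 1:5,
              political_orientation = c(3, NA, 7, NA, 10))

# Drop rows where political_orientation is NA, then count what is left
toy %>%
  drop_na(political_orientation) %>%
  nrow()
```

Naming the column inside drop_na() restricts the check to that variable; without it, rows with a missing value in any column would be dropped.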
Recode the marstat variable into an unordered factor called marstat_fac that has 4 levels named after the different value labels listed in the codebook.
The dplyr function we need to use here (in combination with mutate()) is recode_factor().
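A minimal sketch of recode_factor() on toy data. The level names below are placeholders, not the actual value labels; those have to be taken from the codebook.

```r
library(dplyr)

# Hypothetical stand-in for the marital status variable
toy <- tibble(marstat = c(1, 2, 3, 4, 2))

# Placeholder labels: replace them with the value labels from the codebook
toy_fac <- toy %>%
  mutate(marstat_fac = recode_factor(marstat,
                                     `1` = "label_1",
                                     `2` = "label_2",
                                     `3` = "label_3",
                                     `4` = "label_4"))

levels(toy_fac$marstat_fac)
```

The levels follow the order of the replacements, and the factor is unordered because .ordered defaults to FALSE.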