In this final set of exercises for the data wrangling basics, we will transform and recode variables, and work with missing data. In the following exercises, we will focus on data wrangling functions from the tidyverse.
Same procedure as before: Load the tidyverse package(s), import the data, and have the codebook ready.
library(tidyverse)
gp_covid <- read_csv2("./data/ZA5667_v1-1-0.csv")
Use the dplyr function for creating and transforming variables to create a new variable representing political orientation named pol_orientation_new that ranges from 1 to 11 instead of from 0 to 10, as is the case for the original variable political_orientation.
gp_covid <- gp_covid %>%
  mutate(pol_orientation_new = political_orientation + 1)
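To see what the shift does without loading the full data set, here is a minimal sketch on a hypothetical toy data frame (toy is made up for illustration and is not part of the GESIS Panel data). It applies the same mutate() call and checks the new range.

```r
library(dplyr)

# Hypothetical stand-in for gp_covid with the scale endpoints and one NA
toy <- data.frame(political_orientation = c(0, 5, 10, NA))

toy <- toy %>%
  mutate(pol_orientation_new = political_orientation + 1)

# The new variable runs from 1 to 11; the NA stays NA
range(toy$pol_orientation_new, na.rm = TRUE)
#> [1]  1 11
```

Note that adding 1 also shifts any numeric missing-value codes (e.g., -99 becomes -98), so this kind of arithmetic is safest once such codes have been set to NA.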
Using a function from the dplyr package, recode the values of the variable measuring trust in the federal government with regard to dealing with the Corona virus into a new variable named distrust_gov that captures distrust instead of trust.
The original variable is named hzcy048a. Disregarding missing values for the moment, its values range from 1 to 5. Remember that the correct syntax for recoding values with the corresponding dplyr function is old value (enclosed in backticks) = new value.
gp_covid <- gp_covid %>%
  mutate(distrust_gov = recode(hzcy048a,
                               `5` = 1, # old_value = new_value
                               `4` = 2,
                               `2` = 4,
                               `1` = 5))
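The solution lists no mapping for the value 3. This works because recode() leaves unmatched values unchanged when the replacements have the same type as the original values, so the scale midpoint stays 3 and numeric missing-value codes pass through untouched. A minimal sketch on a hypothetical toy data frame (the values are made up for illustration):

```r
library(dplyr)

# Hypothetical stand-in: all five scale points plus one missing-value code
toy <- data.frame(hzcy048a = c(1, 2, 3, 4, 5, -99))

toy <- toy %>%
  mutate(distrust_gov = recode(hzcy048a,
                               `5` = 1,
                               `4` = 2,
                               `2` = 4,
                               `1` = 5))

# 3 and -99 are not listed in the mapping, so they are kept as they are
toy$distrust_gov
#> [1]   5   4   3   2   1 -99
```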
Using another dplyr function, recode the following values as NA for the new distrust_gov variable: -99, -77, -33, and 98.
Combine mutate() with the dplyr function for recoding specific values as NA.
gp_covid <- gp_covid %>%
  mutate(distrust_gov = na_if(distrust_gov, -99)) %>%
  mutate(distrust_gov = na_if(distrust_gov, -77)) %>%
  mutate(distrust_gov = na_if(distrust_gov, -33)) %>%
  mutate(distrust_gov = na_if(distrust_gov, 98))
After recoding values as NA for one variable, let’s now do the same for the whole gp_covid data frame. This time, however, we do not want to recode 98 as NA, as it is a valid value for the id variable.
This time, you do not need the mutate() function.
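# Note: passing a whole data frame to na_if() may fail in newer dplyr
# versions (1.1.0 and later); it works as shown in earlier versions.
gp_covid <- gp_covid %>%
  na_if(-99) %>%
  na_if(-77) %>%
  na_if(-33)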
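If the call above raises a type error in your dplyr version, a column-wise approach is one possible workaround. The following is a minimal sketch, not the solution intended by the exercise: it uses a hypothetical toy data frame in place of gp_covid and swaps in base R's replace() and %in% inside mutate(across()) instead of na_if().

```r
library(dplyr)

# Hypothetical stand-in for gp_covid; note that 98 is a valid id here
toy <- data.frame(
  id = c(98, 99, 100),
  political_orientation = c(-99, 5, -77),
  marstat = c(1, -33, 2)
)

# Codes to treat as missing in every numeric column (98 is deliberately absent)
missing_codes <- c(-99, -77, -33)

toy_clean <- toy %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, .x %in% missing_codes, NA)))

toy_clean
```

Restricting the operation to numeric columns with where(is.numeric) avoids comparing numeric codes against character columns.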
Since na_if() only takes single values as its second argument (i.e., the value to replace with NA), let’s use a function from the sjlabelled package to achieve the same thing with fewer lines of code.
The function we need takes the values to recode as NA as its second (required) argument.
library(sjlabelled)
gp_covid <- gp_covid %>%
  set_na(na = c(-99, -77, -33))
How many cases have non-missing values for the variable political_orientation? To answer this question, please use a function from the tidyr package that allows you to exclude cases with missing values. Do not assign the result to a new object.
Use the base R function nrow() at the end of your pipe.
gp_covid %>%
  drop_na(political_orientation) %>%
  nrow()
## [1] 3678
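As a quick sanity check, the row count after drop_na() should equal the number of non-missing values in the variable. A minimal sketch on a hypothetical toy data frame (made-up values, not the survey data):

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in with two missing values in political_orientation
toy <- data.frame(
  id = 1:5,
  political_orientation = c(3, NA, 7, NA, 10)
)

n_complete <- toy %>%
  drop_na(political_orientation) %>%
  nrow()

n_complete
#> [1] 3

# Same number obtained by counting non-missing values directly
n_complete == sum(!is.na(toy$political_orientation))
#> [1] TRUE
```

Keep in mind that drop_na() only removes rows that contain NA, so numeric codes such as -99 count as valid unless they have been recoded beforehand.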
Recode the marstat variable into an unordered factor called marstat_fac that has 4 levels named after the different value labels listed in the codebook.
The dplyr function we need to use here (in combination with mutate()) is recode_factor().
gp_covid <- gp_covid %>%
  mutate(marstat_fac = recode_factor(marstat,
                                     `1` = "Married",
                                     `2` = "Single",
                                     `3` = "Divorced",
                                     `4` = "Widowed"))
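To check the result, we can inspect the factor levels. recode_factor() orders the levels by the order of the replacements and creates an unordered factor by default. A minimal sketch on a hypothetical toy data frame (the marstat values are made up for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the marstat column
toy <- data.frame(marstat = c(2, 1, 4, 3, 1))

toy <- toy %>%
  mutate(marstat_fac = recode_factor(marstat,
                                     `1` = "Married",
                                     `2` = "Single",
                                     `3` = "Divorced",
                                     `4` = "Widowed"))

# Levels follow the order in which the replacements were specified
levels(toy$marstat_fac)
#> [1] "Married"  "Single"   "Divorced" "Widowed"

# The factor is unordered
is.ordered(toy$marstat_fac)
#> [1] FALSE
```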