In this set of exercises, we will work with files from statistical software. The first tasks are about importing data, while the later ones are about labelling and exporting.
Import the .sav version of the data from the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany. Use the haven package for this. The file should be stored in the data folder.
library(haven)
gp_covid <-
read_spss("./data/ZA5667_v1-1-0.sav")
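If you want to see what read_spss() keeps from an SPSS file without the GESIS data at hand, here is a minimal sketch. It round-trips a small, made-up data frame through a temporary .sav file. The names toy, toy_in, and the value labels male/female are hypothetical; the only assumption is that haven is installed.

```r
library(haven)

# Hypothetical toy data standing in for the survey file
toy <- data.frame(id = 1:3)
toy$sex <- labelled(
  c(1, 2, 1),
  labels = c(male = 1, female = 2),  # hypothetical value labels
  label  = "Geschlecht"              # variable label
)

# Write to a temporary .sav file and read it back in
path <- tempfile(fileext = ".sav")
write_sav(toy, path)
toy_in <- read_spss(path)

# Both the variable label and the value labels survive the round trip
attr(toy_in$sex, "label")
attr(toy_in$sex, "labels")
```

The imported column is a haven_labelled vector: the numeric codes are the data, and both kinds of labels are stored as attributes.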
Unlike flat files such as CSV, SPSS files store labels for the variables. Print these variable labels using the sjlabelled package. Remember that you can use [ ] to subset columns/variables (we only want to print the labels for the first ten variables).
library(sjlabelled)
get_label(gp_covid[1:10])
## za_number version doi
## "Studiennummer des Archivs" "Versionskennung und -datum des Archivs" "Digital Object Identifier (doi)"
## id cohort sex
## "Befragten-ID" "Rekrutierungskohorte" "Geschlecht"
## age_cat education_cat intention_to_vote
## "Alter, kategorisiert" "Bildung, kategorisiert" "Sonntagsfrage (gbzc011a)"
## choice_of_party
## "Sonntagsfrage Wahlentscheidung"
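The variable labels are simply an attribute called "label" on each column, so the result of get_label() can also be reproduced in base R. A minimal sketch on a hypothetical three-column toy data frame (labels borrowed from the output above) that also shows how single brackets with one index select columns:

```r
library(sjlabelled)

# Hypothetical toy data; the labels are copied from the output above
toy <- data.frame(id = 1:3, sex = c(1, 2, 1), age_cat = c(2, 5, 3))
attr(toy$id, "label")      <- "Befragten-ID"
attr(toy$sex, "label")     <- "Geschlecht"
attr(toy$age_cat, "label") <- "Alter, kategorisiert"

# toy[1:2] keeps only the first two columns (and their attributes)
labs_sj <- get_label(toy[1:2])

# Base-R equivalent: read the "label" attribute of each selected column
labs_base <- vapply(
  toy[1:2],
  function(x) attr(x, "label", exact = TRUE),
  character(1)
)

labs_sj
labs_base
```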
Unfortunately, it’s all in German. Imagine you are an education researcher working on a publication in English, and you are interested in the variable education_cat. You may therefore want to translate its label into English.
Change the label of the variable education_cat from “Bildung, kategorisiert” to “Education, categorized”. Again, use the sjlabelled package for this.
gp_covid$education_cat <-
set_label(
gp_covid$education_cat,
label = "Education, categorized"
)
get_label(gp_covid$education_cat)
## [1] "Education, categorized"
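Note that set_label() only changes the variable label; any value labels attached to the categories stay as they were. A minimal sketch with a hypothetical stand-in for education_cat (the codes and the German value labels niedrig/mittel/hoch are invented for illustration, not taken from the GESIS data):

```r
library(haven)
library(sjlabelled)

# Hypothetical stand-in for gp_covid$education_cat; value labels are invented
edu <- labelled(
  c(1, 2, 3, 2),
  labels = c(niedrig = 1, mittel = 2, hoch = 3),
  label  = "Bildung, kategorisiert"
)

edu <- set_label(edu, label = "Education, categorized")

attr(edu, "label")          # the variable label is now in English
names(attr(edu, "labels"))  # the value labels are unchanged
```

Translating the value labels would therefore be a separate step.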
Your collaborators ask you to share the data after you have changed the labels. Unfortunately, they use neither R nor SPSS and, hence, ask you to export your data as a Stata file.
The haven package provides a function for writing such files whose name and usage mirror those of the corresponding function for importing data in this format.
write_dta(gp_covid, "gesis_panel_corona_fancy_panels_final_final.dta")
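Before sending the file off, it can be worth checking that the translated label actually made it into the .dta file. A minimal sketch, assuming a hypothetical toy data frame and a temporary path rather than the real survey data: write it with write_dta(), read it back with read_dta(), and inspect the "label" attribute. Integer codes are used because Stata only supports value labels on integer values.

```r
library(haven)

# Hypothetical toy data; the value labels low/medium/high are invented
toy <- data.frame(id = 1:3)
toy$education_cat <- labelled(
  c(1L, 2L, 3L),
  labels = c(low = 1L, medium = 2L, high = 3L),
  label  = "Education, categorized"
)

# Export to a temporary Stata file and import it again
path <- tempfile(fileext = ".dta")
write_dta(toy, path)
back <- read_dta(path)

attr(back$education_cat, "label")   # the English variable label is preserved
attr(back$education_cat, "labels")  # so are the value labels
```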