# Introduction to R for Data Analysis
## Data Wrangling Basics
### Johannes Breuer & Stefan Jünger
### 2021-08-03

---

## Data wrangling 🤠

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\data_cowboy.png" width="95%" style="display: block; margin: auto;" />
Artwork by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)

---

## What is data wrangling?

Data wrangling is the process of "getting the data into shape", so that you can then explore and analyze them.

Common data wrangling steps when working with tabular data in the social & behavioral sciences (e.g., from surveys) include:
- **renaming** variables
- **selecting** a subset of variables
- **filtering** a subset of cases
- **recoding** variables/values (incl. missing values)
- **creating/computing** new variables

The (in)famous **80/20-rule**: 80% wrangling, 20% analysis (of course, this ratio relates to the time required for writing the code, not the computing time).

---

## The `tidyverse`

> The `tidyverse` is an .highlight[opinionated collection of R packages designed for data science]. All packages share an .highlight[underlying design philosophy, grammar, and data structures] ([Tidyverse website](https://www.tidyverse.org/)).

> The `tidyverse` is a .highlight[coherent system of packages for data manipulation, exploration and visualization] that share a .highlight[common design philosophy] ([Rickert, 2017](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/)).

---

## Benefits of the `tidyverse`

Data wrangling can also be done with `base R`. However, the `base R` syntax is typically more verbose and less intuitive and, hence, harder to learn, remember, and read. In addition, many `tidyverse` operations are faster than their `base R` equivalents.

---

## Benefits of the `tidyverse`

`Tidyverse` syntax is designed to increase **human-readability**. This makes it especially **attractive for `R` novices** as it can facilitate the experience of **self-efficacy** (see [Robinson, 2017](http://varianceexplained.org/r/teach-tidyverse/)). The `tidyverse` also aims for **consistency** (e.g., data frame as first argument and output) and uses **smarter defaults** (e.g., no partial matching of data frame and column names).
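
To make the point about smarter defaults more concrete, here is a minimal sketch that contrasts partial matching of column names in a `base R` data frame with the stricter behavior of a tibble. The toy data and the column name `income` are made up for illustration.

```r
library(tibble)

# Toy data (hypothetical column name, for illustration only)
df <- data.frame(income = c(1000, 2000, 3000))
tb <- as_tibble(df)

# Base R data frames partially match column names with `$`:
# `inc` silently returns the `income` column
partial_df <- df$inc

# Tibbles do not partially match: the result is NULL (plus a warning)
partial_tb <- suppressWarnings(tb$inc)
```

Silent partial matching can hide typos in column names, which is why the stricter tibble behavior is often considered the safer default.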

---

## The 'dark side' of the `tidyverse`

`tidyverse` is not `R` as in `base R`
- some routines are like using a whole different language, which...
  - ... can be nice when learning `R`
  - ... can get difficult when searching for solutions to certain problems

Often, `tidyverse` functions are under heavy development
- they change and can potentially break your code
  - Example: [Converting tables into long or wide format](https://tidyr.tidyverse.org/news/index.html#pivoting)
  
- to learn more about the `tidyverse` lifecycle you can watch this [talk by Hadley Wickham](https://www.youtube.com/watch?v=izFssYRsLZs) or read the corresponding [documentation](https://lifecycle.r-lib.org/articles/stages.html#deprecated)

---

## `Base R` vs. `tidyverse`

Similar to other fierce academic debates over, e.g., `R` vs. `Python` or Frequentism vs. Bayesianism, people have argued [for](http://varianceexplained.org/r/teach-tidyverse/) and [against](https://blog.ephorie.de/why-i-dont-use-the-tidyverse) using/teaching the `tidyverse`.

Our personal experience with teaching the `tidyverse` is something like this...

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\tidyverse_meme.png" width="50%" style="display: block; margin: auto;" />
.center[
Source: https://s.unhb.de/ReoyN
]

---

## Data wrangling alternatives

As with almost all tasks in `R`, there are more than two options for data wrangling. Two alternatives (or additions) to `base R` and the `tidyverse` are:
- [`data.table`](https://rdatatable.gitlab.io/data.table/index.html)
- [`datawizard`](https://easystats.github.io/datawizard/)

---

## `data.table`

The `data.table` package also is a powerful tool for data wrangling, especially if you work with large data sets. The reason we do not discuss `data.table` in this course is that neither of us has extensive experience with it, and comparing all three options (`base R`, `tidyverse`, and `data.table`) side-by-side would be enough for a separate workshop/course.

There is, however, a very detailed [blog post by Jason Mercer](https://wetlandscapes.com/blog/a-comparison-of-r-dialects/) that compares the functionalities of `base R`, `tidyverse`, and `data.table` for data wrangling and [another one by Atrebas](https://atrebas.github.io/post/2019-03-03-datatable-dplyr/) that focuses on a comparison between `data.table` and [`dplyr`](https://dplyr.tidyverse.org/), which is a key package for data manipulation from the `tidyverse`.

---

## `datawizard` 🧙

`datawizard` is a fairly new contender in the data wrangling game that also offers quite a few handy and easy-to-use functions. It is part of the [`easystats` collection of `R` packages](https://easystats.github.io/easystats/). These packages offer many helpful functionalities for data preparation, analysis, and reporting, and can nicely extend or complement the `tidyverse`. We will discuss some of the `easystats` packages again in the sessions on exploratory and confirmatory data analysis.

---

## Structure & focus of this session

For most of the data wrangling tasks we discuss in this section, we will show how to do them with `base R` and the `tidyverse`, so that you can get a sense of the differences.

Our main focus, however, will be on the use of packages (and functions) from the `tidyverse` and how they can be used to clean and transform your data.

Of course, it is possible to combine `base R` and `tidyverse` code. However, in the long run, you should try to aim for consistency.

---

## Lift-off into the `tidyverse` 🚀

**Install all `tidyverse` packages** (for the full list of `tidyverse` packages see [https://www.tidyverse.org/packages/](https://www.tidyverse.org/packages/))

```r
install.packages("tidyverse")
```
**Load core `tidyverse` packages** (NB: To save time and reduce namespace conflicts you can also load `tidyverse` packages individually)

```r
library("tidyverse")
```
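
As a rough sketch of the alternative mentioned in the NB above, you could load only the packages you need. Which packages those are depends on your project; `dplyr` and `tidyr` are just examples here, and the toy data frame is made up. The `package::function()` notation is one way to be explicit when two packages provide functions with the same name.

```r
# Load individual packages instead of the whole tidyverse
library(dplyr)
library(tidyr)

# dplyr::filter() masks stats::filter(); the :: notation removes any ambiguity
toy <- data.frame(id = 1:3, score = c(2, 5, 8))
high_scores <- dplyr::filter(toy, score > 4)
```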

---

## `tidyverse` vocabulary 101

While there is much more to the `tidyverse` than this, three important concepts that you need to be familiar with, if you want to use it, are:

1. Tidy data

2. Tibbles

3. Pipes

We already discussed tibbles in the session on *Data Import & Export*, so we will focus on tidy data and pipes here.

---

## Tidy data

The 3 rules of tidy data:

1. Each **variable** is in a separate **column**.

2. Each **observation** is in a separate **row**.

3. Each **value** is in a separate **cell**.

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\tidy_data.png" width="2560" style="display: block; margin: auto;" />
Source: https://r4ds.had.co.nz/tidy-data.html

*Note*: In the `tidyverse` terminology 'tidy data' usually also means data in long format (where applicable).

---

## Wide vs. long format

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\wide-long.png" width="70%" style="display: block; margin: auto;" />
Source: https://github.com/gadenbuie/tidyexplain#tidy-data

.small[
*Note*: The functions `pivot_wider()` and `pivot_longer()` from the [`tidyr` package](https://tidyr.tidyverse.org/) are easy-to-use options for changing data from long to wide format and vice versa.
]
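
The following minimal sketch illustrates the two formats with a made-up data set of two respondents measured in two waves. The variable names (`wave_1`, `wave_2`, `score`) are hypothetical and only serve as an example.

```r
library(tidyr)

# Wide format: one row per person, one column per wave
wide <- data.frame(
  id = 1:2,
  wave_1 = c(3, 5),
  wave_2 = c(4, 2)
)

# Wide to long: one row per person-wave combination
long <- pivot_longer(
  wide,
  cols = c(wave_1, wave_2),
  names_to = "wave",
  values_to = "score"
)

# Long back to wide
wide_again <- pivot_wider(
  long,
  names_from = wave,
  values_from = score
)
```

In this example, `long` has four rows (two persons times two waves), and `wide_again` has the same columns as the original `wide` data.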

---

### What's a pipe?

---

## Pipes

Usually, in `R` we apply functions as follows:

```r
f(x)
```

In the logic of pipes this function is written as:

```r
x %>% f(.)
```

Here, object `x` is piped into function `f`, becoming (by default) its first argument. By using the placeholder `.`, it can also be passed to other arguments.

We can use pipes with more than one function:

```r
x %>% 
  f_1() %>% 
  f_2() %>% 
  f_3()
```

---

## Pipes

The `%>%` pipe used in the `tidyverse` is part of the [`magrittr` package](https://magrittr.tidyverse.org/), which also includes other specialized types of pipes.

*RStudio* offers a keyboard shortcut for inserting the `%>%` pipe: <kbd>Ctrl + Shift + M</kbd> (*Windows* & *Linux*)/<kbd>Cmd + Shift + M</kbd> (*Mac*)

Since [version 4.1.0](https://cran.r-project.org/bin/windows/base/NEWS.R-4.1.0.html), `base R` also offers its own pipe `|>`, which is similar to but not the same as the `%>%` pipe.
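
A minimal sketch of the native pipe, assuming an `R` version of 4.1.0 or later (the vector `x` is made up for illustration):

```r
# Requires R >= 4.1.0
x <- c(1, 4, 9)

native <- x |>
  sqrt() |>
  sum()

# One difference to %>%: the right-hand side of |> has to be a function
# call with parentheses (x |> sqrt()), whereas %>% also accepts a bare
# function name (x %>% sqrt).
```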

---

## Data set

For the examples and exercises in this session we will, again, use data from the *GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany*.

Remember that to code along and for the exercises the *GESIS Panel* files should be in a folder called `data` in the same folder as the other materials for this course.

---

## Interlude 1: Citing data

If you (re-)use existing data sets, please cite them in your publications, theses, teaching materials, etc. Data repositories normally provide information on how to cite the data. For example, the APA-style citation for *Public Use File (PUF) of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany* is:

GESIS Panel Team (2020). GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany. *GESIS Datenarchiv, Köln. ZA5667 Datenfile Version 1.1.0*, https://doi.org/10.4232/1.13520.

---

## Interlude 2: Citing FOSS

You should also make sure to cite the free and open-source software that you use, such as `R` packages and `R` itself. There is a function in `R` that tells you how to cite it or any of the packages you have installed.

```r
citation()
```

```
## 
## To cite R in publications use:
## 
##   R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical
##   Computing, Vienna, Austria. URL https://www.R-project.org/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2021},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also
## 'citation("pkgname")' for citing R packages.
```
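
As the last line of the output indicates, the same function can be used for packages. A short sketch: the `stats` package is used here only because it ships with every `R` installation; in practice you would insert the names of the packages you actually used.

```r
# Citation information for a specific package
stats_citation <- citation("stats")

# Export the citation for R itself as a BibTeX entry,
# e.g., for use in a reference manager
r_bibtex <- toBibtex(citation())
print(r_bibtex)
```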

---

## Interlude 3: Codebook

It is always advisable to consult the codebook (if there is one) before starting to work with a data set. The *GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany* comes with a very [detailed codebook](https://dbk.gesis.org/dbksearch/download.asp?id=67378).

Side note: If you want to (semi-)automatically generate a codebook for your own dataset, there are several options in `R`:

- The [`codebook` package](https://rubenarslan.github.io/codebook/) which includes an *RStudio*-Addin and also offers a [web app](https://rubenarslan.ocpu.io/codebook/www/)

- the `makeCodebook()` function from the [`dataMaid` package](https://github.com/ekstroem/dataMaid) (see this [blog post](http://sandsynligvis.dk/articles/18/codebook.html) for a short tutorial)

- the `codebook()` function from the [`memisc` package](https://github.com/melff/memisc)

---

## Load the data

The first step, of course, is loading the data into `R`. The *Public Use File (PUF) of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany* is available in different formats. We will work with the `.csv` file.

```r
gp_covid <- read_csv2("./data/ZA5667_v1-1-0.csv")
```

*Note*: `read_csv2()` is used to load files that use "; for the field separator and , for the decimal point" (from the function help file), which is the format that the `.csv` version of this data set is in.
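
To illustrate what this file format looks like, the following sketch writes a tiny made-up file with semicolons as field separators and commas as decimal marks to a temporary location and reads it back in. The file content and column names are invented for this example.

```r
library(readr)

# Create a small example file in the "European" .csv format
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;score", "1;2,5", "2;3,75"), tmp)

# read_csv2() interprets ; as field separator and , as decimal mark
# (suppressMessages() only hides the informational messages)
toy <- suppressMessages(read_csv2(tmp))

toy$score  # parsed as numeric values 2.5 and 3.75
```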

---

## Note: Tidy vs. untidy data

As a lot of work (by many people) has already gone into this data set, the *Public Use File (PUF) of the GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany* is already tidy. If you collect data yourself, this may not be the case (at least for the raw data). For example, cells may hold more than one value or a variable that should be in one column is spread across multiple columns (e.g., parts of a date or name).

If you need to make your data tidy or change it from wide to long format or vice versa (which may, e.g., be necessary if you work with longitudinal survey data from multiple waves), the [`tidyr` package](https://tidyr.tidyverse.org/) from the `tidyverse` is a good option.
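
The two kinds of untidiness mentioned above can be sketched with a made-up example: one column that holds two values (first and last name) and one variable (a date) that is spread across three columns. All names and values are invented, and `separate()` and `unite()` are only one possible way of addressing this with `tidyr`.

```r
library(tidyr)

untidy <- data.frame(
  id = 1:2,
  full_name = c("Anna Schmidt", "Ben Meyer"),
  year = c(2020, 2021),
  month = c(3, 11),
  day = c(17, 5)
)

# Split the cell content into two columns
step_1 <- separate(
  untidy,
  col = full_name,
  into = c("first_name", "last_name"),
  sep = " "
)

# Combine the date parts into a single column
tidier <- unite(step_1, col = "date", year, month, day, sep = "-")
```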

---

## `dplyr`

The `tidyverse` examples in the following will make use of functions from the [`dplyr` package](https://dplyr.tidyverse.org/):
- `dplyr` functions are verbs that signal an action  
- first argument = a data frame  
- the output normally also is a data frame (tibble) 
- columns (= variables in a tidy data frame) can be referenced without quotation marks (non-standard evaluation)
- actions (verbs) can be applied to columns (variables) and rows (cases/observations)
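
As a first impression of these properties, here is a minimal sketch with a made-up tibble. The column name `hzcy001a` and the missing value code `-99` mimic the *GESIS Panel* data, but the values and the new name `risk_self` are hypothetical.

```r
library(dplyr)

toy <- tibble(
  id = 1:4,
  hzcy001a = c(5, -99, 3, 4),
  age_cat = c(7, 4, 1, 10)
)

toy_clean <- toy %>%
  rename(risk_self = hzcy001a) %>%             # rename a column
  filter(age_cat >= 4) %>%                     # keep a subset of rows
  mutate(risk_self = na_if(risk_self, -99)) %>% # recode -99 as missing
  select(id, risk_self)                        # keep a subset of columns
```

Each verb takes a data frame as its first argument (here supplied via the pipe), refers to columns without quotation marks, and returns a tibble.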

---

## First look 👀

The `dplyr` package provides a function for getting a good first look at your data, which is especially helpful when working with data sets that contain many columns/variables. The function `glimpse()` prints a data frame/tibble in a way that represents columns as rows and rows as columns and also provides some additional information about the data frame and its columns.

```r
gp_covid %>% 
  glimpse()
```
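
For comparison, `base R` offers `str()` for a similar compact overview. A small sketch with a made-up tibble (the values are invented; only the column names resemble the actual data):

```r
library(dplyr)

toy <- tibble(id = 1:3, age_cat = c(7, 4, 1), sex = c(1, 2, 2))

# tidyverse: one line per column, with type and first values
glimpse(toy)

# base R: str() gives a comparable overview of the structure
str(toy)
```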

---

```
## Rows: 3,765
## Columns: 137
## $ za_number <chr> "ZA5667", "ZA5667", "ZA5667", "ZA5667", "ZA5667", "ZA5667", "ZA5667", "ZA5667", "ZA5667", "ZA5667", ~
## $ version <chr> "v1-1-0 2020-04-27", "v1-1-0 2020-04-27", "v1-1-0 2020-04-27", "v1-1-0 2020-04-27", "v1-1-0 2020-04-~
## $ doi <chr> "10.4232/1.13520", "10.4232/1.13520", "10.4232/1.13520", "10.4232/1.13520", "10.4232/1.13520", "10.4~
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 2~
## $ cohort <dbl> 3, 1, 3, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 1, 1, 1, 3, 1, 1, 1, 3, 2, 1, 2, 1~
## $ sex <dbl> 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2~
## $ age_cat <dbl> 7, 7, 8, 4, 1, 10, 4, 7, 8, 1, 6, 8, 2, 6, 2, 2, 2, 7, 4, 8, 1, 7, 4, 3, 5, 7, 7, 6, 6, 5, 7, 7, 5, ~
## $ education_cat <dbl> 3, 2, 2, 3, 3, 2, 2, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2~
## $ intention_to_vote <dbl> 2, 2, 2, 2, -33, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -77, 2, 2, 2, 2, 2, -99~
## $ choice_of_party <dbl> 1, 5, 1, 1, -33, 6, 6, 5, 1, 2, 1, 6, 98, 1, 7, 1, 5, 1, 98, 1, 1, 1, 7, 1, 7, -77, 1, 5, 6, 3, 1, 1~
## $ political_orientation <dbl> 6, 5, 5, 7, 4, 10, 5, 6, 6, 7, 6, 7, 5, 6, 6, 3, 5, 5, 6, 6, 4, 6, 5, 5, 7, -77, 6, 4, 6, 8, 4, 6, 3~
## $ marstat <dbl> 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 3, 1, 1, 1, 2, 3, 2, 1, 1~
## $ household <dbl> 1, 2, 2, 3, 3, 2, 3, 3, 2, 3, 2, 2, 3, 3, 1, 3, 3, 2, 3, 2, 2, 3, 3, 2, 3, 2, 2, 3, 2, 1, 2, 2, 3, 2~
## $ hzcy001a <dbl> -33, 5, 5, 4, -33, 3, 4, 4, -33, 7, 4, 5, -33, 6, 4, -33, -33, 6, -33, 3, -33, 6, 5, 5, 6, 5, 7, 6, ~
## $ hzcy002a <dbl> -33, 5, 6, 4, -33, 3, 3, 4, -33, 5, 6, 6, -33, 6, 5, -33, -33, 6, -33, 5, -33, 6, 6, 97, 6, 6, 7, 6,~
## $ hzcy003a <dbl> -33, 2, 3, 2, -33, -99, 3, 3, -33, 3, 3, 7, -33, 3, 2, -33, -33, 4, -33, 3, -33, 1, 3, 4, 3, 3, 7, 3~
## $ hzcy004a <dbl> -33, 5, 6, 4, -33, 3, 3, 3, -33, 4, 5, 6, -33, 7, 3, -33, -33, 5, -33, 4, -33, 3, 4, 5, 6, 3, 7, 3, ~
## $ hzcy005a <dbl> -33, 5, 6, 3, -33, 3, 4, 4, -33, 2, 4, 6, -33, 4, 2, -33, -33, 6, -33, 3, -33, 6, 4, 3, 5, 4, 7, 5, ~
## $ hzcy006a <dbl> -33, 1, 1, 0, -33, 1, 1, 1, -33, 1, 1, 1, -33, 1, 1, -33, -33, 1, -33, 1, -33, 1, 0, 1, 1, 1, 0, 1, ~
## $ hzcy007a <dbl> -33, 0, 1, 0, -33, 0, 1, 1, -33, 1, 1, 1, -33, 1, 1, -33, -33, 1, -33, 1, -33, 1, 0, 1, 0, 1, 0, 1, ~
## $ hzcy008a <dbl> -33, 0, 0, 1, -33, 0, 0, 0, -33, 1, 1, 0, -33, 1, 1, -33, -33, 0, -33, 1, -33, 0, 1, 1, 1, 1, 0, 0, ~
## $ hzcy009a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy010a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 1, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy011a <dbl> -33, 1, 1, 1, -33, 0, 1, 1, -33, 1, 1, 1, -33, 1, 1, -33, -33, 1, -33, 0, -33, 1, 0, 1, 1, 1, 1, 1, ~
## $ hzcy012a <dbl> -33, 1, 0, 1, -33, 1, 0, 1, -33, 1, 1, 0, -33, 0, 0, -33, -33, 1, -33, 0, -33, 1, 0, 1, 1, 1, 1, 0, ~
## $ hzcy013a <dbl> -33, 1, 0, 0, -33, 0, 0, 0, -33, 1, 1, 1, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 1, 0, 0, 1, 0, 1, ~
## $ hzcy014a <dbl> -33, 0, 1, 1, -33, 0, 1, 1, -33, 0, 1, 1, -33, 1, 1, -33, -33, 1, -33, 1, -33, 1, 1, 1, 0, 1, 1, 1, ~
## $ hzcy015a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy016a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy018a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy019a <dbl> -33, 5, 5, 3, -33, 4, 5, 5, -33, 4, 3, 5, -33, 4, 4, -33, -33, 5, -33, 5, -33, 5, 4, 5, 4, 4, 5, 4, ~
## $ hzcy020a <dbl> -33, 5, 5, 4, -33, 4, 5, 5, -33, 5, 4, 5, -33, 4, 5, -33, -33, 5, -33, 5, -33, 5, 4, 4, -99, 5, 5, 4~
## $ hzcy021a <dbl> -33, 5, 5, 4, -33, 4, 5, 5, -33, 4, 4, 5, -33, 5, 5, -33, -33, 5, -33, 5, -33, 5, 4, -99, 3, 5, 5, 5~
## $ hzcy022a <dbl> -33, 5, 4, 3, -33, 4, 5, 5, -33, 5, 4, 4, -33, 2, 4, -33, -33, 4, -33, 4, -33, 5, 2, 3, 3, 5, 5, 4, ~
## $ hzcy023a <dbl> -33, 5, 5, 4, -33, 4, 5, 3, -33, 5, 3, 5, -33, 5, 5, -33, -33, 5, -33, 4, -33, 5, 5, 4, 5, 5, 5, 5, ~
## $ hzcy024a <dbl> -33, 5, 5, 2, -33, 2, 5, 3, -33, 5, 3, 2, -33, 5, 5, -33, -33, 5, -33, 3, -33, 3, 4, 4, 5, 5, 5, 5, ~
## $ hzcy025a <dbl> -33, 5, 4, 2, -33, 3, 3, 3, -33, 3, 3, 2, -33, 2, 4, -33, -33, 4, -33, 3, -33, 5, 4, 3, 5, 5, 5, 5, ~
## $ hzcy026a <dbl> -33, 4, 1, 1, -33, 1, 1, 1, -33, 1, 1, 1, -33, 1, 1, -33, -33, 1, -33, 1, -33, 1, 4, 1, 1, 1, 1, 4, ~
## $ hzcy027a <dbl> -33, -88, 5, 4, -33, 5, 5, 5, -33, 4, 3, 1, -33, 4, 5, -33, -33, 5, -33, 5, -33, 4, -88, 4, 5, 4, 5,~
## $ hzcy028a <dbl> -33, -88, 1, 2, -33, 2, 3, 1, -33, 2, 3, 2, -33, 2, 2, -33, -33, 2, -33, 1, -33, 4, -88, 2, 2, 2, 2,~
## $ hzcy029a <dbl> -33, -88, 4, 3, -33, 4, 5, 5, -33, 4, 4, 5, -33, 2, 3, -33, -33, 4, -33, 4, -33, 3, -88, 3, 5, 5, 5,~
## $ hzcy030a <dbl> -33, -88, 5, 3, -33, 4, 5, 5, -33, 5, 4, 5, -33, 2, 4, -33, -33, 5, -33, 3, -33, 5, -88, 4, 5, 5, 5,~
## $ hzcy031a <dbl> -33, -88, 5, 3, -33, 4, 5, 5, -33, 4, 4, 5, -33, 2, 5, -33, -33, 5, -33, 3, -33, 5, -88, 4, 5, 5, 5,~
## $ hzcy032a <dbl> -33, -88, 5, 3, -33, 4, 5, 5, -33, 4, 4, 4, -33, 4, 5, -33, -33, 5, -33, 4, -33, 5, -88, 5, 5, 5, 5,~
## $ hzcy033a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy034a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy035a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy036a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy037a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy038a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy039a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, ~
## $ hzcy040a <dbl> -33, 3, 2, 3, -33, 2, 3, 3, -33, 1, 3, 1, -33, 3, 2, -33, -33, 3, -33, 3, -33, 2, 3, 3, 3, 2, 2, 2, ~
## $ hzcy041a <dbl> -33, 3, 3, 4, -33, 3, 3, 3, -33, 3, 3, 1, -33, 4, 3, -33, -33, 3, -33, 3, -33, 2, 4, 3, 4, 3, 3, 3, ~
## $ hzcy042a <dbl> -33, 3, 3, 3, -33, 3, 3, 3, -33, 2, 2, 2, -33, 2, 2, -33, -33, 2, -33, 2, -33, 2, 1, 2, 2, 3, 2, 2, ~
## $ hzcy043a <dbl> -33, 3, 3, 3, -33, 2, 3, 3, -33, 2, 2, 1, -33, 3, 2, -33, -33, 2, -33, 3, -33, 1, 2, 3, 2, 3, 3, 2, ~
## $ hzcy044a <dbl> -33, 5, 4, 4, -33, 4, 4, 4, -33, 5, 3, 5, -33, 4, 3, -33, -33, 5, -33, 4, -33, 5, 5, 98, 4, 5, 4, 5,~
## $ hzcy045a <dbl> -33, 4, 4, 4, -33, 5, 4, 4, -33, 4, 4, 3, -33, 3, 3, -33, -33, 5, -33, 2, -33, 4, 5, 98, 4, 98, 98, ~
## $ hzcy046a <dbl> -33, 4, 5, 4, -33, 4, 4, 4, -33, 3, 2, 2, -33, 3, 3, -33, -33, 5, -33, 2, -33, 3, 4, 98, 4, 4, 3, 3,~
## $ hzcy047a <dbl> -33, 5, 5, 4, -33, 5, 4, 5, -33, 4, 4, 5, -33, 4, 4, -33, -33, 4, -33, 5, -33, 5, 5, 4, 4, 4, 4, 4, ~
## $ hzcy048a <dbl> -33, 4, 4, 4, -33, 4, 3, 4, -33, 3, 2, 1, -33, 4, 2, -33, -33, 4, -33, 5, -33, 3, 4, 4, 4, 4, 4, 4, ~
## $ hzcy049a <dbl> -33, 4, 3, 4, -33, 2, 3, 4, -33, 2, 1, 2, -33, 4, 2, -33, -33, 4, -33, 98, -33, 3, 4, 4, 4, 4, 4, 4,~
## $ hzcy050a <dbl> -33, 4, 4, 4, -33, 4, 3, 4, -33, 2, 2, 1, -33, 4, 4, -33, -33, 5, -33, 4, -33, 3, 4, 4, 4, 4, 4, 4, ~
## $ hzcy051a <dbl> -33, 4, 2, 4, -33, 3, 2, 5, -33, 5, 4, 3, -33, 4, 5, -33, -33, 5, -33, 3, -33, 2, 5, 4, 4, 4, 4, 3, ~
## $ hzcy052a <dbl> -33, 4, 5, 4, -33, 5, 4, 5, -33, 5, 4, 4, -33, 5, 5, -33, -33, 5, -33, 5, -33, 4, 4, 4, 4, 4, 3, 3, ~
## $ hzcy053a <dbl> -33, 1, 5, 1, -33, 5, 1, 1, -33, 6, 1, 5, -33, 1, 1, -33, -33, 2, -33, 1, -33, 1, 1, 2, 1, 1, 2, 1, ~
## $ hzcy054a <dbl> -33, 0, -88, 0, -33, -88, 0, 0, -33, -88, 0, -88, -33, 0, 0, -33, -33, -88, -33, 0, -33, 0, 0, -88, ~
## $ hzcy055a <dbl> -33, 1, -88, 0, -33, -88, 0, 0, -33, -88, 0, -88, -33, 0, 0, -33, -33, -88, -33, 0, -33, 0, 0, -88, ~
## $ hzcy056a <dbl> -33, 0, -88, 0, -33, -88, 0, 0, -33, -88, 1, -88, -33, 1, 1, -33, -33, -88, -33, 1, -33, 0, 0, -88, ~
## $ hzcy057a <dbl> -33, 0, -88, 0, -33, -88, 0, 0, -33, -88, 0, -88, -33, 0, 0, -33, -33, -88, -33, 0, -33, 0, 0, -88, ~
## $ hzcy058a <dbl> -33, 0, -88, 0, -33, -88, 0, 0, -33, -88, 0, -88, -33, 0, 0, -33, -33, -88, -33, 0, -33, 0, 0, -88, ~
## $ hzcy059a <dbl> -33, 0, -88, 0, -33, -88, 0, 0, -33, -88, 0, -88, -33, 0, 0, -33, -33, -88, -33, 0, -33, 0, 0, -88, ~
## $ hzcy060a <dbl> -33, 0, -88, 1, -33, -88, 1, 1, -33, -88, 0, -88, -33, 0, 0, -33, -33, -88, -33, 0, -33, 1, 1, -88, ~
## $ hzcy061a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy062a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 1, -33, -88, -3~
## $ hzcy063a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy064a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 1, -33, -88, -3~
## $ hzcy065a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy066a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy067a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy068a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 1, -33, -88, -3~
## $ hzcy069a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy070a <dbl> -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, 0, -33, -88, -3~
## $ hzcy071a <dbl> -33, 2, 2, 1, -33, 2, 1, 2, -33, 2, 2, 2, -33, 1, 2, -33, -33, 2, -33, 2, -33, 2, 2, 2, 1, 2, 2, 1, ~
## $ hzcy072a <dbl> -33, -88, -88, 1, -33, -88, 1, -88, -33, -88, -88, -88, -33, 1, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy073a <dbl> -33, -88, -88, 0, -33, -88, 1, -88, -33, -88, -88, -88, -33, 1, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy074a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 1, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy075a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy076a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy077a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 1, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy078a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy079a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy080a <dbl> -33, -88, -88, 0, -33, -88, 1, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy081a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy083a <dbl> -33, -88, -88, 0, -33, -88, 0, -88, -33, -88, -88, -88, -33, 0, -88, -33, -33, -88, -33, -88, -33, -~
## $ hzcy084a <dbl> -33, 1, 1, 1, -33, 1, 0, 1, -33, 1, 1, 1, -33, 1, 0, -33, -33, 1, -33, 1, -33, 1, 1, 1, 1, 1, 1, 1, ~
## $ hzcy085a <dbl> -33, 1, 1, 0, -33, 0, 1, 0, -33, 0, 0, 0, -33, 0, 1, -33, -33, 0, -33, 0, -33, 1, 0, 0, 0, 0, 0, 1, ~
## $ hzcy086a <dbl> -33, 0, 0, 0, -33, 0, 0, 1, -33, 1, 1, 1, -33, 1, 0, -33, -33, 0, -33, 0, -33, 1, 0, 1, 0, 1, 0, 0, ~
## $ hzcy087a <dbl> -33, 0, 1, 0, -33, 0, 0, 1, -33, 1, 0, 1, -33, 0, 0, -33, -33, 1, -33, 0, -33, 0, 0, 1, 1, 1, 1, 1, ~
## $ hzcy088a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 1, -33, 0, -33, 0, 0, 1, 0, 0, 0, 0, ~
## $ hzcy089a <dbl> -33, 1, 1, 0, -33, 1, 0, 1, -33, 0, 0, 1, -33, 0, 0, -33, -33, 0, -33, 1, -33, 0, 1, 1, 0, 1, 1, 0, ~
## $ hzcy090a <dbl> -33, 1, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 1, 0, 0, 0, 0, ~
## $ hzcy091a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 1, -33, -33, 0, -33, 0, -33, 0, 0, 1, 0, 0, 0, 0, ~
## $ hzcy092a <dbl> -33, 0, 1, 0, -33, 0, 0, 1, -33, 0, 0, 1, -33, 1, 1, -33, -33, 0, -33, 0, -33, 1, 1, 1, 1, 1, 1, 1, ~
## $ hzcy093a <dbl> -33, 1, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 1, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy095a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzcy096a <dbl> -33, 4, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, -3~
## $ hzcy097a <dbl> -33, 0, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, -3~
## $ hzcy098a <dbl> -33, 0, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, -3~
## $ hzcy099a <dbl> -33, 1, -88, -88, -33, -88, -88, -88, -33, -88, -88, -88, -33, -88, -88, -33, -33, -88, -33, -88, -3~
## $ hzza001a <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ hzza002a <dbl> 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1~
## $ hzza003a <dbl> 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1~
## $ hzzq009a <dbl> -33, 4, 5, 4, -33, 4, 3, 5, -33, 4, 4, 5, -33, 4, 4, -33, -33, 4, -33, -99, -33, 4, 5, 5, 4, 4, 4, 4~
## $ hzzq016b <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 1, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 1, 0, 0, 0, ~
## $ hzzq023a <dbl> -33, 5, 5, 4, -33, 4, 4, 5, -33, 5, 4, 5, -33, 5, 4, -33, -33, 4, -33, 4, -33, 5, 5, 5, 4, 4, 4, 5, ~
## $ hzzp201a <dbl> -33, 31, 31, 31, -33, 31, 31, 31, -33, 31, 31, 31, -33, 31, 31, -33, -33, 31, -33, 31, -33, 31, 31, ~
## $ hzzp204a <dbl> -33, 210, 377, 309, -33, 429, 586, 366, -33, 283, 248, 703, -33, 466, 332, -33, -33, 223, -33, 306, ~
## $ hzzp207a <dbl> -33, 1584549879, 1584469614, 1584525461, -33, 1584461540, 1584823080, 1584543510, -33, 1584823044, 1~
## $ hzzr001a <dbl> -33, 3, 34, 4, -33, 3, 7, 2, -33, 2, 2, 3, -33, 2, 65, -33, -33, 6, -33, 2, -33, 6, 9, 2, 16, 5, 16,~
## $ hzzr002a <dbl> -33, 24, 83, 35, -33, 41, 67, 39, -33, 40, 33, 57, -33, 50, 142, -33, -33, 39, -33, 43, -33, 67, 74,~
## $ hzzr003a <dbl> -33, 48, 117, 67, -33, 90, 121, 140, -33, 75, 71, 112, -33, 74, 158, -33, -33, 62, -33, 72, -33, 107~
## $ hzzr004a <dbl> -33, 71, 161, 101, -33, 143, 212, 176, -33, 115, 97, 177, -33, 137, 188, -33, -33, 91, -33, 116, -33~
## $ hzzr005a <dbl> -33, 82, 175, 110, -33, 159, 230, 188, -33, 123, 101, 196, -33, 199, 204, -33, -33, 96, -33, 123, -3~
## $ hzzr006a <dbl> -33, 0, 206, 140, -33, 209, 264, 222, -33, 150, 128, 250, -33, 257, 220, -33, -33, 118, -33, 151, -3~
## $ hzzr007a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 0, 0, 0, 0, ~
## $ hzzr008a <dbl> -33, 101, 245, 166, -33, 288, 319, 238, -33, 194, 154, 293, -33, 305, 248, -33, -33, 138, -33, 178, ~
## $ hzzr009a <dbl> -33, 130, 293, 190, -33, 340, 388, 278, -33, 232, 188, 347, -33, 388, 266, -33, -33, 160, -33, 218, ~
## $ hzzr010a <dbl> -33, 145, 0, 216, -33, 0, 438, 310, -33, 0, 210, 0, -33, 410, 292, -33, -33, 0, -33, 245, -33, 388, ~
## $ hzzr011a <dbl> -33, 150, 312, 222, -33, 366, 446, 315, -33, 248, 221, 376, -33, 413, 307, -33, -33, 191, -33, 250, ~
## $ hzzr012a <dbl> -33, 0, 0, 246, -33, 0, 509, 0, -33, 0, 0, 0, -33, 426, 0, -33, -33, 0, -33, 0, -33, 0, 0, 0, 966, 0~
## $ hzzr013a <dbl> -33, 189, 345, 266, -33, 412, 558, 355, -33, 267, 240, 427, -33, 458, 325, -33, -33, 214, -33, 293, ~
## $ hzzr014a <dbl> -33, 193, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 396, 0, 0, 0,~
## $ hzzr015a <dbl> -33, 200, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 0, -33, 0, -33, 0, 0, 415, 0, 0, 0,~
## $ hzzr016a <dbl> -33, 209, 360, 307, -33, 424, 576, 363, -33, 277, 247, 445, -33, 465, 331, -33, -33, 222, -33, 303, ~
## $ hzzr017a <dbl> -33, 210, 377, 309, -33, 429, 586, 366, -33, 283, 248, 703, -33, 466, 332, -33, -33, 223, -33, 306, ~
## $ hzzr018a <dbl> -33, 138, 307, 206, -33, 360, 416, 293, -33, 245, 200, 369, -33, 396, 283, -33, -33, 168, -33, 233, ~
## $ hzzr019a <dbl> -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, 0, -33, 0, 0, -33, -33, 187, -33, 0, -33, 0, 0, 361, 0, 0, 44~
```
]

---

## Selecting variables

We might want to reduce our data frame (or create a new one) to only include a subset of specific variables. Say, for example, we want to select only the variables that measure the risk of becoming infected with or spreading the coronavirus from our full data set. There are two options for doing this with `base R`:

Option 1
.small[

```r
gp_covid_risk <- gp_covid[, c("hzcy001a", "hzcy002a", "hzcy003a", "hzcy004a", "hzcy005a")]
# When subsetting with [], the first value refers to rows, the second to columns
# [, c("var1", "var2", ...)] means we want to select all rows but only some specific columns.
```
]

Option 2
.small[

```r
gp_covid_risk <- subset(gp_covid, TRUE, select = c(hzcy001a, hzcy002a, hzcy003a, hzcy004a, hzcy005a))
# Again, here the 2nd argument refers to the rows.
# Setting it to TRUE means that we want to include all rows in the subset.
```
]

---

## Selecting variables

You can also select variables based on their numeric index.

```r
gp_covid_demo <- gp_covid[, 6:13]

names(gp_covid_demo)
```

```
## [1] "sex"                   "age_cat"               "education_cat"         "intention_to_vote"     "choice_of_party"      
## [6] "political_orientation" "marstat"               "household"
```

---

## Selecting variables

In the `tidyverse`, we can create a subset of variables with the `dplyr` verb `select()`.

```r
gp_covid_risk <- gp_covid %>%
  select(hzcy001a,
         hzcy002a,
         hzcy003a,
         hzcy004a,
         hzcy005a)

head(gp_covid_risk)
```

```
## # A tibble: 6 x 5
##   hzcy001a hzcy002a hzcy003a hzcy004a hzcy005a
##      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
## 1      -33      -33      -33      -33      -33
## 2        5        5        2        5        5
## 3        5        6        3        6        6
## 4        4        4        2        4        3
## 5      -33      -33      -33      -33      -33
## 6        3        3      -99        3        3

---

## Selecting a range of variables

There also is a shorthand notation for selecting a set of consecutive columns with `select()`.

```r
gp_covid_risk <- gp_covid %>% 
 select(hzcy001a:hzcy005a)

head(gp_covid_risk)
```

*Note*: You can also use this shorthand notation for the `select` argument of the `base R` function `subset()`.
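
As a minimal sketch of this shorthand in `base R`, using a hypothetical toy data frame (`toy` and its column names are made up for illustration and are not part of the *GESIS Panel* data):

```r
# Hypothetical toy data frame standing in for gp_covid
toy <- data.frame(
  id = 1:3,
  item_a = c(1, 2, 3),
  item_b = c(4, 5, 6),
  item_c = c(7, 8, 9),
  other = c("x", "y", "z")
)

# select = item_a:item_c keeps all columns from item_a through item_c
toy_items <- subset(toy, TRUE, select = item_a:item_c)

names(toy_items)
```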

---

## Selecting a range of variables

Same as for `base R`, you can also use the numeric index of variables in combination with `select()` from `dplyr`.

```r
gp_covid_demo <- gp_covid %>% 
 select(6:13)

names(gp_covid_demo)
```

---

## Unselecting variables

If you just want to exclude one or a few columns/variables, it is easier to unselect those than to select all others. Again, there are two ways to do this with `base R`.

Option 1
.small[

```r
gp_covid_cut <- gp_covid[!(names(gp_covid) %in% c("za_number", "version", "doi"))]
# The ! operator means "not" (i.e., it negates a condition)
# The %in% operator means "is included in" (in this case the following character vector)

dim(gp_covid_cut)
```

```
## [1] 3765  134
```
]

Option 2
.small[

```r
gp_covid_cut <- subset(gp_covid, TRUE, select = -c(za_number, version, doi))

dim(gp_covid_cut)
```

```
## [1] 3765  134
```
]

---

## Unselecting variables

You can also use `select()` from `dplyr` to exclude one or more columns/variables.

```r
gp_covid_cut <- gp_covid %>% 
 select(-c(za_number, version, doi))

dim(gp_covid_cut)
```

```
## [1] 3765  134
```

---

## Advanced ways of selecting variables

`dplyr` offers several helper functions for selecting variables. For a full list of those, you can check the [documentation for the `select()` function](https://dplyr.tidyverse.org/reference/select.html).

```r
gp_covid_cy <- gp_covid %>% 
 select(starts_with("hzcy"))

gp_covid_cat <- gp_covid %>% 
 select(ends_with("_cat"))

glimpse(gp_covid_cat)
```

```
## Rows: 3,765
## Columns: 2
## $ age_cat <dbl> 7, 7, 8, 4, 1, 10, 4, 7, 8, 1, 6, 8, 2, 6, 2, 2, 2, 7, 4, 8, 1, 7, 4, 3, 5, 7, 7, 6, 6, 5, 7, 7, 5, 7, 5, 2,~
## $ education_cat <dbl> 3, 2, 2, 3, 3, 2, 2, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2, 3, 3, ~
```
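
Other selection helpers work along the same lines. The following sketch uses a hypothetical toy tibble (the column names are made up for illustration) to show `contains()`, `matches()`, and `num_range()`:

```r
library(dplyr)

# Hypothetical toy tibble (not the GESIS Panel data)
toy <- tibble(
  id = 1:3,
  risk_1 = c(1, 2, 3),
  risk_2 = c(4, 5, 6),
  trust_1 = c(2, 2, 1),
  age_cat = c(7, 8, 4)
)

# contains(): column names that include a literal string
toy_contains <- toy %>%
  select(contains("risk"))

# matches(): column names that match a regular expression
# (here: names ending in an underscore followed by one digit)
toy_matches <- toy %>%
  select(matches("_[0-9]$"))

# num_range(): a common prefix followed by a range of numbers
toy_range <- toy %>%
  select(num_range("risk_", 1:2))

names(toy_matches)
```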

---

## Advanced ways of selecting variables

Another particularly useful selection helper is `where()`. You can, e.g., use `where()` to select only a specific type of variables.

```r
gp_covid_num <- gp_covid %>% 
 select(where(is.numeric))
```
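
`where()` accepts any function that returns `TRUE` or `FALSE` for a column, and the result can be negated with `!`. A small sketch, assuming a hypothetical toy tibble with mixed column types:

```r
library(dplyr)

# Hypothetical toy tibble with integer, character, and double columns
toy <- tibble(
  id = 1:3,
  doi = c("a", "b", "c"),
  age_cat = c(7, 8, 4)
)

# Keep only character columns
toy_chr <- toy %>%
  select(where(is.character))

# Keep everything that is NOT numeric (integers count as numeric)
toy_not_num <- toy %>%
  select(!where(is.numeric))

names(toy_not_num)
```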

---

## What's in a name?

One thing that we need to know, and might want to change, is what the variables in the data set are called.

```r
names(gp_covid)
```

```
##   [1] "za_number"             "version"               "doi"                   "id"                    "cohort"               
##   [6] "sex"                   "age_cat"               "education_cat"         "intention_to_vote"     "choice_of_party"      
##  [11] "political_orientation" "marstat"               "household"             "hzcy001a"              "hzcy002a"             
##  [16] "hzcy003a"              "hzcy004a"              "hzcy005a"              "hzcy006a"              "hzcy007a"             
##  [21] "hzcy008a"              "hzcy009a"              "hzcy010a"              "hzcy011a"              "hzcy012a"             
##  [26] "hzcy013a"              "hzcy014a"              "hzcy015a"              "hzcy016a"              "hzcy018a"             
##  [31] "hzcy019a"              "hzcy020a"              "hzcy021a"              "hzcy022a"              "hzcy023a"             
##  [36] "hzcy024a"              "hzcy025a"              "hzcy026a"              "hzcy027a"              "hzcy028a"             
##  [41] "hzcy029a"              "hzcy030a"              "hzcy031a"              "hzcy032a"              "hzcy033a"             
##  [46] "hzcy034a"              "hzcy035a"              "hzcy036a"              "hzcy037a"              "hzcy038a"             
##  [51] "hzcy039a"              "hzcy040a"              "hzcy041a"              "hzcy042a"              "hzcy043a"             
##  [56] "hzcy044a"              "hzcy045a"              "hzcy046a"              "hzcy047a"              "hzcy048a"             
##  [61] "hzcy049a"              "hzcy050a"              "hzcy051a"              "hzcy052a"              "hzcy053a"             
##  [66] "hzcy054a"              "hzcy055a"              "hzcy056a"              "hzcy057a"              "hzcy058a"             
##  [71] "hzcy059a"              "hzcy060a"              "hzcy061a"              "hzcy062a"              "hzcy063a"             
##  [76] "hzcy064a"              "hzcy065a"              "hzcy066a"              "hzcy067a"              "hzcy068a"             
##  [81] "hzcy069a"              "hzcy070a"              "hzcy071a"              "hzcy072a"              "hzcy073a"             
##  [86] "hzcy074a"              "hzcy075a"              "hzcy076a"              "hzcy077a"              "hzcy078a"             
##  [91] "hzcy079a"              "hzcy080a"              "hzcy081a"              "hzcy083a"              "hzcy084a"             
##  [96] "hzcy085a"              "hzcy086a"              "hzcy087a"              "hzcy088a"              "hzcy089a"             
## [101] "hzcy090a"              "hzcy091a"              "hzcy092a"              "hzcy093a"              "hzcy095a"             
## [106] "hzcy096a"              "hzcy097a"              "hzcy098a"              "hzcy099a"              "hzza001a"             
## [111] "hzza002a"              "hzza003a"              "hzzq009a"              "hzzq016b"              "hzzq023a"             
## [116] "hzzp201a"              "hzzp204a"              "hzzp207a"              "hzzr001a"              "hzzr002a"             
## [121] "hzzr003a"              "hzzr004a"              "hzzr005a"              "hzzr006a"              "hzzr007a"             
## [126] "hzzr008a"              "hzzr009a"              "hzzr010a"              "hzzr011a"              "hzzr012a"             
## [131] "hzzr013a"              "hzzr014a"              "hzzr015a"              "hzzr016a"              "hzzr017a"             
## [136] "hzzr018a"              "hzzr019a"
```

---

## What's in a name?

As you can see, only a few of the variable names in the *GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany* data set are self-explanatory. The other variable names are composed of codes representing the study wave, study name, variable number and whether they are original or derived variables (have a look at the [*GESIS Panel* cheatsheet](https://www.gesis.org/fileadmin/upload/forschung/programme_projekte/Drittmittelprojekte/GESIS_Panel/gesis_panel_cheatsheet.pdf) if you want to know more), but they are not intuitive to understand. Hence, for analyzing them, especially if you want to create tables and/or plots, it can make sense to rename them. This is also a common step if you work with your own data. Depending on what method or tool(s) you used to collect the data, the variable names in your raw data may also not be what you want or need them to be.

---

## Renaming variables

It is good practice to use consistent naming conventions. Since `R` is case-sensitive, we might, e.g., want to only use lowercase letters. As spaces in variable names can cause problems, we could, e.g., decide to use 🐍 *snake_case* (🐫 *camelCase* is a common alternative; for a good brief discussion of options for avoiding spaces in variable names, see this [Medium post by Patrick Divine](https://medium.com/@pddivine/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841)).

---

# Become an ace of case

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\coding_cases.png" width="90%" style="display: block; margin: auto;" />
Artwork by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)

---

## Renaming variables

You can rename individual columns/variables in `base R` as follows:

```r
colnames(gp_covid)[colnames(gp_covid) == "hzcy048a"] <- "trust_government"
```

As for subsetting, you can also rename variables based on their numeric index.

```r
colnames(gp_covid)[4] <- "respondent_id"
```

---

## Renaming variables

An easier to use and more versatile option for renaming columns/variables is the `dplyr` function `rename()`.

```r
gp_covid_risk <- gp_covid_risk %>%
  rename(risk_self = hzcy001a, # new_name = old_name
         risk_surroundings = hzcy002a,
         risk_hospital = hzcy003a,
         risk_quarantine = hzcy004a,
         risk_infect_others = hzcy005a)

names(gp_covid_risk)
```

```
## [1] "risk_self"          "risk_surroundings"  "risk_hospital"      "risk_quarantine"    "risk_infect_others"
```

---

## Renaming variables

For some more advanced renaming options, you can use the `dplyr` function `rename_with()`.

```r
gp_covid_risk %>% 
  rename_with(toupper) %>% 
  names()
```

```
## [1] "RISK_SELF"          "RISK_SURROUNDINGS"  "RISK_HOSPITAL"      "RISK_QUARANTINE"    "RISK_INFECT_OTHERS"
```

*Note*: The [`janitor` package](https://sfirke.github.io/janitor/) (which is `tidyverse`-oriented) can be used to facilitate several common data cleaning tasks. Among other things, it contains the function `clean_names()` that takes a data frame and creates column names that "are unique and consist only of the _ character, numbers, and letters" (from the help file for this function), with the default being 🐍 snake_case (but support for many other types of cases).

---

## Renaming variables

We can, e.g., use `rename_with()` in combination with `gsub()` (which we've already encountered in the session on *Getting Started*) to remove (or change) prefixes in variable names.

```r
gp_covid %>% 
  select(hzcy001a:hzcy005a) %>% 
  rename_with(~ gsub("hzcy", "risk", .x,
                     fixed = TRUE)) %>% 
  names()
```

```
## [1] "risk001a" "risk002a" "risk003a" "risk004a" "risk005a"
```
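
`rename_with()` also has a `.cols` argument that restricts the renaming function to a subset of columns. A minimal sketch on a hypothetical toy tibble (values made up for illustration):

```r
library(dplyr)

# Hypothetical toy tibble (not the GESIS Panel data)
toy <- tibble(
  id = 1:3,
  hzcy001a = c(5, 4, 3),
  hzcy002a = c(5, 6, 3),
  age_cat = c(7, 8, 4)
)

# Apply toupper() only to columns whose names start with "hzcy"
toy_renamed <- toy %>%
  rename_with(toupper, .cols = starts_with("hzcy"))

names(toy_renamed)
```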

---

## Re~~wind~~name selecta

A nice thing about the `dplyr` verb `select()` is that you can use it to select and rename variables in one step.

```r
gp_covid_risk <- gp_covid %>%
  select(risk_self = hzcy001a,
         risk_surroundings = hzcy002a,
         risk_hospital = hzcy003a,
         risk_quarantine = hzcy004a,
         risk_infect_others = hzcy005a)

head(gp_covid_risk)
```

```
## # A tibble: 6 x 5
##   risk_self risk_surroundings risk_hospital risk_quarantine risk_infect_others
##       <dbl>             <dbl>         <dbl>           <dbl>              <dbl>
## 1       -33               -33           -33             -33                -33
## 2         5                 5             2               5                  5
## 3         5                 6             3               6                  6
## 4         4                 4             2               4                  3
## 5       -33               -33           -33             -33                -33
## 6         3                 3           -99               3                  3

---

## Moving columns

Although the positions of columns in a data frame do not matter for analyses or plotting (unless you want to select columns using their numerical index), you might want to change them. For this purpose, `dplyr` provides the `relocate()` function.

```r
gp_covid_risk <- gp_covid_risk %>% 
 relocate(risk_infect_others, .after = risk_surroundings)

glimpse(gp_covid_risk)
```

```
## Rows: 3,765
## Columns: 5
## $ risk_self <dbl> -33, 5, 5, 4, -33, 3, 4, 4, -33, 7, 4, 5, -33, 6, 4, -33, -33, 6, -33, 3, -33, 6, 5, 5, 6, 5, 7, 6, 5, ~
## $ risk_surroundings <dbl> -33, 5, 6, 4, -33, 3, 3, 4, -33, 5, 6, 6, -33, 6, 5, -33, -33, 6, -33, 5, -33, 6, 6, 97, 6, 6, 7, 6, 6,~
## $ risk_infect_others <dbl> -33, 5, 6, 3, -33, 3, 4, 4, -33, 2, 4, 6, -33, 4, 2, -33, -33, 6, -33, 3, -33, 6, 4, 3, 5, 4, 7, 5, 2, ~
## $ risk_hospital <dbl> -33, 2, 3, 2, -33, -99, 3, 3, -33, 3, 3, 7, -33, 3, 2, -33, -33, 4, -33, 3, -33, 1, 3, 4, 3, 3, 7, 3, 3~
## $ risk_quarantine <dbl> -33, 5, 6, 4, -33, 3, 3, 3, -33, 4, 5, 6, -33, 7, 3, -33, -33, 5, -33, 4, -33, 3, 4, 5, 6, 3, 7, 3, 2, ~
```

*Note*: You can also move a column before a specific other column by providing a variable name to the `.before` argument (instead of `.after`).
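
A minimal sketch of the `.before` variant, using a hypothetical toy tibble with the same column names as above (the values are made up):

```r
library(dplyr)

# Hypothetical toy tibble (values are illustrative only)
toy <- tibble(
  risk_self = c(5, 4, 3),
  risk_surroundings = c(5, 6, 3),
  risk_infect_others = c(5, 3, 3)
)

# Move risk_infect_others in front of risk_self
toy_moved <- toy %>%
  relocate(risk_infect_others, .before = risk_self)

names(toy_moved)
```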

---

## `dplyr::relocate()`

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\dplyr_relocate.png" width="85%" style="display: block; margin: auto;" />
Artwork by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)

---

# [Exercise](https://jobreu.github.io/r-intro-gesis-2021/exercises/Exercise_2_1_1_Select_Rename.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/r-intro-gesis-2021/solutions/Exercise_2_1_1_Select_Rename.html)

---

## Filtering rows

In `R`, you can filter rows/observations depending on one or more conditions.

To filter rows/observations you can use... 
- **comparison operators**:
 - **<** (smaller than)
 - **<=** (smaller than or equal to)
 - **==** (equal to)
 - **!=** (not equal to)
 - **>=** (larger than or equal to)
 - **>** (larger than)
 - **%in%** (included in)

---

## Filtering rows

... and combine comparisons with
- **logical operators**:
    - **&** (and)
    - **|** (or)
    - **!** (not)
    - **xor** (either or, not both)
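
To see what these operators return, here is a small sketch with two hypothetical vectors (`sex_values` and `age_values` are made-up example values, not columns from the data set):

```r
# Hypothetical example values (not taken from the GESIS Panel data)
sex_values <- c(1, 2, 1, 2)
age_values <- c(2, 7, 8, 10)

# Comparison operators return one TRUE/FALSE value per element
is_male <- sex_values == 1         # TRUE FALSE TRUE FALSE
is_old <- age_values > 7           # FALSE FALSE TRUE TRUE
in_set <- age_values %in% c(7, 8)  # FALSE TRUE TRUE FALSE

# Logical operators combine such vectors element by element
is_male & is_old      # FALSE FALSE TRUE FALSE
is_male | is_old      # TRUE FALSE TRUE TRUE
!is_male              # FALSE TRUE FALSE TRUE
xor(is_male, is_old)  # TRUE FALSE FALSE TRUE
```

Filtering keeps exactly those rows for which the resulting logical vector is `TRUE`.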

---

## Filtering rows

Similar to selecting columns/variables, there are two options for filtering rows/observations with `base R`.

Option 1

```r
gp_covid_male <- gp_covid[gp_covid$sex == 1, ]

dim(gp_covid_male)
```

```
## [1] 1933  137
```

Option 2

```r
gp_covid_male <- subset(gp_covid, sex == 1)

dim(gp_covid_male)
```

```
## [1] 1933  137
```

---

## Filtering rows

The `dplyr` solution for filtering rows/observations is the verb `filter()`.

```r
gp_covid_male <- gp_covid %>% 
 filter(sex == 1)

dim(gp_covid_male)
```

```
## [1] 1933  137
```

---

## Filtering rows based on multiple conditions

```r
gp_covid_old_men <- gp_covid %>% 
 filter(sex == 1, age_cat > 7)

dim(gp_covid_old_men)
```

```
## [1] 626 137
```
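
Separating conditions with a comma, as in the call above, has the same effect as joining them with `&`. A minimal sketch on a hypothetical toy tibble (values made up for illustration):

```r
library(dplyr)

# Hypothetical toy tibble (not the GESIS Panel data)
toy <- tibble(
  sex = c(1, 2, 1, 2),
  age_cat = c(2, 7, 8, 10)
)

# Conditions separated by a comma ...
with_comma <- toy %>%
  filter(sex == 1, age_cat > 7)

# ... and the same conditions joined explicitly with &
with_and <- toy %>%
  filter(sex == 1 & age_cat > 7)

# Both versions keep the same rows
all.equal(with_comma, with_and)
```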

---

## `dplyr::filter()`

<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\dplyr_filter.jpg" width="95%" style="display: block; margin: auto;" />
Illustration by [Allison Horst](https://github.com/allisonhorst/stats-illustrations)

---

## `dplyr::filter` - multiple conditions

By default, multiple conditions in `filter()` are combined with `&` (and). You can, however, also combine multiple conditions in other ways.

**or** (cases for which at least one of the conditions is true)

```r
gp_covid_old_andor_male <- gp_covid %>%
  filter(sex == 1 |
           age_cat > 7)

dim(gp_covid_old_andor_male)
```

```
## [1] 2432  137
```

---

## `dplyr::filter` - multiple conditions

**xor** (cases for which only one of the two conditions is true)

```r
gp_covid_old_or_male <- gp_covid %>%
  filter(xor(sex == 1,
             age_cat > 7))

dim(gp_covid_old_or_male)
```

```
## [1] 1806  137
```

---

## Advanced ways of filtering rows

Similar to `select()`, there are some helper functions for `filter()` that allow more advanced filtering of rows. For example, you can...

- Filter rows based on a range in a numeric variable

```r
gp_covid_centrist <- gp_covid %>% 
 filter(between(political_orientation, 4, 6))

dim(gp_covid_centrist)
```

```
## [1] 2049  137
```

*Note*: The range specified in `between()` is inclusive (on both sides).
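
The inclusive behavior can be checked on a small hypothetical vector (`x` is made up for illustration):

```r
library(dplyr)

# Hypothetical example values
x <- c(3, 4, 5, 6, 7)

# between() includes both boundaries ...
in_range <- between(x, 4, 6)        # FALSE TRUE TRUE TRUE FALSE

# ... so it is equivalent to combining >= and <=
in_range_manual <- x >= 4 & x <= 6  # FALSE TRUE TRUE TRUE FALSE

identical(in_range, in_range_manual)
```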

---

## Advanced ways of filtering rows

- Filter rows based on the values of specific variables matching certain criteria

```r
gp_covid_risk_low <- gp_covid_risk %>% 
 filter(if_all(everything(), ~ . < 4)) # read: if the values of all vars in this df are < 4

dim(gp_covid_risk_low)
```

```
## [1] 926   5
```

*Note*: The helper function `if_any()` can be used to specify that at least one of the variables needs to match a certain criterion.
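
A minimal sketch contrasting `if_any()` and `if_all()`, assuming a hypothetical toy tibble (names and values are made up for illustration):

```r
library(dplyr)

# Hypothetical toy tibble (not the GESIS Panel data)
toy_risk <- tibble(
  risk_self = c(2, 5, 6, 3),
  risk_hospital = c(1, 2, 6, 5)
)

# Rows in which AT LEAST ONE variable is larger than 4
any_high <- toy_risk %>%
  filter(if_any(everything(), ~ .x > 4))

# Rows in which ALL variables are larger than 4
all_high <- toy_risk %>%
  filter(if_all(everything(), ~ .x > 4))

nrow(any_high) # 3
nrow(all_high) # 1
```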

---

## Selecting columns + filtering rows

Of course, you can also combine the selection of columns and the filtering of rows.

`Base R` option 1

```r
gp_covid_risk_male <- gp_covid[gp_covid$sex == 1, c("hzcy001a", "hzcy002a", "hzcy003a", "hzcy004a", "hzcy005a")]

dim(gp_covid_risk_male)
```

```
## [1] 1933    5
```

`Base R` option 2

```r
gp_covid_risk_male <- subset(gp_covid, sex == 1, select = c(hzcy001a, hzcy002a, hzcy003a, hzcy004a, hzcy005a))

dim(gp_covid_risk_male)
```

```
## [1] 1933    5
```

---

## Selecting columns + filtering rows

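The `tidyverse` approach to combining the selection of columns and the filtering of rows is to chain these steps together in a pipe. Note that the order of the steps matters here: `filter()` has to come first because `sex` is no longer available once `select()` has reduced the data to the `hzcy` variables.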

```r
gp_covid_risk_male <- gp_covid %>% 
 filter(sex == 1) %>% 
 select(hzcy001a:hzcy005a)

dim(gp_covid_risk_male)
```

```
## [1] 1933    5
```
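
A column used in `filter()` has to be present in the data at that point of the pipe. The following sketch on a hypothetical toy tibble (`toy` and its values are made up for illustration) shows what happens when the two steps are swapped:

```r
library(dplyr)

# Hypothetical toy tibble (not the GESIS Panel data)
toy <- tibble(
  sex = c(1, 2, 1),
  hzcy001a = c(5, 4, 3),
  hzcy002a = c(5, 6, 3)
)

# filter() first, then select(): works, because sex still exists
ok <- toy %>%
  filter(sex == 1) %>%
  select(hzcy001a:hzcy002a)

# select() first, then filter(): sex has already been dropped,
# so filter() cannot find it and throws an error
swapped <- tryCatch(
  toy %>%
    select(hzcy001a:hzcy002a) %>%
    filter(sex == 1),
  error = function(e) "failed"
)

dim(ok)
swapped
```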

---

## (Re-)Arranging the order of rows

Again, while this does not directly matter for analyses or plotting (unless you want to filter rows by their numeric index), you can rearrange the order of rows in a data set. In `base R`, this can be achieved as follows:

```r
gp_covid <- gp_covid[order(gp_covid$age_cat),]

head(gp_covid[, 6:13])
```

```
## # A tibble: 6 x 8
##     sex age_cat education_cat intention_to_vote choice_of_party political_orientation marstat household
##   <dbl>   <dbl>         <dbl>             <dbl>           <dbl>                 <dbl>   <dbl>     <dbl>
## 1     2       1             3               -33             -33                     4       2         3
## 2     1       1             3                 2               2                     7       2         3
## 3     2       1             3                 2               1                     4       2         2
## 4     2       1             3                 2               5                     2       2         3
## 5     1       1             3                 2               3                     4       2         2
## 6     1       1             3                 2               3                     6       2         3

---

## (Re-)Arranging the order of rows

Of course, it is also possible to sort a data frame in descending order of a variable.

```r
# decreasing = TRUE is the base R way; desc() is a dplyr helper
gp_covid <- gp_covid[order(gp_covid$age_cat, decreasing = TRUE), ]

head(gp_covid[, 6:13])
```

```
## # A tibble: 6 x 8
##     sex age_cat education_cat intention_to_vote choice_of_party political_orientation marstat household
##   <dbl>   <dbl>         <dbl>             <dbl>           <dbl>                 <dbl>   <dbl>     <dbl>
## 1     1      10             2                 2               6                    10       1         2
## 2     2      10             1                 2              98                     5       1         2
## 3     1      10             3                 2               3                     7       1         2
## 4     2      10             1                 2               2                     5       1         2
## 5     1      10             2                 2               2                     4       1         2
## 6     1      10             1                 2               2                     2       4         2

---

## (Re-)Arranging the order of rows

You can also sort your data frame by more than one variable.

```r
gp_covid <- gp_covid[order(gp_covid$age_cat, gp_covid$education_cat),]

head(gp_covid[, 6:13])
```

```
## # A tibble: 6 x 8
##     sex age_cat education_cat intention_to_vote choice_of_party political_orientation marstat household
##   <dbl>   <dbl>         <dbl>             <dbl>           <dbl>                 <dbl>   <dbl>     <dbl>
## 1     1       1             1                 1               1                     8       2         3
## 2     2       1             1                 1              98                     5       1         3
## 3     2       1             1                 2               5                     7       2         3
## 4     1       1             1               -33             -33                     5       2         2
## 5     1       1             1                 2               7                     5       2         3
## 6     1       1             1                 2               3                     6       2         3

---

## (Re-)Arranging the order of rows

The `dplyr` verb for changing the order of rows in a data set is `arrange()` and you can use it in the same ways as the `base R` equivalent: Sorting by a single variable in ascending order, ...

```r
gp_covid %>% 
  arrange(age_cat) %>% 
  select(sex:household) %>% 
  glimpse()
```

```
## Rows: 3,765
## Columns: 8
## $ sex <dbl> 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 2~
## $ age_cat <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ education_cat <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3~
## $ intention_to_vote <dbl> 1, 1, 2, -33, 2, 2, 2, -99, 2, 2, -33, 2, 2, 2, 2, 1, -33, 2, 2, 2, 2, 2, 2, 2, -33, -33, 2, 2, -33,~
## $ choice_of_party <dbl> 1, 98, 5, -33, 7, 3, 4, -99, 4, 98, -33, 98, 6, 98, 4, -99, -33, 2, 1, 5, 3, 3, 5, 5, -33, -33, 3, 3~
## $ political_orientation <dbl> 8, 5, 7, 5, 5, 6, 2, 5, 2, 6, -33, 5, 8, 5, 2, 6, 4, 7, 4, 2, 4, 6, 5, 3, 2, 1, 5, 6, 2, 3, 4, 3, 3,~
## $ marstat <dbl> 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2~
## $ household <dbl> 3, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 2, 3, 2, 3, 3, 2, 3, 3, 3, 3, 3, 3~
```

---

## (Re-)Arranging the order of rows

... sorting by a single variable in descending order, ...

```r
gp_covid %>% 
  arrange(desc(age_cat)) %>% 
  select(sex:household) %>% 
  glimpse()
```

```
## Rows: 3,765
## Columns: 8
## $ sex <dbl> 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1~
## $ age_cat <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, ~
## $ education_cat <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ intention_to_vote <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -99, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, -99, 2, 2, ~
## $ choice_of_party <dbl> 98, 2, 2, 2, 4, 2, 2, 98, 1, 1, 1, 2, 2, 5, 6, 6, 2, 2, 1, 2, 98, 2, 1, 2, 2, 1, 6, 98, 1, 2, 1, 1, ~
## $ political_orientation <dbl> 5, 5, 2, 0, 0, 5, 2, 5, 5, 5, 5, 5, 6, 3, 8, 10, 4, 5, 3, 3, 5, 2, 6, 5, 2, 6, 8, 6, 5, 4, 8, 8, 8, ~
## $ marstat <dbl> 1, 1, 4, 1, 4, 1, 1, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 3, 1, 2, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ household <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2, 2, 2, 2, 2, 2, 2~
```

---

## (Re-)Arranging the order of rows

... sorting by more than one variable.

```r
gp_covid %>% 
  arrange(age_cat, education_cat) %>% 
  select(sex:household) %>% 
  glimpse()
```
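
Sorting directions can also be mixed across variables by wrapping individual ones in `desc()`. A minimal sketch on a hypothetical toy tibble (values made up for illustration):

```r
library(dplyr)

# Hypothetical toy tibble (not the GESIS Panel data)
toy <- tibble(
  age_cat = c(1, 2, 1, 2),
  education_cat = c(1, 3, 3, 1)
)

# Ascending by age_cat; within each age category,
# descending by education_cat
toy_sorted <- toy %>%
  arrange(age_cat, desc(education_cat))

toy_sorted
```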

---

# [Exercise](https://jobreu.github.io/r-intro-gesis-2021/exercises/Exercise_2_1_2_Filter_Arrange.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/r-intro-gesis-2021/solutions/Exercise_2_1_2_Filter_Arrange.html)

---