Introduction to R for Data Analysis

# Introduction to R for Data Analysis
## Data Visualization - Part 2
### Johannes Breuer & Stefan Jünger
### 2021-08-05

---

---

## Content of this session

.pull-left[
**What we will do**
- Visual checks of model assumptions
- Plotting coefficients
- Plotting predictions
- Plotting multiple models
]

---

## Packages in this session

.pull-left[
We will, again, use some packages from the `easystats` collection
- `parameters`
- `performance`
- **`see`**
]

.pull-right[
<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\logo_wall.png" width="1600" style="display: block; margin: auto;" />
]

.pull-left[
While the `see` package is straightforward to use, it has some limitations. From time to time we will, hence, also switch to the `sjPlot` package.
]

.pull-right[
<img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\r-intro-gesis-2021\content\img\sjplot_logo.png" width="40%" style="display: block; margin: auto;" />
]

---

## Data for this session

As in the previous sessions, we will use the data from the from the *GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany*.

```r
gp_covid <- 
  haven::read_sav(
    "./data/ZA5667_v1-1-0.sav"
  ) %>% 
  sjlabelled::set_na(na = c(-1:-99, 97, 98)) %>% 
  rowwise() %>%
  mutate(
    mean_trust = 
      mean(
        c_across(hzcy044a:hzcy052a),
        na.rm = TRUE
      )
  ) %>% 
  ungroup() %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(
    pol_leaning_cat = 
      case_when(
        between(political_orientation, 0, 3) ~ "left",
        between(political_orientation, 4, 7) ~ "center",
        political_orientation > 7 ~ "right"
      ) %>% 
      as.factor()
  ) %>% 
  filter(pol_leaning_cat != "NA")
```
]

---

## Fictional research question

In this session, we will investigate if older people have higher levels of trust in institutions when it comes to dealing with the COVID-19 pandemic. We will also assess whether politically right-leaning people tend to have lower levels of trust. We are also interested in how these two variables interact with each other.

**Of course, this 'research' is purely based on ad-hoc hypotheses.**

---

## Estimating a linear model (OLS)

We start with estimating a linear regression model including our variables of interest and two control variables.

```r
linear_model <-
  lm(
    mean_trust ~
      age_cat + sex + education_cat + pol_leaning_cat,
    data = gp_covid
  )

library(parameters)

model_parameters(linear_model)
```

```
## Parameter               | Coefficient |       SE |         95% CI | t(3106) |      p
## ------------------------------------------------------------------------------------
## (Intercept)             |        3.52 |     0.08 | [ 3.37,  3.67] |   46.55 | < .001
## age_cat                 |        0.03 | 5.25e-03 | [ 0.02,  0.04] |    6.13 | < .001
## sex                     |        0.09 |     0.02 | [ 0.05,  0.14] |    3.87 | < .001
## education_cat           |        0.02 |     0.02 | [-0.02,  0.05] |    0.89 | 0.374 
## pol_leaning_cat [left]  |        0.05 |     0.03 | [-0.01,  0.10] |    1.72 | 0.085 
## pol_leaning_cat [right] |       -0.28 |     0.05 | [-0.38, -0.17] |   -5.21 | < .001
```

---

## Quick inspection using `base R`

In this session, we use `base R` plots again, but only to quickly illustrate that you could use them for some basic model checks, too.

```r
par(mfrow = c(2, 2))
plot(linear_model)
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]

---

## Using `easystats`

`easystats` provides some nice features for inspecting model fit. Here's a simple check for normality of the residuals. It is a combination of using a function from the `performance` package and then using `see` to plot it.

```r
library(performance)
library(see)

check_normality(linear_model) %>% 
  plot()
```
]

```
## Warning: Non-normality of residuals detected (p < .001).
```

<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-3-1.png" width="75%" style="display: block; margin: auto;" />
]

---

## QQ-Plot (`easystats`)

You can also draw some classic QQ-plots for this purpose.

```r
check_normality(linear_model) %>% 
  plot(type = "qq")
```
]

```
## Warning: Non-normality of residuals detected (p < .001).
```

<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-4-1.png" width="85%" style="display: block; margin: auto;" />
]

---

## Heteroscedasticity (`easystats`)

Something that is always a bit of an issue in regression analysis is heteroscedasticity. Again, it's straightforward to check for this using `performance` and `see`.

```r
check_heteroscedasticity(
  linear_model
) %>% 
  plot()
```
]

```
## Warning: Heteroscedasticity (non-constant error variance) detected (p < .001).
```

<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-5-1.png" width="85%" style="display: block; margin: auto;" />
]

---

## Check outliers (`easystats`)

The same applies for outlier analysis.

```r
check_outliers(linear_model) %>% 
  plot()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## There is more

In the collection of `easystats` packages, there are [way more procedures](https://easystats.github.io/see/articles/performance.html) you can apply to check your models. To replicate the summary plot from `base R` with some nicer formatting, we can use the following function from the `performance` package:

```r
check_model(linear_model)
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-7-1.png" width="80%" style="display: block; margin: auto;" />
]

---

# [Exercise](https://jobreu.github.io/r-intro-gesis-2021/exercises/Exercise_4_2_1_Plotting_Diagnostics.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/r-intro-gesis-2021/solutions/Exercise_4_2_1_Plotting_Diagnostics.html)

---

## Getting to the substantive part

After checking the model assumptions, and pretending everything's fine (it's not...), we will deal with the substantive part of our fictional research question. The traditional way of doing that it printing some regression tables and checking the individual coefficients. That's fine but plotting is also nice. For that purpose, we will now turn to...
- Coefficient Plots / Forest Plots
- Prediction Plots

---

## Coefficient plot (`parameters`)

`parameter` provides an easy method for creating a coefficient plot in one to two lines of code.

```r
library(parameters)

model_parameters(linear_model) %>% 
  plot()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-8-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## Changing labels: it's `ggplot2`-based

All of the plots produced by `parameters` and `see` are based on `ggplot2`. This means that, after calling the plotting functions from these packages, we can draw additional or manipulate existing plot layers using the usual `+`-structure.

```r
library(ggplot2)

model_parameters(linear_model) %>% 
  plot() +
  scale_x_discrete(
    labels = # in reversed order!
      c(
        "Pol. Right (Ref. Center)",
        "Pol. Left (Ref. Center)", 
        "Education", 
        "Sex", 
        "Age"
      )
  )
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Paint it black! 🖤

Thus, we can easily manipulate the appearance of the plots, e.g., by turning to a less colorful theme.

```r
library(ggplot2)

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-10-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## The advantage of prediction plots

Regression coefficients are sometimes a bit hard to read
- they are based on the scale of the independent variable when unstandardized
- it is ambigious what they actually mean substantially
- for example, is a coefficient of .2 or .05 (practically) meaningful or not?

Predictions are a solution
- they basically draw the regression line into the scatter plot of the Y and X variables
- we can read at which level of X which value of Y is expected

---

## Predictions (`sjPlot`)

For the purpose of plotting predictions, we will use the `plot_model()` function from the `sjPlot` package. In principle, it would also be possible with the `easystats` packages, but I find `sjPlot` easier to work with in this case...

```r
library(sjPlot)

linear_model %>% 
  plot_model(
    type = "pred", 
    terms = "age_cat"
  )
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-11-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Customizing `sjplot`s

As for the `easytats` packages, the plots produced by `sjplot` are `ggplot2`-based.

```r
linear_model %>% 
  plot_model(
    type = "pred", 
    terms = "age_cat"
  ) +
  xlab("Age") +
  ylab("Trust") +
  ggtitle("Linear Model") +
  theme_bw()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-12-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## Interaction effects

Let's now turn to the interaction between age and political leaning. Do right-wing political attitudes moderate the relationship between age and trust?

```r
linear_model_interaction <-
  lm(
    mean_trust ~
      age_cat + sex + education_cat + pol_leaning_cat +
      age_cat:pol_leaning_cat,
    data = gp_covid
  )

model_parameters(linear_model_interaction)
```

```
## Parameter                         | Coefficient |       SE |         95% CI | t(3104) |      p
## ----------------------------------------------------------------------------------------------
## (Intercept)                       |        3.49 |     0.08 | [ 3.33,  3.64] |   44.06 | < .001
## age_cat                           |        0.04 | 6.41e-03 | [ 0.02,  0.05] |    5.78 | < .001
## sex                               |        0.09 |     0.02 | [ 0.05,  0.14] |    3.82 | < .001
## education_cat                     |        0.02 |     0.02 | [-0.02,  0.05] |    0.94 | 0.347 
## pol_leaning_cat [left]            |        0.21 |     0.08 | [ 0.05,  0.36] |    2.63 | 0.009 
## pol_leaning_cat [right]           |       -0.60 |     0.17 | [-0.94, -0.26] |   -3.45 | < .001
## age_cat * pol_leaning_cat [left]  |       -0.02 |     0.01 | [-0.05,  0.00] |   -2.16 | 0.031 
## age_cat * pol_leaning_cat [right] |        0.05 |     0.02 | [ 0.00,  0.09] |    1.92 | 0.055
```

---

## Plotting interaction effects

Regression parameters for interactions are particular difficult to interpret. Again, `sjPlot` can help in this regard.

```r
linear_model_interaction %>% 
  plot_model(
    type = "int"
  ) +
  xlab("Age") +
  ylab("Trust") +
  ggtitle("Interaction Model") +
  theme_bw()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-13-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## Sidenote: colors again

Here's an example of the different color types in `ggplot2` (`color` and `fill`) and how you can manipulate them.

```r
linear_model_interaction %>% 
  plot_model(
    type = "int"
  ) +
  xlab("Age") +
  ylab("Trust") +
  ggtitle("Interaction Model") +
  theme_bw() +
  scale_color_manual(
    values = c("red", "blue", "black"),
    labels = c("Center", "Lefties", "Rigthies"),
    name = "Political Leaning"
  ) +
  scale_fill_manual(
    values = rep("grey", 3)
  )
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-14-1.png" width="95%" style="display: block; margin: auto;" />
]

---

# [Exercise](https://jobreu.github.io/r-intro-gesis-2021/exercises/Exercise_4_2_2_Plotting_a_Regression.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/r-intro-gesis-2021/solutions/Exercise_4_2_2_Plotting_a_Regression.html)

---

## Estimating multiple models

Something you may often do in your research is estimating multiple models because of
- changing covariates or dependent variables
- different model specifications

We can compare models side by side using regression tables, but by now we should know: plotting is better.

Let's create a binary variable using a median split ([don't try this at home](http://www.stat.columbia.edu/~gelman/research/published/thirds5.pdf))

```r
gp_covid <-
  gp_covid %>% 
  mutate(
    mean_trust_median = 
      ifelse(
        mean_trust <= median(mean_trust, na.rm = TRUE),
        0,
        1
      )
  )
```

---

## Logistic regression

We then can run a logistic regression based on the binary variable from the median split.

```r
logistic_model <-
  glm(
    mean_trust_median ~
      age_cat + sex + education_cat + pol_leaning_cat,
    data = gp_covid,
    family = binomial(link = "logit"),
  )

model_parameters(logistic_model)
```

```
## Parameter               | Log-Odds |   SE |         95% CI |     z |      p
## ---------------------------------------------------------------------------
## (Intercept)             |    -1.16 | 0.23 | [-1.61, -0.72] | -5.09 | < .001
## age_cat                 |     0.09 | 0.02 | [ 0.06,  0.12] |  5.81 | < .001
## sex                     |     0.17 | 0.07 | [ 0.03,  0.31] |  2.32 | 0.020 
## education_cat           |     0.05 | 0.06 | [-0.06,  0.16] |  0.86 | 0.391 
## pol_leaning_cat [left]  |     0.09 | 0.08 | [-0.07,  0.25] |  1.15 | 0.251 
## pol_leaning_cat [right] |    -0.66 | 0.17 | [-1.00, -0.33] | -3.88 | < .001
```

---

## Linear probability model

Alternatively, a more modern approach is using a linear regression model that converts to a linear probability model. Its regression coefficients can be interpreted as increases or decreases on a linear probability scale.

```r
linear_probability_model <-
  lm(
    mean_trust_median ~
      age_cat + sex + education_cat + pol_leaning_cat,
    data = gp_covid,
  )

model_parameters(linear_probability_model)
```

```
## Parameter               | Coefficient |       SE |         95% CI | t(3106) |      p
## ------------------------------------------------------------------------------------
## (Intercept)             |        0.22 |     0.05 | [ 0.11,  0.33] |    3.96 | < .001
## age_cat                 |        0.02 | 3.82e-03 | [ 0.01,  0.03] |    5.87 | < .001
## sex                     |        0.04 |     0.02 | [ 0.01,  0.08] |    2.32 | 0.020 
## education_cat           |        0.01 |     0.01 | [-0.01,  0.04] |    0.86 | 0.392 
## pol_leaning_cat [left]  |        0.02 |     0.02 | [-0.02,  0.06] |    1.16 | 0.247 
## pol_leaning_cat [right] |       -0.15 |     0.04 | [-0.23, -0.08] |   -3.94 | < .001
```

---

## Comparing model fits (`performance`)

The `performance` package gives you a nice spider plot when you compare the performance of models visually.

```r
library(performance)

compare_performance(
  linear_model,
  logistic_model,
  linear_probability_model
) %>% 
  plot()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-15-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Comparing model parameters (`sjPlot`)

To compare coefficients from different regression models, we can also use `sjplot`. This time, instead of using `plot_model()`, we use `plot_models()`.

```r
plot_models(
  linear_model,
  logistic_model,
  linear_probability_model,
  std.est = "std"
)
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-16-1.png" width="85%" style="display: block; margin: auto;" />
]

---

## Adjusting it within the function

The nice thing about `sjPlot` is that it provides a lot of options for customizing the plot. For example, to change the legend and axis labels we can use the `m.label` and `axis.labels` arguments.

```r
plot_models(
  linear_model,
  logistic_model,
  linear_probability_model,
  std.est = "std",
  m.labels = 
    c(
      "Linear",
      "Logistic", 
      "Linear Probability"
    ),
  axis.labels = 
    c(
      "Pol. Right (Ref. Center)",
      "Pol. Left (Ref. Center)", 
      "Education", 
      "Sex", 
      "Age"
    )
) 
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-17-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## More customization

As `sjplot` is also based on `ggplot2` we easily add further layers to our plot, such as a theme.

```r
plot_models(
  linear_model,
  logistic_model,
  linear_probability_model,
  # std.est = "std",
  m.labels = 
    c(
      "Linear",
      "Logistic", 
      "Linear Probability"
    ),
  axis.labels = 
    c(
      "Pol. Right (Ref. Center)",
      "Pol. Left (Ref. Center)", 
      "Education", 
      "Sex", 
      "Age"
    )
) +
  theme_bw()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-18-1.png" width="85%" style="display: block; margin: auto;" />
]

---

## Plotting predictions from multiple models

Neither the `easystats` packages nor `sjPlot` provide an easy way for plotting predictions from multiple models. For this purpose, we have to dig a bit deeper into the wrangling of the underlying data. Fortunately, what `sjPlot` supports is extracting the data from predicting a model. With these data at hand, we can simply create a data frame comprising predictions from multiple models.

---

## Extracting the predictions

The procedure to extract predictions from a model is similar to using the plotting command. But instead of using `plot_model()`, we use `get_model_data()`.

```r
linear_predictions <-
  get_model_data(
    linear_model,
    term = "age_cat",
    type = "pred"
  )

linear_predictions
```

```
## # Predicted values of mean_trust
## 
## age_cat | Predicted | group_col |       95% CI
## ----------------------------------------------
##       1 |      3.73 |         1 | [3.67, 3.79]
##       2 |      3.76 |         1 | [3.71, 3.82]
##       3 |      3.80 |         1 | [3.75, 3.84]
##       4 |      3.83 |         1 | [3.79, 3.87]
##       6 |      3.89 |         1 | [3.86, 3.92]
##       7 |      3.92 |         1 | [3.89, 3.95]
##       8 |      3.96 |         1 | [3.92, 3.99]
##      10 |      4.02 |         1 | [3.97, 4.07]
## 
## Adjusted for:
## *             sex =   1.49
## *   education_cat =   2.48
## * pol_leaning_cat = center
```

---

## More predictions

We can extract the predictions from the logistic regression and the linear probability model the same way.

```r
logistic_predictions <-
  get_model_data(
    logistic_model,
    term = "age_cat",
    type = "pred"
  )
```
]

```r
linear_probability_predictions <-
  get_model_data(
    linear_probability_model,
    term = "age_cat",
    type = "pred"
  )
```
]

---

## Combining the predictions

.pull-left[
We combine the data using simple data wrangling operations. 
- `bind_rows()` is a function from `dplyr` for appending rows from one data set to another
  - there's also `rbind()` in `base R`
- we use `mutate()` to create an indicator to know which row belongs to what model
]

```r
library(dplyr)
library(tibble)

all_predictions <-
  bind_rows(
    linear_predictions %>% 
      mutate(
        model = "Linear Model"
      ),
    logistic_predictions %>% 
      mutate(
        model = "Logistic Model"
      ),
    linear_probability_predictions %>% 
      mutate(
        model = "Linear Probability Model"
      )
  ) %>% 
  as_tibble()
```
]

---

## Plotting them with `facet_wrap()`

To create multiple prediction plots with a subplot for each model we use barebones `ggplot2`. Fortunately, this is not more complex than using `easystats` packages or `sjPlot`.

```r
ggplot(
  all_predictions, 
  aes(x = x, y = predicted)
) +
  geom_line() +
  geom_line(aes(y = conf.high), linetype = "dashed") +
  geom_line(aes(y = conf.low), linetype = "dashed") +
  facet_wrap(~model) +
  theme_bw()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Plotting only the binary models

As the plot is not particularly pretty because of different scales of the dependent variable, we can also base it on a filtered version of the predictions data frame.

```r
only_two <-
  ggplot(
    all_predictions %>% 
      filter(model != "Linear Model"), 
    aes(x = x, y = predicted)
  ) +
  geom_line() +
  geom_line(aes(y = conf.high), linetype = "dashed") +
  geom_line(aes(y = conf.low), linetype = "dashed") +
  facet_wrap(~model) +
  theme_bw()

only_two
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-20-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Average marginal effects (AME)

In the previous session, we also (briefly) discussed average marginal effects.
- parameter estimates based on the 'real' empirical values of all other covariates
- handy when relationships are non-linear
- way easier to interpret than usual regression coefficients
- also nice when using predictions

---

## Extracting AME

We can also use the `margins` package and extract prediction information for our models. While the resulting columns names are different, the `cplot()` function from that package also gives us information about X-values, estimates, and confidence bands.

```r
library(margins)

logistic_model_AME <-
  cplot(
    logistic_model,
    what = "prediction",
    draw = FALSE
  )

linear_probability_model_AME <-
  cplot(
    linear_probability_model,
    what = "prediction",
    draw = FALSE
  )
```

---

## Combining AME data

After extracting the information, we can simply combine the two data sets again to get predictions for all included models.

```r
all_AME <-
  bind_rows(
    logistic_model_AME %>% 
      mutate(model = "Logistic Model"),
    linear_probability_model_AME %>% 
      mutate(model = "Linear Probability Model")
  ) %>% 
  as_tibble()
```

---

## Plotting predictions based on AME

Plotting them can then be done the same way as before.

```r
only_two_AME <- 
  ggplot(
    all_AME, 
    aes(x = xvals, y = yvals)
  ) +
  geom_line() +
  geom_line(aes(y = upper), linetype = "dashed") +
  geom_line(aes(y = lower), linetype = "dashed") +
  facet_wrap(~model) +
  theme_bw()

only_two_AME
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-21-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Combining them with previous predictions

Using the `patchwork` package again, we can even plot the 'simple' predictions together with the AME ones in one plot. *Note*: In principle, this would have been possible with combining the data as well.

```r
library(patchwork)

(only_two +
    ggtitle("Simple Predictions") +
    ylab("Predicted Trust Value") +
    xlab("Age")) / 
  (only_two_AME +
     ggtitle("AME Predictions") +
     ylab("Predicted Trust Value") +
     xlab("Age"))
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-22-1.png" width="80%" style="display: block; margin: auto;" />
]

---

## A programmy alternative/extension

Say, e.g., we are interested in how the relationship between age and trust varies between different party supporters. We could estimate such a relationship in form of an interaction effect. But let's just stick with estimating separate models for demonstration.

```r
library(purrr)

gp_covid_parties <-
  gp_covid %>% 
  group_by(choice_of_party) %>% 
  group_split(choice_of_party)

my_models <-
  gp_covid_parties %>% 
  purrr::map(
    ~lm(
      mean_trust ~ 
        age_cat + 
        sex + 
        education_cat,
      data = .x
      )
  )
```

---

## Putting all together using `purrr::imap()`

```r
programmy_AME <- 
  my_models %>% 
  imap(
    ~cplot(
      .x,
      dx = "mean_trust",
      what = "prediction",
      draw = FALSE,
      data = gp_covid_parties[[.y]]
    ) %>% 
      mutate(
        choice_of_party = .y
      )
  ) %>% 
  do.call(rbind, .)
```

---

## Plotting the AMEs

```r
programmy_AME %>% 
  mutate(
    choice_of_party = 
      recode_factor(
        choice_of_party,
        `1` = "CDU/CSU",
        `2` = "SPD",
        `3` = "FDP",
        `4` = "Linke",
        `5` = "Gruene",
        `6` = "AfD",
        `7` = "Other"
      )
  ) %>% 
  ggplot( 
    aes(x = xvals, y = yvals)
  ) +
  geom_line() +
  geom_line(aes(y = upper), linetype = "dashed") +
  geom_line(aes(y = lower), linetype = "dashed") +
  facet_wrap(~choice_of_party, ncol = 2) +
  theme_bw()
```
]

.pull-right[
<img src="data:image/png;base64,#4_2_Data_Visualization_Part_2_files/figure-html/unnamed-chunk-23-1.png" width="95%" style="display: block; margin: auto;" />
]

---

## What is more?

Sometimes getting to a plot that is ready for publication can be a long way to go. Fortunately, the `easystats` packages and `sjPlot` help a lot to convert it to a medium walking distance.

There are also other packages that can facilitate the goal of creating publication-ready plots, such as the [`dotwhisker`](https://cran.r-project.org/web/packages/dotwhisker/) package for coefficient plots.

Things can get messy if we have to work with statistical models that are not incorporated into the presented packages. For example, AME models or when you want to make use of results produced with other statistical software like *Stata* (for an example, you can have a look at [my code](https://stefanjuenger.github.io/land_use_disadvantages/Land_Use_Disadvantages_Main_Analysis.html) for a [recent paper](https://doi.org/10.1177/00420980211023206)).

---

# [Exercise](https://jobreu.github.io/r-intro-gesis-2021/exercises/Exercise_4_2_3_Combining_Predictions.html) time 🏋️‍♀️💪🏃🚴

## [Solutions](https://jobreu.github.io/r-intro-gesis-2021/solutions/Exercise_4_2_3_Combining_Predictions.html)

---

## Extracurricular activities

Explore the data from the *GESIS Panel Special Survey on the Coronavirus SARS-CoV-2 Outbreak in Germany* a bit further and think about some confirmatory analyses you would find interesting for these data. Also think about how you would (want to) visualize your results. If you have the time and motivation, feel free to actually do some of the analyses and visualizations you find interesting, building on the code from today's lectures and exercises.