Automatic Sampling and Analysis of YouTube Data

.title[
# Automatic Sampling and Analysis of YouTube Data
]
.subtitle[
## Recap, outlook, & practice
]
.author[
### Johannes Breuer, Annika Deubel, & M. Rohangis Mohseni
]
.date[
### February 15th, 2023
]

---

---

## Course Recap (1)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Session </th>
   <th style="text-align:left;"> Example content </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Introduction </td>
   <td style="text-align:left;"> Why is YouTube data interesting for research? </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Tools for collecting YouTube data </td>
   <td style="text-align:left;"> Different options for collecting YouTube data </td>
  </tr>
  <tr>
   <td style="text-align:left;"> The YouTube API </td>
   <td style="text-align:left;"> API access, API requests, quota limits </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Collecting YouTube data with R </td>
   <td style="text-align:left;"> Collecting channel/video stats & viewer comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Processing and cleaning user comments </td>
   <td style="text-align:left;"> Data wrangling, string operations, emoji dictionaries </td>
  </tr>
</tbody>
</table>

---

## Course Recap (2)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Session </th>
   <th style="text-align:left;"> Example content </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Basic text analysis of user comments </td>
   <td style="text-align:left;"> Counting and visualizing the frequencies of words and emojis in comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sentiment analysis of user comments </td>
   <td style="text-align:left;"> Assigning sentiment scores to words and emojis </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Excursus: Retrieving video subtitles </td>
   <td style="text-align:left;"> Retrieving and parsing YouTube video subtitles </td>
  </tr>
</tbody>
</table>

---

## Where To Go From Here?

Some topics that we did not cover or only briefly touched upon that you might want to explore next/further:

- Analyses for more than one video: use for-loops, functions from the `apply` family or `map` functions from the [`purrr` package](https://purrr.tidyverse.org/)

- Advanced text mining and NLP (going beyond [bag-of-words approaches](https://en.wikipedia.org/wiki/Bag-of-words_model)): check out the tutorials mentioned in the session on basic text analysis or this [presentation by Cosima Meyer](https://cosimameyer.com/slides/nlp-rladies-tunis/talk.html#1)

- Alternatives to dictionary-based approaches for sentiment analysis: See, e.g., [Boukes et al., 2019](https://doi.org/10.1080/19312458.2019.1671966) and [van Atteveldt et al., 2021](https://doi.org/10.1080/19312458.2020.1869198)

- Supervised machine learning for text analysis: The online book [Supervised Machine Learning for Text Analysis in R](https://smltar.com/) by Emil Hvitfeldt and Julia Silge is an excellent resource here

- Topic models (unsupervised ML): To get started you can, e.g., have a look at the introductions/tutorials by [Rachael Tatman](https://www.kaggle.com/rtatman/nlp-in-r-topic-modelling), [Julia Silge](https://juliasilge.com/blog/sherlock-holmes-stm/), or the [*Pew Research Center*](https://medium.com/pew-research-center-decoded/an-intro-to-topic-models-for-text-analysis-de5aa3e72bdb)

---

## Shameful Self-Promotion 🙈

We have written a book chapter based on this course which should be published later this year:

Breuer, J., Kohne, J., & Mohseni, M. R. (2023). Using YouTube Data for Social Science
Research. In J. Skopek (Ed.), *Research Handbook of Digital Sociology*. Edward Elgar Publishing.

If you are interested in working with *WhatsApp* data (and/or what else you can do with emojis and emoticons in text data), check out the [`WhatsR` package](https://github.com/gesiscss/WhatsR) (which is also still work in progress) by Julian Kohne.

---

## Acknowledgements ❤️

All slides were created with [`xaringan`](https://github.com/yihui/xaringan). For the exercises, we used the [`unilur` package](https://github.com/koncina/unilur). The structure and content of the workshop were built using the [woRkshoptools package](https://github.com/StefanJuenger/woRkshoptools).

A substantial part of the code we used in this workshop - especially that for parsing/processing the *YouTube* data - was written by [Julian Kohne](https://www.juliankohne.com/).

We thank the *GESIS* Training team for taking good care of the organization of this workshop, and all of you for participating!

---

# Any final questions or comments?

---

# Practice time

You now have some time to start or continue working on your own *YouTube* data analysis project. We'll be around, so feel free to ask questions while you work on or get started with your projects.