# Automatic Sampling and Analysis of YouTube Data
## Recap - Outlook - Practice
### Julian Kohne<br />Johannes Breuer<br />M. Rohangis Mohseni
### 2022-02-22

---

<div class="my-footer">
  <div style="float: left;"><span>Julian Kohne, Johannes Breuer, M. Rohangis Mohseni</span></div>
  <div style="float: right;"><span>GESIS, online, 2022-02-22</span></div>
  <div style="text-align: center;"><span>Recap - Outlook - Practice</span></div>
</div>
  
---

## Course Recap (1)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Session </th>
   <th style="text-align:left;"> Example content </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Introduction </td>
   <td style="text-align:left;"> Why is YouTube data interesting for research? </td>
  </tr>
  <tr>
   <td style="text-align:left;"> The YouTube API </td>
   <td style="text-align:left;"> API access, API requests, quota limits </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Collecting data with the tuber package for R </td>
   <td style="text-align:left;"> Collecting channel/video stats & viewer comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Processing and cleaning user comments </td>
   <td style="text-align:left;"> Character encoding, string operations, emoji dictionaries </td>
  </tr>
</tbody>
</table>

---

## Course Recap (2)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Session </th>
   <th style="text-align:left;"> Example content </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Basic text analysis of user comments </td>
   <td style="text-align:left;"> Counting and visualizing the frequencies of words and emojis in comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sentiment analysis of user comments </td>
   <td style="text-align:left;"> Assigning sentiment scores to words and emojis </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Excursus: Retrieving video subtitles </td>
   <td style="text-align:left;"> Retrieving and parsing YouTube video subtitles </td>
  </tr>
</tbody>
</table>

---

## Where To Go From Here?

Some topics that we did not cover, or only touched upon briefly, and that you might want to explore further:

- Analyses for more than one video: use for-loops, functions from the `apply` family or `map` functions from the [`purrr` package](https://purrr.tidyverse.org/)

- Advanced text mining and NLP (going beyond [bag-of-words approaches](https://en.wikipedia.org/wiki/Bag-of-words_model)): check out the introductions/tutorials mentioned in the session on basic text analysis or this [presentation by Cosima Meyer](https://cosimameyer.rbind.io/talk/nlp-rladies-tunis/)

- Alternatives to dictionary-based approaches for sentiment analysis: See the publications by [Boukes et al., 2019](https://doi.org/10.1080/19312458.2019.1671966) and [van Atteveldt et al., 2021](https://doi.org/10.1080/19312458.2020.1869198)

- Supervised machine learning for text analysis: The online book [Supervised Machine Learning for Text Analysis in R](https://smltar.com/) by Emil Hvitfeldt and Julia Silge is an excellent resource here

- Topic models (unsupervised ML): To get started you can, e.g., have a look at the introductions/tutorials by [Rachael Tatman](https://www.kaggle.com/rtatman/nlp-in-r-topic-modelling), [Julia Silge](https://juliasilge.com/blog/sherlock-holmes-stm/), or the [*Pew Research Center*](https://medium.com/pew-research-center-decoded/an-intro-to-topic-models-for-text-analysis-de5aa3e72bdb)
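
As a minimal sketch of the first point above (running the same steps for more than one video), the example below compares a for-loop with `lapply()`. The function `fetch_comments()` and the video IDs are hypothetical placeholders invented for illustration; in a real project you would replace the function body with your own collection code.

```r
# Hypothetical stand-in for a real collection step (e.g., an API request).
# It returns a small made-up data frame so that the example runs offline.
fetch_comments <- function(video_id) {
  data.frame(
    video_id = video_id,
    comment  = paste("Placeholder comment", 1:2, "for", video_id),
    stringsAsFactors = FALSE
  )
}

# Placeholder IDs, not real videos
video_ids <- c("video_A", "video_B", "video_C")

# Variant 1: a for-loop that fills a pre-allocated list
comments_loop <- vector("list", length(video_ids))
for (i in seq_along(video_ids)) {
  comments_loop[[i]] <- fetch_comments(video_ids[i])
}

# Variant 2: lapply() from the apply family does the same in one line
comments_apply <- lapply(video_ids, fetch_comments)

# Stack the per-video data frames into a single data frame
all_comments <- do.call(rbind, comments_apply)

# Number of comments per video
table(all_comments$video_id)
```

If you prefer the tidyverse, `purrr::map(video_ids, fetch_comments)` should give you the same kind of list as `lapply()` here.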

---

## Shameful Self-Promotion 🙈

We have written a book chapter based on this course, which should be published later this year:

Breuer, J., Kohne, J., & Mohseni, M. R. (2022). Using YouTube Data for Social Science
Research. In J. Skopek (Ed.), *Research Handbook of Digital Sociology*. Edward Elgar Publishing.

If you are interested in working with *WhatsApp* data (and/or in what else you can do with emojis and emoticons in text data), check out Julian's [`WhatsR` package](https://github.com/gesiscss/WhatsR) (which is also still a work in progress).

---

## Acknowledgements ❤️

All slides were created with the `R` package [`xaringan`](https://github.com/yihui/xaringan) which builds on [`remark.js`](https://remarkjs.com), [`knitr`](http://yihui.name/knitr), and [`R Markdown`](https://rmarkdown.rstudio.com). The exercises were created with the [`unilur` package](https://github.com/koncina/unilur).

The original inspiration for our emoji parsing and analyses came from a [blog post](http://opiateforthemass.es/articles/emoji-analysis/) by [Jessica Peterka-Bonetta](https://github.com/today-is-a-good-day).

We thank the *GESIS* Training team for taking good care of the organization of this workshop, and all of you for participating!

---

# Any final questions or comments?

---

# Practice time

## You now have some time to start or continue working on your own *YouTube* data analysis project. We'll be around, so feel free to ask questions as you go.