class: center, middle, inverse, title-slide

# Automatic Sampling and Analysis of YouTube Data
## Recap - Outlook - Practice
### Julian Kohne
Johannes Breuer
M. Rohangis Mohseni
### 2022-02-22

---
layout: true

<div class="my-footer">
  <div style="float: left;"><span>Julian Kohne, Johannes Breuer, M. Rohangis Mohseni</span></div>
  <div style="float: right;"><span>GESIS, online, 2022-02-22</span></div>
  <div style="text-align: center;"><span>Recap - Outlook - Practice</span></div>
</div>

---

## Course Recap (1)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Session </th>
   <th style="text-align:left;"> Example content </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Introduction </td>
   <td style="text-align:left;"> Why is YouTube data interesting for research? </td>
  </tr>
  <tr>
   <td style="text-align:left;"> The YouTube API </td>
   <td style="text-align:left;"> API access, API requests, quota limits </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Collecting data with the tuber package for R </td>
   <td style="text-align:left;"> Collecting channel/video stats & viewer comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Processing and cleaning user comments </td>
   <td style="text-align:left;"> Character encoding, string operations, emoji dictionaries </td>
  </tr>
</tbody>
</table>

---

## Course Recap (2)

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Session </th>
   <th style="text-align:left;"> Example content </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Basic text analysis of user comments </td>
   <td style="text-align:left;"> Counting and visualizing the frequencies of words and emojis in comments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sentiment analysis of user comments </td>
   <td style="text-align:left;"> Assigning sentiment scores to words and emojis </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Excursus: Retrieving video subtitles </td>
   <td style="text-align:left;"> Retrieving and parsing YouTube video subtitles </td>
  </tr>
</tbody>
</table>

---

## Where To Go From Here?

Some topics that we did not cover or only briefly touched upon that you might want to explore next/further:

- Analyses for more than one video: use for-loops, functions from the `apply` family or `map` functions from the [`purrr` package](https://purrr.tidyverse.org/)

--

- Advanced text mining and NLP (going beyond [bag-of-words approaches](https://en.wikipedia.org/wiki/Bag-of-words_model)): check out the introductions/tutorials mentioned in the session on basic text analysis or this [presentation by Cosima Meyer](https://cosimameyer.rbind.io/talk/nlp-rladies-tunis/)

--

- Alternatives to dictionary-based approaches for sentiment analysis: See the publications by [Boukes et al., 2019](https://doi.org/10.1080/19312458.2019.1671966) and [van Atteveldt et al., 2021](https://doi.org/10.1080/19312458.2020.1869198)

--

- Supervised machine learning for text analysis: The online book [Supervised Machine Learning for Text Analysis in R](https://smltar.com/) by Emil Hvitfeldt and Julia Silge is an excellent resource here

--

- Topic models (unsupervised ML): To get started you can, e.g., have a look at the introductions/tutorials by [Rachael Tatman](https://www.kaggle.com/rtatman/nlp-in-r-topic-modelling), [Julia Silge](https://juliasilge.com/blog/sherlock-holmes-stm/), or the [*Pew Research Center*](https://medium.com/pew-research-center-decoded/an-intro-to-topic-models-for-text-analysis-de5aa3e72bdb)

---

## Shameful Self-Promotion 🙈

We have written a book chapter based on this course which should be published later this year:

Breuer, J., Kohne, J., & Mohseni, M. R. (2022). Using YouTube Data for Social Science Research. In J. Skopek (Ed.), *Research Handbook of Digital Sociology*. Edward Elgar Publishing.

If you are interested in working with *WhatsApp* data (and/or what else you can do with emojis and emoticons in text data), check out Julian's [`WhatsR` package](https://github.com/gesiscss/WhatsR) (which is also still work in progress).

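---

## Sketch: Analyses For More Than One Video

The first suggestion under "Where To Go From Here?" (for-loops or functions from the `apply` family) could look roughly like the sketch below. It is a minimal illustration and not part of the course materials: `collect_one()` and `safe_collect()` are hypothetical helper names, the video IDs are made up, and `collect_one()` returns placeholder values instead of calling the YouTube API, so that the code runs without authentication.

```r
# Hypothetical stand-in for a real collection call that would need API
# authentication. It returns made-up placeholder values, not real data.
collect_one <- function(video_id) {
  if (!nzchar(video_id)) stop("empty video ID")
  data.frame(
    video_id = video_id,
    n_comments = nchar(video_id),  # placeholder value only
    stringsAsFactors = FALSE
  )
}

# Wrap the call so that one failing video does not abort the whole run
safe_collect <- function(video_id) {
  tryCatch(
    collect_one(video_id),
    error = function(e) {
      message("Skipping '", video_id, "': ", conditionMessage(e))
      NULL
    }
  )
}

# Made-up IDs; the empty string triggers the error branch on purpose
video_ids <- c("videoA", "", "videoC")

# Option 1: for-loop that fills a pre-allocated list
results_loop <- vector("list", length(video_ids))
for (i in seq_along(video_ids)) {
  res <- safe_collect(video_ids[i])
  if (!is.null(res)) results_loop[[i]] <- res
}

# Option 2: the same iteration with lapply() from the apply family
results_apply <- lapply(video_ids, safe_collect)

# Drop failed (NULL) entries and stack the rest into one data frame
combine <- function(x) do.call(rbind, Filter(Negate(is.null), x))
stats_loop <- combine(results_loop)
stats_apply <- combine(results_apply)
```

In a real project, the body of `collect_one()` would be replaced by an actual collection call, for example one of the `tuber` functions used in the course. With `purrr` installed, `purrr::map(video_ids, safe_collect)` should return the same list as the `lapply()` call.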
---

## Acknowledgements ❤️

All slides were created with the `R` package [`xaringan`](https://github.com/yihui/xaringan) which builds on [`remark.js`](https://remarkjs.com), [`knitr`](http://yihui.name/knitr), and [`R Markdown`](https://rmarkdown.rstudio.com).

The exercises were created with the [`unilur` package](https://github.com/koncina/unilur).

The original inspiration for our emoji parsing and analyses came from a [blog post](http://opiateforthemass.es/articles/emoji-analysis/) by [Jessica Peterka-Bonetta](https://github.com/today-is-a-good-day).

We thank the *GESIS* Training team for taking good care of the organization of this workshop, and all of you for participating!

---
class: center, middle

# Any final questions or comments?

---
class: center, middle

# Practice time

## You now have some time to start or continue working on your own *YouTube* data analysis project.

We'll be around, so feel free to ask questions while you work on or get started with your projects.