In the following exercises we will collect and explore some data from YouTube.
Before we start with this exercise, three short notes on working with the exercise files in this workshop:
You can find the solutions for this exercise as well as all other exercises in the `solutions` folder in the repo/directory that contains the course materials. You can copy code from these exercise files by clicking on the small blue clipboard icon in the upper right corner of the code boxes.
We would like to ask you to solve all `R` coding tasks by writing them into your own `R` script files. This ensures that all of your solutions are reproducible and that you can (re-)use solutions from earlier exercises in later ones.
All solutions ‘assume’ that the working directory is the root directory of the workshop materials (which should be a folder named `youtube-workshop-gesis-2023` stored wherever you saved the materials on your local hard drive). This way, we can access files in other folders using relative paths. This means that you may have to change/set your working directory accordingly for your own exercise solution scripts.
Now let’s get to it…
To authenticate, use the function `yt_oauth()` from the `tuber` package, which requires the ID of your app as well as your app secret as arguments.
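A minimal sketch of the authentication step, assuming you keep the app ID and secret in environment variables (the names `YT_APP_ID` and `YT_APP_SECRET` are hypothetical) so they do not end up in script files you might share. It only calls `yt_oauth()` when both values are set and `tuber` is installed.

```r
# Credentials are read from environment variables instead of being typed into
# the script; YT_APP_ID and YT_APP_SECRET are hypothetical names (you could set
# them, e.g., in your ~/.Renviron file)
app_id     <- Sys.getenv("YT_APP_ID")
app_secret <- Sys.getenv("YT_APP_SECRET")

have_credentials <- nzchar(app_id) && nzchar(app_secret)

if (have_credentials && requireNamespace("tuber", quietly = TRUE)) {
  # Starts the OAuth flow for your app (a browser window should open)
  tuber::yt_oauth(app_id = app_id, app_secret = app_secret)
} else {
  message("Set YT_APP_ID and YT_APP_SECRET and install tuber to authenticate.")
}
```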
Note: While going through the following exercises, you might want to monitor your API quota usage via the Google Cloud Platform dashboard for your app: select IAM & Admin -> Quotas and look for YouTube Data API v3 - Queries per day.
The `get_channel_stats` function requires the ID of the channel (as a string) as its main argument. You can find the channel ID by inspecting the page source of the channel website and searching for the strings “channelId” or “externalId”, or by using the Commentpicker tool. In this particular case, however, the channel ID is also included in the link in the exercise text ;-)
For video statistics, use `get_stats`, which needs the ID of the video. The video ID is the value of the “v=” parameter in the video URL, i.e., the characters after “v=” (up to the next “&”, if there is one).
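Pulling the ID out of a URL can be sketched as follows; `extract_video_id()` is a hypothetical helper and the URLs are made up. It returns `NA` when there is no “v=” parameter, which also means it does not handle short `youtu.be/...` links.

```r
# Hypothetical helper: extract the value of the "v" query parameter
# (the video ID) from one or more YouTube watch URLs; NA if there is none
extract_video_id <- function(url) {
  hits <- regmatches(url, regexec("[?&]v=([^&#]+)", url))
  unname(vapply(hits, function(h) if (length(h) == 2) h[2] else NA_character_,
                character(1)))
}

urls <- c(
  "https://www.youtube.com/watch?v=AbCdEfGhIjK",
  "https://www.youtube.com/watch?v=AbCdEfGhIjK&t=42s",
  "https://www.youtube.com/@SomeChannel"
)
ids <- extract_video_id(urls)

# With an authenticated tuber session, the ID is then passed to get_stats():
# video_stats <- tuber::get_stats(video_id = ids[1])
```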
Collect the comments and store them in an object called `comments_lwt_census`. The function you need for this is `get_all_comments`.
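Every call to `get_all_comments` uses API quota, so one option is to collect once and reuse a saved copy afterwards. The sketch below assumes a hypothetical helper `collect_or_load()` that reads a cached `.rds` file if it exists and otherwise runs the collection; the demo uses a stand-in collector and a temporary folder instead of the API.

```r
# Hypothetical helper: return the cached result from `path` if it exists,
# otherwise run `collect_fun()` once, save its result to `path`, and return it
collect_or_load <- function(path, collect_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- collect_fun()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Demo with a stand-in collector that counts how often it is called
n_calls <- 0
fake_collector <- function() {
  n_calls <<- n_calls + 1
  data.frame(id = c("c1", "c2"), stringsAsFactors = FALSE)
}
cache_path <- file.path(tempdir(), "demo_cache", "comments.rds")
first  <- collect_or_load(cache_path, fake_collector)
second <- collect_or_load(cache_path, fake_collector)  # read from disk, no new call

# With tuber (after yt_oauth()), the collector would wrap the API call, e.g.:
# comments_lwt_census <- collect_or_load(
#   "data/RawLWTComments.rds",
#   function() tuber::get_all_comments(video_id = "VIDEO_ID")
# )
```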
`get_all_comments` from `tuber` only collects up to 5 replies per comment. How many of the comments you just collected were replies?
The dataframe contains a variable `parentID` which you can use to answer this question. It is missing (i.e., `NA`) for top-level comments (i.e., those that are not replies to other comments).
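A sketch of the counting on a small, made-up data frame standing in for `comments_lwt_census`. Note that the YouTube API calls this field `parentId`, so check `names(comments_lwt_census)` for the exact spelling in your data before reusing the code.

```r
# Made-up stand-in: two top-level comments (parentID is NA) and three replies
# whose parentID holds the id of the comment they reply to
comments_toy <- data.frame(
  id       = c("c1", "c2", "c1.r1", "c1.r2", "c2.r1"),
  parentID = c(NA, NA, "c1", "c1", "c2"),
  stringsAsFactors = FALSE
)

is_reply      <- !is.na(comments_toy$parentID)
n_replies     <- sum(is_reply)    # number of replies
n_toplevel    <- sum(!is_reply)   # number of top-level comments
share_replies <- mean(is_reply)   # proportion of replies among all comments

# Replies per parent comment (table() drops NA values by default)
replies_per_parent <- table(comments_toy$parentID)
```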
Save the comments as an `.rds` file named `RawLWTComments` in a folder named `data` within the workshop materials directory.
You can use the base `R` function `saveRDS` for this. If you have not done so before, you should create a `data` subfolder within the folder that contains the workshop materials. Again, the code in the solution assumes that your working directory is the workshop materials folder and stores the file in the `data` folder that you should have created.
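A minimal sketch of the saving step. It assumes the working directory is the workshop root and, only if `comments_lwt_census` does not exist yet, creates a small made-up stand-in so the code runs on its own (in that case the saved file contains the stand-in, not real comments).

```r
# Made-up stand-in, used only if the real comments object does not exist yet
if (!exists("comments_lwt_census")) {
  comments_lwt_census <- data.frame(
    id       = c("c1", "c2"),
    parentID = c(NA, "c1"),
    stringsAsFactors = FALSE
  )
}

rds_path <- file.path("data", "RawLWTComments.rds")

# Create the data subfolder if needed (no warning if it already exists)
dir.create("data", showWarnings = FALSE)

# Save the object; the path is relative to the working directory,
# which should be the root folder of the workshop materials
saveRDS(comments_lwt_census, file = rds_path)

# Reading the file back returns an identical object
comments_reloaded <- readRDS(rds_path)
```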
Optionally, you can also collect the comments using the `vosonSML` package. If you do that, you can compare the data: How many comments were collected? Which variables do the two datasets contain? And so on.
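For the comparison, a hypothetical helper `compare_datasets()` could report row counts and which variables the two data frames share. The two toy data frames and their column names are invented for illustration and are not the actual output of `tuber` or `vosonSML`.

```r
# Hypothetical helper: summarise row counts and (non-)shared variables
compare_datasets <- function(a, b) {
  list(
    n_rows      = c(a = nrow(a), b = nrow(b)),
    shared_vars = intersect(names(a), names(b)),
    only_in_a   = setdiff(names(a), names(b)),
    only_in_b   = setdiff(names(b), names(a))
  )
}

# Toy data frames with made-up columns
toy_a <- data.frame(id = 1:3, text = c("x", "y", "z"), parentID = NA)
toy_b <- data.frame(id = 1:4, Comment = c("x", "y", "z", "w"))

cmp <- compare_datasets(toy_a, toy_b)
```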
For the `vosonSML` package, you need an API key. If you have not created one yet, you can do so via the Google Cloud Platform (APIs & Services -> Credentials).
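A sketch of the `vosonSML` collection step, assuming the API key is stored in an environment variable with the hypothetical name `YT_API_KEY` and that you replace the placeholder video ID. `maxComments = 500` is an arbitrary cap to limit quota use; see the `vosonSML` documentation for how it counts replies.

```r
# API key from a hypothetical environment variable; placeholder video ID
api_key  <- Sys.getenv("YT_API_KEY")
video_id <- "VIDEO_ID"  # replace with the ID from the video URL

ready <- nzchar(api_key) && video_id != "VIDEO_ID" &&
  requireNamespace("vosonSML", quietly = TRUE)

if (ready) {
  # Create a YouTube credential object, then collect comments for the video
  yt_auth <- vosonSML::Authenticate("youtube", apiKey = api_key)
  comments_voson <- vosonSML::Collect(yt_auth, videoIDs = video_id,
                                      maxComments = 500, verbose = TRUE)
} else {
  message("Set YT_API_KEY, replace VIDEO_ID and install vosonSML to collect.")
}
```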