class: center, middle, inverse, title-slide .title[ # Tools and Workflows for Reproducible Research in the Quantitative Social Sciences ] .subtitle[ ## Recap & Outlook ] .author[ ### Johannes Breuer, Bernd Weiss, & Arnim Bleier ] .date[ ### 2022-11-18 ] --- layout: true --- ## Recap - Day 1 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> 10:00 - 11:00 </td> <td style="text-align:left;font-weight: bold;"> Introduction </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 11:00 - 12:00 </td> <td style="text-align:left;font-weight: bold;"> Computer literacy for reproducible research </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 13:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Introduction to R Markdown </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 15:15 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Git & GitHub - Part 1 </td> </tr> </tbody> </table> --- ## Recap - Day 2 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> 09:30 - 10:30 </td> <td style="text-align:left;font-weight: bold;"> Git & RStudio </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 10:30 - 10:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 10:45 - 11:45 </td> <td style="text-align:left;font-weight: bold;"> Jupyter Notebooks & Binder </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 11:45 - 12:30 </td> <td style="text-align:left;font-weight: bold;"> Build your own Binder </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:30 - 13:30 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 13:30 - 14:00 </td> <td style="text-align:left;font-weight: bold;"> Sharing & Publishing </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 14:00 - 14:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 14:15 - 15:45 </td> <td style="text-align:left;font-weight: bold;"> Git & GitHub - Part 2 </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 15:45 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Recap & Outlook </td> </tr> </tbody> </table> --- ## Other topics in reproducible research As we said in the introduction, we cannot cover all tools and topics related to reproducible research in this workshop. However, we want to use this session to cover some additional tools as well as other topics related to reproducible research: - Other options for collaboration - `R` package dependency management - Workflow & project templates --- ## Collaboration & different tool stacks As we all know from experience, people have different knowledge and preferences regarding the use of research tools. The workflows we have focused on in this course require the use of `R`, `R Markdown`, and `Git`, as well as *RStudio* and *GitHub*. However, it is quite likely that we collaborate with others who do not use (all of) these tools. --- ## Collaboration & different tool stacks There also are `R` packages that you can use for collaborating on `R Markdown` documents with people who do not (want to) use `R Markdown` (and `Git`): - [`trackdown`](https://claudiozandonella.github.io/trackdown/) uses *Google Docs* for this - [`redoc`](https://noamross.github.io/redoc/) "is a package to enable a two-way R Markdown-Microsoft Word workflow" (*note*: the project is currently "in suspended animation ") --- ## `trackdown` The basic workflow for `trackdown` is that you upload the content of an `.Rmd` file to *Google Docs* where you can collaboratively edit the text parts. You can then download the document again (e.g., to edit the code in the `R Markdown` document in *RStudio*), and also update the file on *Google Docs* after changing the `.Rmd` locally. The [`trackdown` documentation](https://claudiozandonella.github.io/trackdown/) provides further details. --- ## Advanced use of `trackdown` with `Git` To combine `R Markdown` with collaborative text editing via `trackdown` and version control (and to avoid potential issues caused by - possibly unintended - changes to the code parts on *Google Docs*), the author of the `trackdown` package, [Claudio Zandonella Callegher](https://github.com/ClaudioZandonella), proposes a solution in an [issue in the *GitHub* repository for the package](https://github.com/ClaudioZandonella/trackdown/issues/31). Essentially, the idea here is to create a `trackdown` branch in the `Git` repository and merge it with the `main` branch which is (mainly) used to edit the code. --- ## Dependency management The `R` world is highly dynamic and networked. This means that... - `R` packages change (sometimes in ways that cause changes in results or even breaking code) - `R` packages typically depend on other packages (which can also change) Eventually, this can lead to a problem often described as "code rot", meaning that code ceases to function (or leads to different results). --- ## Dependency management <img src="data:image/png;base64,#Recap_outlook_files/figure-html/dep-example-1.png" style="display: block; margin: auto;" /> --- ## Dependency management - Our projects may use/require different package versions - Manually managing dependencies is a nightmare - Keeping track of the dependencies and their changes - Restoring the `R` environment --- ## Dependency management In the previous sessions, we got to know *Binder*, which not only allows us to share our results and code with the world in an interactive manner, but also addresses the issue of code rot and dependency management (by sharing complete pre-specified `R` environments). -- However, for various reasons, using *Binder* might not be an option. Luckily, there are also other (more lightweight) options for handling dependency management and preventing code rot when working with `R`. --- ## Solutions for `R` 1. [`checkpoint`](https://github.com/RevolutionAnalytics/checkpoint/) by *Microsoft* - Requires a project-based workflow - Package database will gradually grow 2. [`groundhog`](https://groundhogr.com/) - Package database will gradually grow 3. [`renv`](https://rstudio.github.io/renv/articles/renv.html) by *RStudio* - Most flexible and powerful - Least straightforward to use .small[ *Note*: To ensure that everything works as on your machine, the packages need to be the same (version) as in the `checkpoint`/`groundhog` library. So, to be extra sure about this, you may need to update your local package installations accordingly. ] --- ## `checkpoint` `checkpoint` is part of the [Reproducible `R` Toolkit from *Microsoft*](https://mran.microsoft.com/documents/rro/reproducibility) <img src="data:image/png;base64,#C:\Users\breuerjs\Documents\Lehre\reproducible-research-gesis-2022\content\img\checkpoint.png" width="1000" style="display: block; margin: auto;" /> .small[ https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html ] --- ## `checkpoint` Dependencies are detected automatically ```r library("checkpoint") checkpoint("2022-11-11") library("ggplot2") ``` -- Uses a date-specific directory outside of the usual `R` package directory. By default, this is: ```bash ~/.checkpoint/... ``` --- ## `checkpoint` To further increase reproducibility, it is also possible to specify an `R` version. ```r checkpoint("2022-11-11", r_version = "4.1.3") ``` --- ## Dependency management - one step further Going beyond `R` packages (and potentially also the `R` version), a more holistic (albeit also technically demanding) approach to dependency management (and combating code rot) is the use of containers via *Docker*. If you want to get into this, you can have a look at the [Introduction to Docker for R Users](https://colinfay.me/docker-r-reproducibility/) by [Colin Fay](https://colinfay.me/) or the mini-tutorial [Enough Docker to be Dangerous](https://seankross.com/2017/09/17/Enough-Docker-to-be-Dangerous.html) by [Sean Kross](https://seankross.com/). There are different prespecified *Docker* files for `R` users available via [*rocker*](https://github.com/rocker-org/rocker), and there also is a [Docker workflow to work reproducibly with papaja](https://github.com/crsh/papaja_docker). .small[ *Note*: Remember that *Binder* also makes use of *Docker*. ] --- ## Project setup and templates In this workshop, we have shown you how to manually set up a reproducible research workflows. However, there are some tools that you can use to automate parts of this process. These can range from very simple to very elaborate solutions. --- ## Project setup and templates We have already seen and tested the file `create-project.sh` (which is small shell script for initializing a basic project folder structure that can be easily adapted and extended using any text editor). However, there also are several other (more complex) packages and templates that can be used for the creation and maintenance of reproducible research workflows, such as... - [`template`](https://github.com/crsh/template) - [`WORCS`](https://cjvanlissa.github.io/worcs/index.html) - *Workflow for Open Reproducible Code in Science* - [`workflowr`](https://workflowr.github.io/workflowr/) - [`starter`](https://www.danieldsjoberg.com/starter/) - [`rrtools`](https://github.com/benmarwick/rrtools) - Tools for Writing Reproducible Research in R [*start your lab*](https://www.startyourlab.com/) also provides an [`R` Project Template](https://github.com/startyourlab/r-project-template). --- ## Shoulders of giants... but sometimes also clay feet As you may have already experienced, not all tools always play together nicely. Keep in mind, that most tools that we have covered in this workshop are free and open source software (FOSS). Also, tool stacks can have break points and many tools themselves depend on other tools/tool stacks. Hence, things may not always work perfectly. -- But don't despair! There usually are solutions ([*Stack Overflow*](https://stackoverflow.com/) and issues in associated *GitHub* repositories are good places to find them) and the advantage of FOSS is that there usually is an active development community that you can also get involved in. --- ## Showing appreciation 👏 The creation and maintenance of FOSS takes a lot of time and this is rarely recognized as much as it should be. One thing we can do to change this is to at least give credit where credit is due and cite the tools and resources that we use. --- ## Showing appreciation 👏 ```r citation("trackdown") ``` ``` ## ## To cite trackdown in publications use: ## ## Emily Kothe, Claudio Zandonella Callegher, Filippo Gambarota, Janosch ## Linkersdörfer and Mathew Ling (2021). trackdown: Collaborative Writing and Editing ## of R Markdown (or Sweave) Documents in Google Drive. ## https://doi.org/10.5281/zenodo.5167319 ## ## Ein BibTeX-Eintrag für LaTeX-Benutzer ist ## ## @Manual{, ## title = {trackdown: Collaborative Writing and Editing of R Markdown (or Sweave) Documents in Google Drive}, ## author = {Emily Kothe and Claudio Zandonella Callegher and Filippo Gambarota and Janosch Linkersdörfer and Mathew Ling}, ## year = {2021}, ## note = {R package version 1.1.1}, ## url = {https://github.com/claudiozandonella/trackdown}, ## doi = {10.5281/zenodo.5167319}, ## } ``` --- ## FOSS is boss ["Open source is a hard requirement for reproducibility"](https://www.brodrigues.co/blog/2022-11-16-open_source_repro/) (Bruno Rodrigues) --- ## Looking back You created a *GitHub* repository containing materials for a fully reproducible research pipeline! 🎉 If you created a public *GitHub* repository: Head over to http://starlogs.net/ and paste the URL of the repository to recap your heroic journey into the universe of reproducible research! 🌌 --- class: center, middle # Any other questions or things you want to say/discuss? --- ## Looking forward We hope that we could get you started or help you with with making your research (more) reproducible. Of course, as always, there is much more to explore and learn. The only way to really get familiar with the tools and workflows is if you use them for your own research. -- And remember that making your research (more) reproducible is an incremental process. Every step towards reproducibility is an improvement 👣. You don't have to (and probably should not) leap to a full `R Markdown` + `Git` + `Docker` workflow all at once. -- Keep calm and stay reproducible! 😊 --- class: center, middle # Thank you very much for participating in this workshop! 🙇 We hope that you learned something and also had some fun (at least a little bit...)