class: center, middle, inverse, title-slide .title[ # Workflows for Reproducible Research with R & Git ] .subtitle[ ## Recap & Outlook ] .author[ ### Johannes Breuer, Bernd Weiss, & Arnim Bleier ] .date[ ### 2023-11-17 ] --- layout: true --- ## Recap - Day 1 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> 09:30 - 10:45 </td> <td style="text-align:left;font-weight: bold;"> Introduction </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 10:45 - 11:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Coffee Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 11:00 - 12:00 </td> <td style="text-align:left;font-weight: bold;"> Computer literacy </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 13:00 - 15:00 </td> <td style="text-align:left;font-weight: bold;"> Git & GitHub - Part 1 </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 15:00 - 15:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Coffee Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 15:15 - 16:30 </td> <td style="text-align:left;font-weight: bold;"> Git & GitHub - Part 2 </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 16:30 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Q & A </td> </tr> </tbody> </table> --- ## Recap - Day 2 <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Topic </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: gray !important;"> 09:00 - 09:30 </td> <td style="text-align:left;font-weight: bold;"> Recap Day 1 </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 09:30 - 11:00 </td> <td style="text-align:left;font-weight: bold;"> Dependency management </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 11:00 - 11:15 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Coffee Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 10:15 - 12:00 </td> <td style="text-align:left;font-weight: bold;"> Build your own Binder </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 12:00 - 13:00 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Lunch Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 13:00 - 14:30 </td> <td style="text-align:left;font-weight: bold;"> Binder & Notebooks </td> </tr> <tr> <td style="text-align:left;color: gray !important;color: gray !important;"> 14:30 - 14:45 </td> <td style="text-align:left;font-weight: bold;color: gray !important;"> Coffee Break </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 14:45 - 16:00 </td> <td style="text-align:left;font-weight: bold;"> Saving computational environments </td> </tr> <tr> <td style="text-align:left;color: gray !important;"> 16:00 - 17:00 </td> <td style="text-align:left;font-weight: bold;"> Recap & Outlook </td> </tr> </tbody> </table> --- ## Other topics in reproducible research As we said in the introduction, we cannot cover all tools and topics related to reproducible research in this workshop. However, we want to use this session to at least mention and provide pointers to resources for some of these topics. Things we did not cover (TWDNC): - Reproducible documents/papers - Data sharing - Collaboration with `Git` & *GitHub* - Alternatives to *Docker* - Alternatives to *Binder* - Workflow & project templates --- ## TWDNC: Reproducible documents There are different formats for [literate programming](https://en.wikipedia.org/wiki/Literate_programming) that combine text, code, and output, such as: - [`R Markdown`](https://rmarkdown.rstudio.com/) (`.Rmd`) - Although not as widely used, there also is the format of [`R` Notebooks](https://bookdown.org/yihui/rmarkdown/notebook.html). - [Jupyter Notebooks](https://jupyter.org/) (`.ipynb`) - [Quarto](https://quarto.org/) (`.qmd`) *Note*: Jupyter Notebooks, `Quarto` (depending on the output format), and `R` Notebooks (can) also add interactivity. --- ## TWDNC: Reproducible documents Some helpful resources for getting started or digging deeper ⛏: - `R Markdown` - [our slides from last year's workshop](https://jobreu.github.io/reproducible-research-gesis-2022/slides/RMarkdown.html#1) - [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) by Yihui Xie, J. J. Allaire, & Garrett Grolemund - [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/) by Yihui Xie, Christophe Dervieux, & Emily Riederer - `Quarto` - materials from the [workshop "Automated Reports & Co with Quarto and Markdown" by David Schoch & Chung-hong Chan](https://gesiscss.github.io/quarto-workshop/) - [Quarto: The Definitive Guide](https://quarto-tdg.org/) by Mine Çetinkaya-Rundel and Charlotte Wickham --- ## TWDNC: Data sharing Typically, there are different research products that you can share: - a publication - code used for data processing and analysis (= focus of this workshop) - research data - other materials (e.g., stimuli, questionnaires, etc.) --- ## TWDNC: Data sharing Studies have repeatedly shown that "sharing upon request" is not a sustainable solution for data sharing. And we have discussed why *GitHub* is not a good place for sharing research data. However, there are other more suitable options for sharing research data. The best one is the use of dedicated repositories for research data. The paper by [Klein et al. (2018)](https://doi.org/10.1525/collabra.158) provides an overview of public repositories that hold psychological data.<sup>1</sup> A good tool for finding suitable repositories is the [*Registry of Research Data Repositories*](https://www.re3data.org/). <small><small>[1] However, parts of this overview have inevitably become somewhat outdated since the paper was published.</small></small> --- ## TWDNC: Data sharing Some good options for sharing research data (as everything, each with their own pros and cons): - General purpose (data) repositories, such as the [*Open Science Framework*](https://osf.io/) (OSF), [*Zenodo*](https://zenodo.org/), or [*Harvard Dataverse*](https://dataverse.harvard.edu/) - Curated discipline-specific archives, such as the [*GESIS Data Archive*](https://www.gesis.org/en/data-services/home), [*PsychArchives*](https://www.psycharchives.org/) by *ZPID*, or [*ICPSR*](https://www.icpsr.umich.edu/web/pages/) - Archives for specific types of data, such as [*Qualiservice*](https://www.qualiservice.org/en/) or [*The Qualitative Data Repository*](https://qdr.syr.edu/) --- ## TWDNC: Collaboration with `Git` & *GitHub* Typically, you collaborate with others on research projects. `Git` and *GitHub* can also be used for this purpose. However, this requires that all collaborators are willing and able to do so. For some guidance on how `Git` & *GitHub* can be used for collaboration, you can have a look at [our slides on this topic from last year](https://jobreu.github.io/reproducible-research-gesis-2022/slides/intro-collab-github.html#31) or the [slides by Frederik Aust from a similar workshop](file:///C:/Users/breuerjs/Documents/Teaching/reproducible_research_KU-Leuven_2022/reproducible-research-practices-workshop/slides/6_github_collaboration.html#1) (taught together with Johannes Breuer). --- ## TWDNC: *GitHub* as a social network 🕸 You can *star* repositories ⭐, *follow* users (or organizations), and have a personalized newsfeed on *GitHub*. It is also easily possible to contribute to the work of others or have others contribute to your work, e.g., via creating or closing an [*issue*](https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues) or [*pull requests*](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests). --- ## TWDNC: Alternatives to *Docker* - [*podman*](https://podman.io/) as an open-source alternative to *Docker* - using the package manager [*Nix*](https://nixos.org/) with the `R` package [`rix`](https://b-rodrigues.github.io/rix/articles/reproducible-analytical-pipelines-with-nix.html) - Bruno Rodrigues wrote a series of [blog posts about "Reproducible Data Science with Nix"](https://www.brodrigues.co/blog/2023-10-20-nix_for_r_part7/) - *note*: only works for Linux and macOS; for Windows you need to use [WSL](https://learn.microsoft.com/en-us/windows/wsl/about) --- ## TWDNC: Alternatives to *Binder* *Note*: All of the following alternatives are commercial (but offer limited free use). - [*Code Ocean*](https://codeocean.com/) - [*Observable*](https://observablehq.com/) - [*Posit Cloud*](https://posit.co/products/cloud/cloud/) (formerly *RStudio Cloud*) --- ## TWDNC: Project setup and templates In this workshop, we have shown you how to manually set up a reproducible research workflows. However, there are some tools that you can use to automate parts of this process. These can range from very simple to very elaborate solutions. --- ## TWDNC: Project setup and templates We have already seen and tested the file `create-project.sh` (which is small shell script for initializing a basic project folder structure that can be easily adapted and extended using any text editor). However, there also are several other (more complex) packages and templates that can be used for the creation and maintenance of reproducible research workflows, such as... - [`template`](https://github.com/crsh/template) by Frederik Aust & Marius Barth - [`WORCS`](https://cjvanlissa.github.io/worcs/index.html) - *Workflow for Open Reproducible Code in Science* - [`workflowr`](https://workflowr.github.io/workflowr/) - [`starter`](https://www.danieldsjoberg.com/starter/) - a toolkit for starting new projects - [`rrtools`](https://github.com/benmarwick/rrtools) - Tools for Writing Reproducible Research in R - [`targets`](https://docs.ropensci.org/targets/) - Function-oriented Make-like declarative workflows for R [*start your lab*](https://www.startyourlab.com/) also provides an [`R` Project Template](https://github.com/startyourlab/r-project-template). --- ## Other resources on reproducible research with `R` - [Building reproducible analytical pipelines with R](https://raps-with-r.dev/) by Bruno Rodrigues - Blog post [An overview of what's out there for reproducibility with R](https://www.brodrigues.co/blog/2023-10-05-repro_overview/) by Bruno Rodrigues - [Chapter on "Computational Reproducibility"](https://lakens.github.io/statistical_inferences/14-computationalreproducibility.html) in the book/course "Improving Your Statistical Inferences" by Daniel Lakens - [BERD Course Booklet: Make Your Research Reproducible](https://berd-nfdi.github.io/BERD-reproducible-research-course/) by Heidi Seibold - [Guide for Reproducible Research by *The Turing Way*](https://the-turing-way.netlify.app/reproducible-research/reproducible-research.html) --- ## Final note: Showing appreciation 👏 The creation and maintenance of FOSS takes a lot of time and this is rarely recognized as much as it should be. One thing we can do to change this is to at least give credit where credit is due and cite the tools and resources that we use. --- ## Final note: Showing appreciation 👏 ```r citation("rang") ``` ``` ## To cite rang in publications use: ## ## Chan C, Schoch D (2023). "rang: Reconstructing reproducible R computational environments." _PLOS ONE_. ## doi:10.1371/journal.pone.0286761 <https://doi.org/10.1371/journal.pone.0286761>, ## <https://github.com/gesistsa/rang>. ## ## Ein BibTeX-Eintrag für LaTeX-Benutzer ist ## ## @Article{, ## title = {rang: Reconstructing reproducible R computational environments}, ## journal = {PLOS ONE}, ## author = {Chung-hong Chan and David Schoch}, ## url = {https://github.com/gesistsa/rang}, ## year = {2023}, ## doi = {10.1371/journal.pone.0286761}, ## } ``` --- ## FOSS is boss 😎 ["Open source is a hard requirement for reproducibility"](https://www.brodrigues.co/blog/2022-11-16-open_source_repro/) (Bruno Rodrigues) --- ## Looking back You created a *GitHub* repository containing materials for a fully reproducible research pipeline! 🎉 If you created a public *GitHub* repository: Head over to http://starlogs.net/ and paste the URL of the repository to recap your heroic journey into the universe of reproducible research! 🌌 --- ## The path to reproducibility 👣 <img src="data:image/png;base64,#https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66fbf8a0-a59a-464f-9b4d-a3c0106ec76f_2904x2244.jpeg" width="75%" style="display: block; margin: auto;" /> <small><small>Illustration by [Heidi Seibold](https://heidiseibold.com/) from her newsletter/*Substack* blog post ["Document with care: README, Metadata, code comments, … to make both others and future you happy"](https://heidiseibold.substack.com/p/6-steps-towards-reproducible-research-fc5)</small></small> --- class: center, middle # Any other questions or things you want to say/discuss? --- ## Looking forward We hope that we could get you started or help you with with making your research (more) reproducible. Of course, as always, there is much more to explore and learn. The only way to really get familiar with the tools and workflows is if you use them for your own research. -- And remember that making your research (more) reproducible is an incremental process. Every step towards reproducibility is an improvement 👣. You don't have to (and probably should not) leap to a full `R` + `Git` + `Docker` workflow all at once. -- Keep calm and stay reproducible! 😊 --- class: center, middle # Thank you very much for participating in this workshop! 🙇 We hope that you learned something and also had some fun (at least a little bit...)