class: center, middle, inverse, title-slide .title[ # Tools and Workflows for Reproducible Research in the Quantitative Social Sciences ] .subtitle[ ## Using Git and GitHub ] .author[ ### Bernd Weiß ] .date[ ### 2022-11-18 ] --- layout: true --- class: center, middle # Overview --- ## Recap - Basic introduction to Git - Git workflow - Important Git concepts/commands (`git init`, `git add`, `git commit`, `git log`, `git show`, `git clone`, `git checkout`, `git branch`, ...) - Also, the following slides are heavily influenced by another course on ["Reproducible research workflows for psychologists"](http://frederikaust.com/reproducible-research-practices-workshop/) by Frederik Aust and Johannes Breuer, especially the part on ["Collaborate with Git & GitHub"](https://crsh.github.io/reproducible-research-practices-workshop/slides/6_github_collaboration.html) --- ## .center[<img src="data:image/png;base64,#https://www.git-tower.com/learn/media/pages/git/ebook/en/desktop-gui/remote-repositories/introduction/405ca134e0-1667823656/basic-remote-workflow.png" alt="drawing" width="500"/>] (Source: https://www.git-tower.com/learn/git/ebook/en/desktop-gui/remote-repositories/introduction) --- class: center, middle # Please, let me in (authentication) --- ## Local or remote & HTTPS or SSH? - A Git project/repo is stored in a repository, which can be local or remote - When using Git to access a remote repository (for backup or collaborative work) on a remote server, you need to authenticate yourself to the server - There are two ways of authentication: HTTPS or SSH --- ## Choose a remote server - GitHub (which will be using) - GitLab - ... --- ## GitHub: Using personal access tokens - These days, [authentication via personal access tokens (PAT) (and https)](https://happygitwithr.com/https-pat.html) seems the way to go when using GitHub - In the following, I will illustrate the process using multiple screenshots - Note that my explanation does not include any R/RStudio-related processes. Johannes will talk about these things in more detail - Finally, I will focus on MS Windows. Arnim can help with macOS/Linux. --- ## In GitHub, go to the `Settings` website: <img src="data:image/png;base64,#../img/f_github-pat-0-settings.png" width="55%" style="display: block; margin: auto;" /> --- ## Next, go to the `Developer Settings` entry: <img src="data:image/png;base64,#../img/f_github-pat-1-settings.png" width="90%" style="display: block; margin: auto;" /> --- ## Then, choose `Tokens (classic)`: <img src="data:image/png;base64,#../img/f_github-pat-2-developer.png" width="90%" style="display: block; margin: auto;" /> --- ## And, generate a new token; important, save this Token (e.g., in your password manager): <img src="data:image/png;base64,#../img/f_github-pat-3-create-token.png" width="90%" style="display: block; margin: auto;" /> --- ## When you are now cloning a new repository (or pushing/pulling for the first time), you will be asked once to enter your username... <img src="data:image/png;base64,#../img/f_github-pat-4-enter-credent.png" width="90%" style="display: block; margin: auto;" /> --- ## ... and your Token (even though it says "Password"): <img src="data:image/png;base64,#../img/f_github-pat-4-enter-credent2.png" width="90%" style="display: block; margin: auto;" /> --- ## If, for whatever reason, you decide to reset/remove your credentials, you can do so using the [Windows Credentials Manager](https://support.microsoft.com/en-us/windows/accessing-credential-manager-1b5c916a-6a16-889f-8581-fc16e8165ac0) (in German: "Anmeldeinformationsverwaltung") <img src="data:image/png;base64,#../img/f_github-pat-5-wincred.png" width="70%" style="display: block; margin: auto;" /> --- ## Setting up SSH - SSH is a network protocol that comes in handy, when you work with remote repositories and when you do not want to type-in your password every time you pull (fetch) or push (send) from a remote repository. You still need to authenticate yourself, though. - To work with Git on you local computer, you do not need SSH (= Secure Shell). - Authentication in SSH (which is also the name of the program) works by using a private and a public key (usually the public key has the file extension `.pub`, e.g., my public key is `id_rsa.pub`). When you start working with SSH for the very first time, you have to create both keys. --- ## - The private key remains on your local computer and you have to make sure that it is safe -- it is a simple text file and it is your password now, and everyone who has your private key can access your files. Again, everyone who has your private key has your password! This is what my *public key* looks like: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAyOQ9RT6TkfgkdO2NspzdVJE5CZ03yYAhVwLGo CrI3E9/Ix0MAySunXExjhsQi2XkhPBjLOEahYuuLaAWHuBc7apUPRNSBy+mdUHnH3 0BdTQijQ6vj3RL99HO4yrZnipIlkS5ufw/+hpbXXOzSOqTvyGtL9ygm3eA2HDSQtz 2ptFq8anODJDKrgTbNLb/YZ9KDIcpdO/Sfk4LtvaGF3tIFlyE+pogNmN4eWiYg9Xv 25BhVVxWMHadRFLeDastWO4SedriEHzQYaNgxVNTufqolJ0nbg4R//fVDxjR2SbzV AHLZ+eVPUx+vzcPVMP9wYPcnii9YLiSRy+hlUAOR/kXeQ== berndweiss --- class: middle ## **The *public key* (not the private key!) has to be stored at the GitHub/GitLab/... website. Now, everyone who has your public key can encrypt files (that are sent to you via the internet) but only you (or anyone else who has your private key) can decrypt the files. And, for that reasons you do not have to login everytime you push/pull files from the remote repository.** --- ## - How to setup SSH on you computer is explained on this website: https://docs.gitlab.com/ce/ssh/README.html ("Generate an SSH key pair") - The most important point is that `ssh` is able to find your key pair, i.e., it needs to be located in your HOME folder If everything works well, you should receive the following friendly welcome message after typing in the command `ssh -T git@github.com`: > Hi berndweiss! You've successfully authenticated, but GitHub does not provide shell access. --- ## Security - Be extremely careful including sensitive information (e.g., personal data, passwords, access tokens) into a (public) GitHub repository. There are people out there who search for these things... see also https://docs.github.com/en/code-security/secret-scanning - Use your `.gitignore` file to exclude sensitive files/folders - Please enable [two-factor authentication on GitHub](https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication) - In case you have accidentally include sensitive information, check out this GitHub website on [Removing sensitive data from a repository]( https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository) --- class: center, middle # Working with remote respositories --- ## Adding a remote repository via `git remote add` - Can be accomplished using the git command `git remote add <name> <url>`. The usual name for `<name>` is `origin`, however, feel free to choose another name (when you clone a repo, then this has already happened). - SSH: The `<url>` for this repository looks like `git@git.gesis.org:weissbd/ps2017-xx-intro2git.git`; another example is this one: `git@github.com:berndweiss/ps2017-11_porto-campbell-ma-workshop.git`. The url can be found in the respective github/gitlab repository. - HTTPS: `git remote add origin https://github.com/berndweiss/lala.git` --- ## `git pull` - Given that you have already established a link to an remote server (such as GitHub, e.g., via `git remote add` or `git clone`), updates can be downloaded via the `git pull <name remote server> <branch>` command - Most often, this is `git pull origin main` .center[<img src="data:image/png;base64,#https://wac-cdn.atlassian.com/dam/jcr:63e58c34-b273-4e48-a6b1-6e3ba4d4a0ea/01%20bubble%20diagram-01.svg?cdnVersion=637" alt="drawing" width="400"/>] .small[(Source: https://www.atlassian.com/git/tutorials/syncing/git-pull)] --- ## - In the background, `git pull` combines two steps, `git fetch` and `git merge` .center[<img src="data:image/png;base64,#https://wac-cdn.atlassian.com/dam/jcr:0269bb2d-eb7f-43d8-80a2-8afa88d11eea/02%20bubble%20diagram-02.svg?cdnVersion=637" alt="drawing" width="400"/>] (Source: https://www.atlassian.com/git/tutorials/syncing/git-pull) --- ## `git push` - Again, given that you have already established a link to an remote server (such as GitHub, e.g., via `git remote add` or `git clone`), updates can be uploaded via the `git push <name remote server> <branch>` command - Most often, this is `git push origin main` - More information can be found here: https://www.atlassian.com/git/tutorials/syncing/git-push --- ## Exercise - In a previous exercise, you have created your own repository (let's call it `your-new-repo`) - Now, go to your GitHub account and create a new repository on GitHub - Startpage -> tab "Repositories" -> green button "New" <img src="data:image/png;base64,#../img/f_github-new-repo.png" width="40%" style="display: block; margin: auto;" /> --- ## - Enter a new "Repository name" - Make it "Private" (unless you have something important to share) - Do **not** check any of the "Initialize this repository with" boxes - Hit the green "Create repository" button --- ## - Choose SSH or HTTPS as protocol ("Quick setup — if you’ve done this kind of thing before") - Look for "...or push an existing repository from the command line" - HTTPS: - GitHub offers some assistance in terms of providing the full `git remote add` command; copy this line, which should look like `git remote add origin https://github.com/berndweiss/your-repo-name.git` - Execute the command `git remote add origin https://github.com/berndweiss/your-repo-name.git` in the Git Bash in your local repository (`your-new-repo`) --- ## - SSH: - Copy the line `git remote add origin git@github.com:berndweiss/your-repo-name.git` - Execute the command `git remote add origin git@github.com:berndweiss/your-repo-name.git` in the Git Bash in your local repository (`your-new-repo`) - Make some changes (edit a text file, create a new file etc.). Then `add` and `commit` these changes locally - Make sure that `git status` shows a clean repository - Now you can run your first `git push origin main` - Reload the GitHub page via F5; you now should see the content of your local repo `your-new-repo` --- ## - You can delete a GitHub repository via the tab "Settings" -> "Options", then scroll down -> "Danger Zone" -> "Delete this repository" --- class: center, middle # Inviting collaborators --- ## Adding collaborators to your GitHub repo - Adding collaborators works only for GitHub repositories that you own - GitHub provides a lot of collaboration features - Edit files in browser - Change highlighting and commenting - Interactive revise-and-resubmit workflow - Issue tracker (to-do list and discussion) .small[(Source: http://frederikaust.com/reproducible-research-practices-workshop/slides/6_github_collaboration.html#5)] --- ## Workflows for collaboration - Adding changes to a repo without prior review - Push directly to main branch on GitHub - You need be an invited collaborator - Suggest changes with review (pull request) - Create a new branch ("parallel universe" of repository) - You can be an invited collaborator or a complete stranger - Edits can be made directly on GitHub or locally on your computer .small[(Source: http://frederikaust.com/reproducible-research-practices-workshop/slides/6_github_collaboration.html#8)] --- ## <img src="data:image/png;base64,#../img/f_github-collab-1.png" width="90%" style="display: block; margin: auto;" /> --- ## <img src="data:image/png;base64,#../img/f_github-collab-2.png" width="90%" style="display: block; margin: auto;" /> --- ## <img src="data:image/png;base64,#../img/f_github-collab-3.png" width="90%" style="display: block; margin: auto;" /> --- class: center, middle # Merge conflicts --- ## Merge conflicts - Merge conflicts occur when there are two competing changes that affect the same file *and* the same lines in that same file; or if one person decided to delete it while the other person decided to modify it - Git will inform you about a merge conflict and will indicate the two competing changes in a file .small[(source: https://www.git-tower.com/learn/git/ebook/en/command-line/advanced-topics/merge-conflicts)] --- ## <img src="data:image/png;base64,#../img/f_merge-conf-github.png" width="90%" style="display: block; margin: auto;" /> --- ## <img src="data:image/png;base64,#../img/f_merge-conf-cli.png" width="90%" style="display: block; margin: auto;" /> --- ## An example of a merge conflict - This is how a merge conflict looks like in the file `test.R`: ``` <<<<<<< HEAD x <- rnorm(10) ======= x <- rnorm(100000) >>>>>>> 047e3f7a00a5541622e5a40dc342df3af0591838 mean(x) sd(x) ``` - A merge conflict only affects the developer who is causing a merge conflict - You have to resolve the merge conflict by editing the respective file(s) (then add and commit the changes) (remove the `<<<<<<<`, `=======`, `>>>>>>>` and save the file) --- ## <img src="data:image/png;base64,#../img/f_merge-conf-resolved.png" width="90%" style="display: block; margin: auto;" /> --- class: center, middle # Forking --- ## Forking - Forking refers to the process of creating a personal copy of someone else's project - Forking works only for public repositories (or, you have been invited to a private repository) - In order to contribute to another (public) repository via *pull requests*, you first need to fork the respective repository .small[(Source: https://docs.github.com/en/get-started/quickstart/contributing-to-projects)] --- ## <img src="data:image/png;base64,#../img/f_github-fork.png" width="90%" style="display: block; margin: auto;" /> --- class: center, middle # Pull requests --- ## Pull requests on GitHub - Pull request only work in the context of a web platform such as GitHub or GitLab - It is a (polite/only) way to contribute to another person's GitHub repository - If you are a collaborator, then it is a polite way to contribute - If you are not a collaborator, then it is the only way to contribute - Note, the following example is based on - two GitHub users: `berndweiss` and `berndweisspublic` - both users are not collaborating - on each slide I will indicate which GitHub user is currently involved --- ## - The current status of the repo (and the file `test.R`) from the perspective of GitHub user `berndweiss` <img src="data:image/png;base64,#../img/f_github-pull-requ-0.png" width="90%" style="display: block; margin: auto;" /> --- ## - GitHub user `berndweisspublic` insists of 100000 observations and modified the file accordingly <img src="data:image/png;base64,#../img/f_github-pull-requ-1.png" width="90%" style="display: block; margin: auto;" /> --- ## Since GitHub user `berndweisspublic` does not own the repository, he cannot commit the changes but *proposes* changes <img src="data:image/png;base64,#../img/f_github-pull-requ-2.png" width="90%" style="display: block; margin: auto;" /> --- ## The next step is that `berndweisspublic` creates a *pull request*, i.e., asking user `berndweiss` to accept his changes <img src="data:image/png;base64,#../img/f_github-pull-requ-3.png" width="90%" style="display: block; margin: auto;" /> --- ## GitHub user `berndweiss` is informed about a pull request; he can accept the pull request (`merge`) or close the pull request, i.e., deny it <img src="data:image/png;base64,#../img/f_github-pull-requ-4.png" width="70%" style="display: block; margin: auto;" /> --- class: center, middle # Now, work together! --- ## Exercises - Since this exercise is about collaboration, you need at least another person who is willing to collaborate with you. Go to our Ilias repo, there is a file called `wsrr_groups-github.txt` which assigns participants to groups. - We have setup a Breakout room for each group. --- ## - Task 1: Start the collaboration - Every group member (A and B) should create a new public repo on GitHub, the repo should include at least one text file - The other group member(s) should be invited to this GitHub repo - Clone the repo on your local computer, i.e., A clones B's repo and B clones A's repo --- ## - Task 2: Fight! - Everyone in the group (again, let's use members A and B): On your computer, user A modifies the **first line** of the text file in your partner's repo (B). Make sure that your changes are different (delete a word in the first line, add a word etc.) - The repo's owner (B) also makes changes using the GitHub file editor; these changes should not be identical to user's A local changes - `add` and `commit` your changes on your computer and then try to `pull` the modified repo from GitHub. - You will hopefully experience a merge conflict. Deal with it. - Now switch sides, i.e., B edits the file locally, and A makes changes on GitHub --- ## - Task 3: Forking - Fork my public test repo: https://github.com/berndweiss/twrr_public_testrepo - Fork your partner's public test repo --- ## - Task 4. Pull requests - Go to you partner's repo, fork it, introduce some changes, and then make a pull request - Accept (or close, i.e., decline) your partner's pull request (you can also check out the options under `Files changed` -> `Review changes`)