Git

Here we provide some more details on Git and GitHub. However, we are only scratching the surface. To learn more about this topic, we highly recommend the following resources:

Why use Git and GitHub?

There are three main reasons to use Git and GitHub.

Sharing: Even if we do not take advantage of the advanced and powerful version control functionality, we can still use Git and GitHub to share our code. We have already shown how we can do this with RStudio.
Collaborating: Once you set up a central repo, multiple people can make changes to the code and keep versions synched. GitHub provides a free service for centralized repos. GitHub also has a special utility, called a pull request, that anybody can use to suggest changes to your code. You can easily accept or deny the request.
Version control: The version control capabilities of Git permit us to keep track of changes we make to our code. We can also revert to previous versions of files. Git also permits us to create branches in which we can test out ideas, then decide whether to merge the new branch with the original.

Here we focus on the sharing aspects of Git and GitHub and refer the reader to the links above to learn more about this powerful tool.

Overview of Git

To permit version control and collaboration effectively, files in Git move across four different areas: the Working Directory, the Staging Area, the Local Repository, and the Upstream Repository.

But how does it all get started? There are two ways: we can clone an existing repo or initialize one. We start with cloning.

Clone

We are going to clone an existing Upstream Repository. You can see it on GitHub here. By visiting this page, you can see multiple files and directories. This is the Upstream Repository. By clicking the green clone button, we can copy the repo’s URL: https://github.com/rairizarry/murders.git.

But what does clone mean? Rather than just downloading all these files to your computer, we are going to copy the entire Git structure, which means we will add the files and directories to each of the three local stages: Working Directory, Staging Area, and Local Repository. When you clone, all three are exactly the same to start.

You can quickly see an example of this by doing the following. Open a terminal and type git clone followed by the repo's URL:

## Cloning into 'murders'...
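The message above is what git clone prints when it starts copying a repo. As a minimal sketch that runs without network access, the block below builds a tiny stand-in upstream repo in a scratch directory and clones that instead of GitHub. The scratch paths, the placeholder identity, and the single README.txt file are assumptions made for illustration; with network access, the URL from the text would take the place of the local path.

```shell
tmp=$(mktemp -d)                          # scratch directory for the demo

# Build a tiny stand-in upstream repo (hypothetical; replaces GitHub here).
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "murders project" > README.txt
git add README.txt
git commit -q -m "initial commit"
git clone -q --bare "$tmp/seed" "$tmp/upstream.git"

# Clone the stand-in. With network access, the command would instead be:
#   git clone https://github.com/rairizarry/murders.git
git clone -q "$tmp/upstream.git" "$tmp/murders"
cd "$tmp/murders"
ls                                        # README.txt
git remote -v                             # origin points at the repo we cloned
cloned_files=$(ls)
```

The clone remembers where it came from under the remote name origin, which is why later status messages refer to origin/master.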

You have now cloned a GitHub repo and have a working Git directory, with all the files, on your system.

## README.txt
## analysis.R
## data
## download-data.R
## murders.Rproj
## rdas
## report.Rmd
## report.md
## report_files
## wrangle-data.R

The Working Directory is the same as your Unix working directory. When you edit files using an editor such as RStudio, you change the files in this area and only in this area. Git can tell you how these files relate to the versions of the files in other areas with the command git status:

If you check the status now, you will see that nothing has changed and you get the following message:

## On branch master
## Your branch is up to date with 'origin/master'.
## 
## nothing to commit, working tree clean

We are now going to make changes that we eventually want to be synched with the upstream repo. But we don’t want to do this until we are sure these are final enough versions to share. However, we can keep track of changes we make in our local directory before pushing these files to the upstream repo. Yet we also want to avoid keeping track of too many changes in the local version. We don’t want every little change, only changes we think are worth tracking. Edits in the staging area are not kept by the version control system. We add a file to the staging area with the git add command. Below we create a file using the Unix echo command just as an example (in reality you would use RStudio):

We are also adding a temporary file that we do not want to track:

Now we can stage the file we want in our repository:

Notice what the status says now:

## On branch master
## Your branch is up to date with 'origin/master'.
## 
## Changes to be committed:
##   (use "git reset HEAD <file>..." to unstage)
## 
##  new file:   new-file.txt
## 
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 
##  tmp.txt
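The staging steps described above can be sketched as follows. This is a hedged reconstruction, not the original commands: it uses a fresh scratch repo rather than the cloned murders repo, and the file contents are arbitrary. Only the file names new-file.txt and tmp.txt come from the status output.

```shell
tmp=$(mktemp -d)               # scratch directory for the demo
git init -q "$tmp/demo"
cd "$tmp/demo"

echo "test" > new-file.txt     # a file we want to track (contents arbitrary)
echo "temporary" > tmp.txt     # a temporary file we do not want to track

git add new-file.txt           # stage only the file we want
git status                     # long form, as in the output above
git status --short             # compact form: "A " = staged, "??" = untracked
status=$(git status --short)
```

Because tmp.txt was never passed to git add, it stays listed as untracked and will not be part of the next commit.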

Any changes we make to new-file.txt will now get added to the repository next time we commit. We commit like this:

## [master bdf2947] adding a new file
##  1 file changed, 1 insertion(+)
##  create mode 100644 new-file.txt

We have now changed the local repo, which the status confirms:

## On branch master
## Your branch is ahead of 'origin/master' by 1 commit.
##   (use "git push" to publish your local commits)
## 
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 
##  tmp.txt
## 
## nothing added to commit but untracked files present (use "git add" to track)
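A minimal sketch of the commit step, again in a scratch repo with a placeholder identity (both assumptions). The commit message is the one that appears in the output above.

```shell
tmp=$(mktemp -d)                          # scratch directory for the demo
git init -q "$tmp/demo"
cd "$tmp/demo"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "test" > new-file.txt
git add new-file.txt                      # move the file to the staging area
git commit -q -m "adding a new file"      # record the staged file in the local repo

git status --short                        # prints nothing: no pending changes
n_commits=$(git rev-list --count HEAD)    # number of commits so far
echo "$n_commits"                         # 1
```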

If we edit that file again, it changes only in the working directory. To add the change to the local repo, we need to stage the file and then commit it:

## [master d99ec4c] adding a new line to new-file
##  1 file changed, 1 insertion(+)
## On branch master
## Your branch is ahead of 'origin/master' by 2 commits.
##   (use "git push" to publish your local commits)
## 
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 
##  tmp.txt
## 
## nothing added to commit but untracked files present (use "git add" to track)

The staging step is often unnecessary in our uses of Git. We can skip it if we add the file name to the commit command like this:

## [master 8ec0fc6] minor change to new-file
##  1 file changed, 1 insertion(+)
## On branch master
## Your branch is ahead of 'origin/master' by 3 commits.
##   (use "git push" to publish your local commits)
## 
## Untracked files:
##   (use "git add <file>..." to include in what will be committed)
## 
##  tmp.txt
## 
## nothing added to commit but untracked files present (use "git add" to track)
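The two ways of committing an edit described above can be sketched side by side. The scratch repo, placeholder identity, and appended lines are assumptions; the commit messages are taken from the output shown. Naming a file on the commit command only works for files Git already tracks.

```shell
tmp=$(mktemp -d)                          # scratch directory for the demo
git init -q "$tmp/demo"
cd "$tmp/demo"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "test" > new-file.txt
git add new-file.txt
git commit -q -m "adding a new file"

# Edit the file, then stage and commit in two steps.
echo "adding a line" >> new-file.txt
git add new-file.txt
git commit -q -m "adding a new line to new-file"

# Edit again, then commit the named file directly, skipping git add.
echo "adding a second line" >> new-file.txt
git commit -q -m "minor change to new-file" new-file.txt

n_commits=$(git rev-list --count HEAD)
echo "$n_commits"                         # 3
```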

We can review all the changes we have made with git log:

## commit 8ec0fc6c32e6e02497f4d44371435e2d68b997ef
## Author: Stephanie Hicks <stephaniechicks@gmail.com>
## Date:   Tue Jul 31 06:54:00 2018 -0700
## 
##     minor change to new-file
## 
## commit d99ec4c7f3b877a2d81febbd0af17f5dfb75fd5c
## Author: Stephanie Hicks <stephaniechicks@gmail.com>
## Date:   Tue Jul 31 06:54:00 2018 -0700
## 
##     adding a new line to new-file
## 
## commit bdf2947ad938f4fa464fdc61ba2614c5fdc7b0a4
## Author: Stephanie Hicks <stephaniechicks@gmail.com>
## Date:   Tue Jul 31 06:54:00 2018 -0700
## 
##     adding a new file
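History in this format is what git log prints. A small sketch in a scratch repo (the setup and placeholder identity are assumptions) shows the full form and a compact one-line-per-commit form:

```shell
tmp=$(mktemp -d)                          # scratch directory for the demo
git init -q "$tmp/demo"
cd "$tmp/demo"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "test" > new-file.txt
git add new-file.txt
git commit -q -m "adding a new file"
echo "adding a line" >> new-file.txt
git commit -q -m "adding a new line to new-file" new-file.txt

git log                                   # full entries, newest first
git log --oneline                         # abbreviated hash and message only
latest=$(git log -1 --format=%s)          # message of the most recent commit
```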

The final step is to push the changes to the upstream repo. This is done with the git push command like this:

However, you will not be able to do this because you do not have permission to edit the upstream repo. If this were your repo, you could.

If this is a collaborative project, the upstream repo may change and become different from our version. To update our local repository to match the upstream repo, we use the command git fetch:

Then, to copy these changes to the staging area and working directory, we use the command git merge:

However, we often just want to update both in one shot. For this we use git pull:
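The difference between the two-step and one-step updates can be sketched offline. The block below is an illustration under stated assumptions: a local bare repo stands in for GitHub, a second clone named theirs plays a hypothetical collaborator, and the file contents v1, v2, v3 are arbitrary.

```shell
tmp=$(mktemp -d)                          # scratch directory for the demo

# Stand-in upstream repo with one commit (hypothetical; replaces GitHub here).
git init -q "$tmp/seed"
cd "$tmp/seed"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "v1" > README.txt
git add README.txt
git commit -q -m "initial commit"
git clone -q --bare "$tmp/seed" "$tmp/upstream.git"

# Our clone and a collaborator's clone.
git clone -q "$tmp/upstream.git" "$tmp/ours"
git clone -q "$tmp/upstream.git" "$tmp/theirs"

# The collaborator changes the upstream repo.
cd "$tmp/theirs"
git config user.name "Collaborator"       # placeholder identity
git config user.email "collab@example.com"
echo "v2" > README.txt
git commit -q -m "update README" README.txt
git push -q

# Two steps: fetch updates the local repo only; merge updates our files.
cd "$tmp/ours"
git fetch -q
after_fetch=$(cat README.txt)             # still v1
git merge -q
after_merge=$(cat README.txt)             # now v2

# The collaborator pushes once more; this time we update in one step.
cd "$tmp/theirs"
echo "v3" > README.txt
git commit -q -m "update README again" README.txt
git push -q
cd "$tmp/ours"
git pull -q
after_pull=$(cat README.txt)              # now v3
```

After git fetch alone, the file on disk is unchanged, which is the point of keeping the two steps separate: we can inspect what arrived before it touches the working directory.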

We learned earlier how RStudio has buttons to do all this. The details provided here should help you understand what happens in the background.

Now let’s learn the second way we can get started: by initializing a directory on our own computer rather than cloning.

We will show how we created the GitHub repo for our gun murders project. We first created a project on our computer, so we already had all the files and directories ready. But we did not yet have a Git local repo or a GitHub upstream repo.

We start by creating a new repo on our GitHub page:

We click on the New button:

We called it murders to match the name of the directory on our local system. If you are doing this for another project, please choose an appropriate name.

We then get a series of instructions on how to get started. But we can instead use what we have learned. The main thing we need from this page is to copy the repo’s URL, in this case: https://github.com/rairizarry/murders.git.

At this point, we can start a terminal and cd into our local project's directory. In our example, it would be:

We then initialize the directory with git init. This turns the directory into a Git directory, and Git starts tracking:

All the files are now only in our working directory.

The next step is to connect the local repo with the GitHub repo. In a previous example, we had RStudio do this for us. Now we need to do it ourselves. We start by adding any one of the files and committing it:

We now have a file in our local repo and can connect it to the upstream repo, which has URL https://github.com/rairizarry/murders.git.

To do this, we use the command git remote add.

We can now use git push since there is a connection to an upstream repo:
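The whole initialize, connect, and push sequence can be sketched offline. Several things here are assumptions for illustration: an empty local bare repo stands in for the new GitHub repo, the remote is given the conventional name origin, the commit message and README.txt contents are placeholders, and the first push uses -u so the local branch starts tracking the remote one.

```shell
tmp=$(mktemp -d)                          # scratch directory for the demo
git init -q --bare "$tmp/upstream.git"    # stand-in for the empty GitHub repo

# An existing project directory that is not yet under Git.
mkdir "$tmp/murders"
cd "$tmp/murders"
echo "gun murders project" > README.txt

git init -q                               # turn the directory into a Git repo
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git add README.txt
git commit -q -m "first commit"

# Connect to the upstream repo. With GitHub, the command would instead be:
#   git remote add origin https://github.com/rairizarry/murders.git
git remote add origin "$tmp/upstream.git"

# Push the current branch and remember origin as its upstream.
git push -q -u origin HEAD
n_remote_branches=$(git ls-remote --heads origin | wc -l)
```

Once the upstream is set this way, later pushes from the same branch need only git push.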

We can continue adding and committing each file, but it might be easier to use RStudio. To do this, start the project by opening the Rproj file. The Git icons should appear:

We can now go to GitHub and confirm that our files are there.