GitHub

GitHub is a online service that permits you to organize and share your code in what are called repositories. In a later section, we will show you how to use Git and GitHub to organize your data science projects through RStudio. Before we do that, we create an account and a repository.

GitHub Accounts

The next step is to get a GitHub account. Basic GitHub accounts are free. The first step is to go to GitHub where you will see a box in which you can sign up.

You want to pick a name carefully. You want it to be short, easy to remember and spell, somehow related to your name, and professional. This last one is important since you might be sending potential employers a link to your GitHub account. In the example below I am sacrificing on the ease of spelling to incorporate my name. Your initials and last name are usually a good choice. If you have a very common name, then this may have to be taken into account. A simple solution would be to add numbers or spell out part of your name.

The account I use for my research, rafalab, is the same one I use for my webpage and Twitter, which makes it easy to remember for those that follow my work.

Once you have a GitHub account, you are ready to connect RStudio to this account. You start by going to the Global Options and selecting GitSVN:

You then need to enter to path [AN: “a” path?] for the Git executable we just installed.

On the Windows default installation this will be C:/Program File/Git/bin/git.exe, but you should find it by browsing your system as this can change from system to system.

Now to avoid entering our GitHub password every time we try to access our repository, we will create what is called SSH RSA Key. RStudio can do this for us automatically if we click on the Creat RSA Key button:

You can follow the default instructions and:

RStudio and GitHub should now be able to connect and we are ready to create a first GitHub code repository.

GitHub Repositories

You are now ready to create a GitHub repository (repo). The general idea is that you will have at least two copies of your code: one on your computer and one on GitHub. If you add collaborators to this project, then each will have a copy on their computer. The GitHub copy is usually consider the master copy that each collaborator syncs to. Git will help you keep all the different copies synced.

As mentioned, one of the advantages of keeping code on a GitHub repository is that you can easily share it with potential employers interested in seeing examples of your work. Because many data science companies use version control systems, like Git, to collaborate on projects, they might also be impressed that you already know at least the basics.

The first step in creating a repo for your code is to initialize on GitHub. Because you already created an account, you will have a repo on GitHub with the URL _http:\\github.com/username_.

To create a repo, first login to your account by clicking the Sign In button on http://github.com. You might already be signed in, in which case the Sign In button will not show up:

If signing in, you will have to enter your username and password. We recommend you set up your browser to remember this to avoid typing it in each time.

Once on your account, you can click on Repositories:

Then you click on New to create a new repo:

You will then want to chose a good descriptive name for the project. Remember that in the future, you might have dozens of repos so keep that in mind when choosing a name. Here we will use homerwork-0. We recommend you make the repo public. If you want to keep it private, you will have to pay a monthly charge.

You now have your first repo on GitHub. The next step will be to clone it on your computer and start editing and syncing using Git.

To do this, it is convenient to copy the link provided by GitHub specifically to connect to this repo using Git.