RStudio

RStudio will be our launching pad for data science projects. It provides not only an editor for creating and editing our scripts but also many other useful tools. In this section, we go over some of the basics.

The panes

When you start RStudio for the first time, you will see three panes. The left pane shows the R console. On the right, the top pane includes three tabs: Environment, History and Connections, while the bottom pane shows five tabs: Files, Plots, Packages, Help and Viewer. You can click on each tab to move across the different features.

To start a new script, you can click on File, then New File, then R Script.

This opens a new pane on the left, where you can start writing your script.

Key bindings

Many tasks we perform with the mouse can be achieved with a combination of keystrokes instead. These keyboard versions for performing tasks are referred to as key bindings. For example, we just showed how to use the mouse to start a new script, but you can also use a key binding: Ctrl+Shift+N on Windows and command+shift+N on the Mac.

Although in this tutorial we often show how to use the mouse, we highly recommend that you memorize key bindings for the operations you use most. RStudio provides a useful cheat sheet with the most widely used commands. You can get it from RStudio directly:

and it looks like this:

You might want to keep this handy so you can look up key bindings when you find yourself performing repetitive point-and-clicking.

Installing R packages

Most of what we have learned in this book depends on the tidyverse. The data we have been working on depend on the dslabs package. These packages do not come pre-installed in R. In fact, the default installation of R is quite minimal and for many of your projects you will need to download and install one or more packages.

You can install packages directly from R with the command install.packages. To install the tidyverse package we would type, in the R console:

install.packages("tidyverse")

Note that we can install more than one package at once by feeding a character vector to this function:

install.packages(c("tidyverse", "dslabs"))

You can also install packages using RStudio in the following way:

One advantage of using RStudio is that it auto-completes package names once you start typing, which is helpful when you do not remember the exact spelling of the package:

Once you select your package, we recommend selecting all the defaults:

Remember that installing tidyverse actually installs several packages.

Once packages are installed, you can load them into R and you do not need to install them again, unless you install a fresh version of R. Remember that packages are installed in R, not RStudio.

It is helpful to keep a list of all the packages you need for your work in a script because if you need to perform a fresh install of R, you can re-install all your packages by simply running a script.
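A minimal sketch of such a script follows. The helper install_if_missing() is hypothetical (it is not part of R or of any package); it uses installed.packages(), shown next, to find out what is already there and installs only the packages that are missing, so re-running the script is cheap.

```r
# install_if_missing() is a hypothetical helper, not a function from R or any package.
# It installs every package in `pkgs` that is not already installed and
# returns (invisibly) the names of the packages it had to install.
install_if_missing <- function(pkgs) {
  # Names of the packages currently installed in this R installation
  installed <- rownames(installed.packages())
  missing <- setdiff(pkgs, installed)
  if (length(missing) > 0) {
    install.packages(missing)  # needs an internet connection
  }
  invisible(missing)
}

# The packages used in this book; extend this list with the ones your work needs
my_packages <- c("tidyverse", "dslabs")
```

After a fresh install of R, running install_if_missing(my_packages) reinstalls everything on the list in one step, while packages that are already present are skipped.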

You can see all the packages you have installed using the following function:

installed.packages()
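The result of installed.packages() is a character matrix with one row per package and many columns, so printing it in full is rarely useful. A short sketch of how to answer the usual questions, namely whether a given package is installed and which version you have:

```r
# installed.packages() returns a character matrix; row names are package names
ip <- installed.packages()

# Just the package names and their versions
head(ip[, c("Package", "Version")])

# Is a particular package installed? (TRUE or FALSE)
"dslabs" %in% rownames(ip)

# Version of an installed package, as a comparable version object
packageVersion("stats")
```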

Running commands while editing scripts

There are many editors specifically made for coding. These are useful because color and indentation are automatically added to make code more readable. RStudio is one of these editors, and it was specifically developed for R. One of the main advantages provided by RStudio over other editors is that we can test our code easily as we edit our scripts. Here we show an example.

Let’s start by opening a new script as we did before. The next step is to give the script a name. We can do this through the editor by saving the current new unnamed script, either by clicking on the save icon or by using the key binding Ctrl+S on Windows and command+S on the Mac.

When you save the document for the first time, RStudio will prompt you for a name. We recommend a descriptive name that uses lowercase letters, no spaces, and hyphens to separate words, followed by the suffix .R. We will call this script my-first-script.R.
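The naming rule is mechanical enough to write down as code. The helper to_script_name() below is a hypothetical illustration (not an RStudio feature) that turns a descriptive title into a file name following the rule; the same idea applies to project folder names, minus the suffix.

```r
# Hypothetical helper: convert a descriptive title into a script file name
# that uses lowercase letters, hyphens between words, and the .R suffix
to_script_name <- function(title) {
  name <- tolower(trimws(title))
  # Replace every run of characters other than letters and digits with one hyphen
  name <- gsub("[^a-z0-9]+", "-", name)
  # Remove any hyphens left at the start or end
  name <- gsub("^-+|-+$", "", name)
  paste0(name, ".R")
}

to_script_name("My First Script")  # "my-first-script.R"
```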

Now we are ready to start editing our first script. The first lines of code in an R script are dedicated to loading the libraries we will use. Another useful RStudio feature is that once we type library() it starts auto-completing with libraries that we have installed. Note what happens when we type library(ti):

Another feature you may have noticed is that when you type library(, the closing parenthesis is automatically added. This helps you avoid one of the most common errors in coding: forgetting to close a parenthesis.

Now we can continue to write code. As an example, we will make a graph showing murder totals versus population totals by state. Once you are done writing the code needed to make this plot, you can try it out by sourcing the code. To do this, click on the Source button on the upper right side of the editing pane, or use the key binding Ctrl+Shift+Enter on Windows or command+shift+return on the Mac.
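The section does not list the code for this plot. A minimal sketch of what my-first-script.R might contain, assuming the tidyverse and dslabs packages are installed; it uses the murders dataset from dslabs, and the choice of text labels and axis titles is ours:

```r
# my-first-script.R
# Load the libraries first
library(tidyverse)
library(dslabs)

# murders: one row per state (including DC), with population and total gun murders
data(murders)

# Murder totals versus population totals, each state drawn as its abbreviation
p <- murders %>%
  ggplot(aes(x = population, y = total, label = abb)) +
  geom_text() +
  xlab("Population") +
  ylab("Total murders")

print(p)
```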

Once you run the code, you will see it appear in the R console and, in this case, the generated plot appears in the Plots pane. Note that the Plots pane has a useful interface that lets you click back and forward across different plots, zoom in on a plot, or save plots as files.

To run one line at a time instead of the entire script, you can use Ctrl+Enter on Windows and command+return on the Mac.

Global options

You can change the look and functionality of RStudio quite a bit.

To change the global options, click on Tools, then Global Options…

As an example, we show how to change the appearance of the editor. To do this, click on Appearance and look at the Editor theme options.

You can click on these and see examples of how your editor will look.

I personally like the Cobalt option. This makes your editor look like this:

As a second example, we show how to make a change that we highly recommend: set Save workspace to .RData on exit to Never and uncheck Restore .RData into workspace at startup. By default, R saves all the objects you have created into a file called .RData, so that when you restart the session in the same folder, these objects are loaded again. We find that this causes confusion, especially when we share code with colleagues and assume they have this .RData file. To change these options, make your General settings look like this:
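A short sketch of why restored objects cause confusion. It mimics the two default settings with save.image() and load(), writing to a file in a temporary folder (a location chosen only so the example leaves your project folder untouched):

```r
# Use a file in a temporary folder instead of the real .RData in your project
rdata <- file.path(tempdir(), ".RData")

x <- 1:10                  # an object created interactively during a session
save.image(file = rdata)   # roughly what "Save workspace to .RData on exit" does

rm(x)                      # a new session starts without x...
load(rdata)                # ...but "Restore .RData into workspace" brings it back
exists("x")                # TRUE, although no script in the project creates x
```

A colleague who runs the same scripts without this file gets an error wherever x is used, which is exactly the kind of confusion the recommended settings avoid.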

Keeping organized with RStudio Projects

A data analysis project is not always a dataset and a script. It often involves several scripts, the data may be saved across several files, and it is often convenient to save intermediate files. RStudio projects provide a way to keep all this organized in one folder. We will later learn how RStudio facilitates sharing work in these projects.

To organize yourself on a computer, it is essential that you understand how your filesystem is organized. A systematically organized filesystem can greatly increase your productivity, especially if you work on more than one project at a time. In a later section, we explain how Unix provides a powerful tool to help you with this. In this section, we will create a folder in a default location for illustrative purposes. Once you become a regular R user, you will want to think carefully about the best location for the folder in which you will keep a new project.

To start a project, click on File and then New Project.

Unless you have a pre-selected folder to save the work, you will select the New Directory option.

Then, for a data analysis project, you usually select the New Project option:

Now you will have to decide on the location of the folder that will be associated with your project as well as the name of the folder. When choosing a folder name, just like with file names, make sure it is a meaningful name that will help you remember what the project is about. As with files, we recommend using lower case letters, no spaces, and using hyphens to separate words. We will call the folder for this project my-first-project. Note that this will generate a file called my-first-project.Rproj in the folder associated with the project. We will see how this is useful a few lines below.

You will be given options on where this folder should be on your file system. In this example, we will place it in our home folder, but this is generally not a good practice. As we describe in more detail later, you want to organize your file system following a hierarchical approach, and you might have a folder called projects where you keep a folder for each project.

Now when you start using RStudio with a project, you will see the project name in the upper right corner. This will remind you which project this particular RStudio session belongs to. When you open an RStudio session with no project, it will say Project: (None).

When working on a project all files will be saved and searched for in the folder associated with the project. Below, we show an example of a script that we wrote and saved with the name code.R. Because we used a meaningful name for the project, we can be a bit less informative when we save the files. Although we do not do it here, you can have several scripts open at once. You simply need to click File, then New File and pick the type of file you want to edit.
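This works because RStudio sets the working directory to the project folder, so relative paths in your scripts resolve there. The sketch below imitates that with a stand-in folder under tempdir() (a hypothetical location chosen so the example runs anywhere); inside a real project you would skip the setup and simply use relative paths.

```r
# Stand-in for the project folder; in a real project RStudio sets this for you
project_dir <- file.path(tempdir(), "my-first-project")
dir.create(project_dir, showWarnings = FALSE)
old_wd <- setwd(project_dir)   # setwd() returns the previous working directory

getwd()  # the folder that relative paths are resolved against

# Save an intermediate file using a relative path: it lands in the project folder
write.csv(data.frame(x = 1:3), "intermediate.csv", row.names = FALSE)
# ...and can be read back later with the same relative path
intermediate <- read.csv("intermediate.csv")

setwd(old_wd)  # restore the original working directory
```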

One of the main advantages of using Projects is that after closing RStudio, to continue where we left off on the project, we simply double click or open the file saved when we first created the RStudio project. In this case, the file is called my-first-project.Rproj.

If we open this file, RStudio will start up and open the scripts we were editing.

Using Git and GitHub in RStudio

We are now ready to clone a repo, start editing files on our computer, and sync them with GitHub. We will use RStudio to facilitate this. We will also use Unix for the first time! The first step is to let Git know who we are, which will make it easier to connect with GitHub. We start by opening a terminal window in RStudio (remember you can get one through Tools in the menu bar). Now we use the git config command to tell Git who we are by typing the following two commands in our terminal window:

git config --global user.name "Your Name"
git config --global user.email "your@email.com"

You need to use the email address you used to open your GitHub account. The RStudio session should look something like this:

Now we are ready to start an RStudio project that uses version control and stores the code in a GitHub repo. To do this, we start a project, but instead of New Directory we select Version Control:

Then we will select Git as our version control system:

The repository URL is the link you used to clone. Above, we used https://github.com/username/homework-0.git as an example. In the project directory name, you need to put the name of the folder that was generated, which in our example will be the name of the repo, homework-0. This will create a folder called homework-0 on your local system.

Once you do this, the project is created and it is aware of the connection to a GitHub repo. You will see the name and type of project in the top right corner, and a new tab titled Git in the upper right pane.

If you select this tab, it will show you the files in your project with some icons that give you information about these files and their relationship to the repo. In the example below, we already added a file called code.R to the folder, which you can see in the editing pane.

We now need to pay attention to the Git pane. It is important to know that your local files and the GitHub repo will not be synced automatically. You have to do this when you are ready. To truly understand what is happening, we need to learn more details about Git, and we will do so a bit later. Right now, we will quickly show you how to sync with this simple example.

The main actions in Git are to:

  1. pull changes from the remote repo, in this case the GitHub repo,
  2. add files, or, as we say in Git lingo, stage files,
  3. commit changes to the local repo and
  4. push changes to the remote repo, in our case the GitHub repo.

Before we start working on a collaborative project, usually the first thing we do is pull in the changes from the remote repo, in our case the one on GitHub. However, for the example shown here, since we are starting with an empty repo and we are the only ones making changes, we don’t need to start by pulling.

In RStudio, the status of each file as it relates to the remote and local repos is represented by status symbols and colors. A yellow square means that Git knows nothing about this file. To sync with the GitHub repo, we need to add the file, then commit the change to our local Git repo, then push the change to the GitHub repo. Right now, the file is just on our computer. To add the file using RStudio, we click the Stage box. You will see that the status icon now changes to a green A.

Note: we are only adding the code.R file. We don’t necessarily need to add all the files in our local repo to the GitHub repo, only the ones we want to keep track of or the ones we want to share. If our work is producing files of a certain type that we do not want to keep track of, we can add the suffix that defines these files to the .gitignore file. More details on using .gitignore are included here. These files will then stop appearing in your RStudio Git pane. For the example shown here, we will only be adding code.R. But, in general, for an RStudio project, we recommend adding both the .gitignore and .Rproj files.
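A .gitignore file is plain text with one pattern per line, so it can be edited from R as well as from any editor. A minimal sketch, assuming a hypothetical helper named add_to_gitignore() (the usethis package offers a similar function, use_git_ignore()); the pattern "*.pdf" is only an example of a file type your scripts might generate:

```r
# Hypothetical helper: append file patterns to a .gitignore file,
# skipping patterns that are already listed; returns the patterns it added
add_to_gitignore <- function(patterns, path = ".gitignore") {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  to_add <- setdiff(patterns, existing)
  # Rewrite the file with the old lines followed by the new patterns
  writeLines(c(existing, to_add), path)
  invisible(to_add)
}
```

Running add_to_gitignore("*.pdf") from the project folder would stop Git, and therefore the RStudio Git pane, from showing any PDF files produced by your work.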

Now we are ready to commit the file to our local repo. In RStudio, we can use the Commit button. This will open a new dialog window. With Git, whenever we commit a change, we are required to enter a comment describing the changes being committed.

In this case, we will simply describe that we are adding a new script. In this dialog box, RStudio also gives you a summary of the changes you are committing. In this case, because it is a new file, the entire file is highlighted in green, which marks the changes.

Once we hit the commit button, we should see a message from Git with a summary of the changes that were committed.

Now we are ready to push these changes to the GitHub repo. We can do this by clicking on the Push button in the top right corner:

We now see a message from Git letting us know that the push has succeeded.

Note that in the pop-up window we no longer see the code.R file. This is because no new changes have been made since our last commit. We can exit this pop-up window now and continue working on our code.

If we now visit our repo on the web, we will see that it matches our local copy.

Congratulations, you have successfully created a GitHub code repository! Soon we will learn how to use this to keep organized while sharing our code. Before we continue learning about Git, we will provide a brief introduction to Unix and how it helps us manage our files.