Advanced Unix

Most Unix implementations include a large number of powerful tools and utilities. We have just learned the very basics here. We recommend that you use Unix as your main file management tool. It will take time to become comfortable with it, but as you struggle, you will find yourself learning just by looking up solutions on the internet. In this section, we superficially cover slightly more advanced topics. The main purpose of the section is to make you aware of what is available rather than explain everything in detail.

Arguments

Most Unix commands can be run with arguments. Arguments are typically defined by using a dash - or two dashes --, followed by a letter or a word. An example of an argument is the -r in front of rm. The r stands for recursive and the result is that files and directories are removed recursively, which means that if you type:

rm -r directory-name

all files, subdirectories, files in subdirectories, subdirectories in subdirectories, etc. will be removed. This is equivalent to throwing a folder in the trash, except you can’t recover it. Once you remove it, you remove it. [AN: change second part to “Once you remove it, it is erased” or "it is deleted’?] Often, when you are removing directories, you will encounter files that are protected. In such cases, you can use the argument -f which stands for force.

You can also combine arguments. So, for instance, to remove a directory regardless of protected files, you type:

rm -rf directory-name

Now remember, once you remove there is no going back, so use this command very carefully.

A command that is often called with argument is ls. Here are some examples:

ls -a

The a stands for all. This argument makes ls show you all files in the directory, including hidden files. In Unix, all files starting with a . are hidden. Many applications create hidden directories to store important information without getting in the way of your work. An example is git. Once you initialize a directory as a git directory with git init, a hidden directory called .git is created. Another hidden file is the .gitignore file.

Another examples is:

ls -l

The l stands for long and the result is that more information about the files is shown.

It is often useful to see files in chronological order. For that we use:

ls -t

and to reverse the order of how files are shown you can use:

ls -r

We can combine all these arguments to show more information, for all files, in reverse chronological order:

ls -lart

Each command has a different set of arguments. In the next section, we learn how to find out what they each do.

Getting help

As you may have noticed, Unix uses an extreme version of abbreviations. This makes it very efficient, but hard to guess how to call commands. To make up for this weakness, Unix includes complete help files or man pages (man is short for manual). In most systems, you can type man followed by the command name to get help. So for ls, we would type:

man ls

This command is not available in some of the compact implementations of Unix, such as Git Bash. An alternative way to get help that works on Git Bash is to type the command followed by --help. So for ls, it would be as follows:

ls --help

Pipes

The help pages are typically long and if you type the commands above to see the help, it scrolls all the way to the end. It would be good [AN: “be a good idea?”] to save the help to a file and then use less to see it. The pipe, written like this |, does something similar. It pipes the results of a command to the command after the pipe. This is similar to the pipe %>% that we use in R. To get more help we thus can type:

man ls | less

or in Git Bash:

ls --help | less

This is also useful when listing files with many files. We can type:

ls -lart | less

Wild cards

One of the most powerful aspects of Unix are the wild cards. Suppose we want to remove all the temporary html files produced while trouble shooting for a project. Imagine there are dozens of files. It would be quite painful to remove them one by one. In Unix, we can actually write an expression that means all the files that end in .html. It uses the wildcards are *. [AN: review previous sentence] To list all html files, we would type:

ls *.html

To remove all html files in a directory, we would type:

rm *.html

The other useful wild card is the ? symbol. This means any character. So if all the files we want to erase have the form file-001.html with the numbers going from 1 to 999, we can type:

rm file-???.html

This will only remove files with that format.

We can combine wild cards. For example, to remove all files with the name file-001 regardless of suffix, we can type:

rm file-???.*

Environment variables

Unix has settings that affect your command line environment. These are called environment variables. The home directory is one of them. We can actually change some of these. In Unix, variables are distinguished from other entities by adding a $ in front. The home directory is stored in $HOME.

Earlier we saw that echo prints out. [AN: review previous sentence for completeness] So we can see our home by typing:

echo $HOME

You can see them all by typing:

env

You can change some of these [AN: some of thse what?]. But this varies across different shells which we describe next.

Shells

Much of what we are doing is part of what is called the Unix shell. There are actually different shells. They are very similar, almost noticeable [AN: “noticeable”? no tiene sentido. quitar?]. But there are differences, although we do not cover those here. You can see what shell you are using by typing:

echo $SHELL

The most common one is bash.

Once you know the shell, you can change environmental variables. In Bash Shell we do it using export val value. To change the path, described in more detail soon, type: (Don’t actually run this command though!)

export PATH = /usr/bin/

There is a program that is run before each terminal starts, where you can edit variables so they change whenever you call the terminal. This changes in different implementations, but if using bash, you can create a file called .bashrc, .bash_profile,.bash_login, or .profile. You might already have one.

Executables

In Unix, all programs are files. They are called executives [AN: executives or executables?]. So ls, mv and git are all files. But where are these program files? You can find out using the command which:

which git

## /usr/bin/git

That directory is probably full of program files. The directory /usr/bin usually holds many program files. If you type:

ls /usr/bin

in your terminal, you will see several executable files.

There are other directories that usually hold program files. The Application directory in the MAC or Program Files directory in Windows are examples.

When you type ls, Unix knows to run a program in another directory. How does it know this? This information is included in the environmental variable $PATH. If you type:

echo $PATH

you will see a list of directories separated by :. The directory /usr/bin is probably one of the first ones on the list. Unix looks for program files in those directories in that order. Although we don’t teach it here, you can actually create executables yourself. However, if you put it in your working directory and this directory is not on the path, you can’t run it just by typing the command. You get around this by typing the full path. So if your command is called my-ls, you can type

./my-ls

Learning to write your own executables is useful.

Permissions and file types

If you type:

ls -l

At the beginning you will see a series of symbols like this -rw-r--r--. This string indicates the type of file: regular file -, directory d, or executable x. This string also indicates the permission of the file: is it readable? writable? execrable? [AN: is "execrable a term?] Can other users on the system read? Can other users on the system write? Can other users execute if the file is executable? This is more advanced than what we cover here, but you can learn much more in a Unix reference book.

Commands you should learn

There are many commands that are beyond the scope of this book, but we want to make you aware of them. They are:

open/start - On the mac open filename tries to figure out the right application of the filename and open it with that application. This is a very useful command. On Git Bash you can try start filename. Try opening an R or Rmd file with open or start: it should open them with RStudio.
nano - A bare-bones editor.
ln - create a symbolic link. We do not recommend its use, but you should be familiar with it.
tar - archive files and subdirectories of a directory into one file.
ssh - connect to another computer.
grep - search for patterns in a file.
awk/sed - These are two very powerful commands that permit you to find specific strings in files and change them.

File manipulation in R

We can also perform file management from within R. The key functions to learn about can be seen by looking at the help file for ?files. Another useful function is unlink.

Although not recommended, note that you can run Unix commands in R using system.