Exercise 1 - Refresh command line skills

GitHub page URL: https://github.com/reslp/reproducibility-workshop/blob/main/day-1/exercise-1-shell-intro.rst

For the following three days we assume that you can navigate a UNIX-like file system using the command line; list, copy, rename and remove files; and display the contents of files. Further, we assume that you know how to execute scripts, can use shell programs like grep, sed, cut, etc., and have a basic understanding of how piping (|) works.

Let’s do a few small things to get you warmed up.

In case you need help

Solutions to the below tasks can be found here, if need be.

First, let’s connect to the server.

You’ve been provided with a *.pem file that contains your user’s credentials for connecting to the server. Further, you’ll need to know the IP address of the server. This will change every day. As an example I use 18.237.42.108. Thirdly, you’ll need to know your username, e.g. user1. To sum up, you’ll need:

c1.pem - make sure you know where it’s located on your computer

IP address, e.g.: 18.237.42.108

username, e.g.: user1

With that, if you have ssh set up on your computer, connecting should be as easy as:

(user@host)-$ ssh -i path/to/your/c1.pem user1@18.237.42.108

You could also store the relevant info in variables and connect like so:

(user@host)-$ pem="biorepo.pem" #your file may be called c1.pem, c2.pem, etc. depending on your user
(user@host)-$ IP="18.237.42.108" #this will change every day
(user@host)-$ user="ubuntu" #change to reflect your username, user1, user2, user3, etc.
(user@host)-$ ssh -i "$pem" "$user@$IP" #connect - confirm with yes if you connect for the first time
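A common stumbling block, offered here as a hedged aside: OpenSSH refuses to use a private key file that other users on your machine can read. Restricting the key to owner read-only usually fixes this. The sketch below runs on a throwaway temporary file so it can be tried safely; in practice you would set pem to the real path of your own *.pem file.

```shell
# Throwaway stand-in for the key file (assumption: replace with the path to your real *.pem file).
pem="$(mktemp)"

# Owner may read the file; group and others get no access at all.
chmod 400 "$pem"

# Show the resulting permission string (first 10 characters of the ls -l line).
perms="$(ls -l "$pem" | cut -c1-10)"
echo "$perms"   # -r--------
```

If ssh complains about an unprotected private key file, this is the first thing to check.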

On Windows, you may need third-party software to connect. We recommend MobaXterm (https://mobaxterm.mobatek.net/).

If successful, you’ll find yourself connected and your prompt will look something like this:

user40@ip-172-31-4-141:~$

Your home directory should only contain a single directory at this stage.

user40@ip-172-31-4-141:~$ pwd
/home/user40
user40@ip-172-31-4-141:~$ ls
Share

Let’s create a bit of directory structure and navigate through it.

user40@ip-172-31-4-141:~$ mkdir -p linux-intro/bin
user40@ip-172-31-4-141:~$ mkdir linux-intro/data
user40@ip-172-31-4-141:~$ mkdir linux-intro/results

user40@ip-172-31-4-141:~$ cd linux-intro/data
user40@ip-172-31-4-141:~/linux-intro/data$ pwd
/home/user40/linux-intro/data
user40@ip-172-31-4-141:~/linux-intro/data$ cd #without arguments, cd takes you back to your home directory

Task 1

Copy a file called README.md from a directory called data in ~/Share/linux-intro to your directory linux-intro/data.

Make sure to retain the timestamp of the original file.

Task 2

Get an overview of the original directory structure with the tree command: tree ~/Share/linux-intro (example)

Copy the directory subfolder1 and all its content from ~/Share/linux-intro/data/Day1/ to your directory linux-intro/data, considering the following:

make sure to also bring along the entire directory structure from Day1 onwards, so that you end up with linux-intro/data/Day1/subfolder1
do not copy the subfolders subfolder2 and subfolder3 in Day1
keep original timestamps

Now, let’s add a line of text to the file linux-intro/data/README.md:

user40@ip-172-31-4-141:~$ echo "Add some text" >> linux-intro/data/README.md

Task 3a

Fast forward 3 months into the future. You’ve been otherwise occupied and return to the current project. You vaguely remember that you made some change to the README.md file, or did you?

Check the md5sums of the original file ~/Share/linux-intro/data/README.md and your copy linux-intro/data/README.md.

Note: if you save the output of md5sum in a text file, you can check the file against it at any later point.
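As a minimal sketch of that idea, using a made-up file in a temporary directory rather than the exercise data (the file names and the before/after variables are illustrative assumptions): md5sum can write a checksum to a file, and md5sum -c later verifies the file against that record.

```shell
# Work in a scratch directory so nothing in the exercise folders is touched.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'original content\n' > README.md

# Record the checksum once, while the file is known to be in its original state.
md5sum README.md > README.md.md5

# Verify against the record: prints "README.md: OK" while the file is unchanged.
if md5sum -c README.md.md5; then before="unchanged"; else before="changed"; fi

# Modify the file, then verify again: the check now fails.
echo "Add some text" >> README.md
if md5sum -c README.md.md5; then after="unchanged"; else after="changed"; fi

echo "before: $before, after: $after"
```

The checksum file is tiny, so it can live next to the data it describes.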

Task 3b

Fast forward 3 months into the future. You’ve been otherwise occupied and return to the current project. You vaguely remember that you made some change to the README.md file, but what did you change?

Use the diff command to compare the two files ~/Share/linux-intro/data/README.md and linux-intro/data/README.md.

diff is very useful, but the output can be a bit tricky to interpret. A slightly more complex example can be found here.
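A minimal sketch of how to read diff output, using two made-up files (old.txt and new.txt are illustrative names, not part of the exercise data):

```shell
workdir="$(mktemp -d)"
printf 'line one\nline two\nline three\n' > "$workdir/old.txt"
printf 'line one\nline 2\nline three\nline four\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, hence the "|| true" guard.
out="$(diff "$workdir/old.txt" "$workdir/new.txt" || true)"
printf '%s\n' "$out"
# 2c2
# < line two
# ---
# > line 2
# 3a4
# > line four
```

Here 2c2 means that line 2 of the first file was changed into line 2 of the second, and 3a4 means that after line 3 of the first file, line 4 of the second was added. Lines starting with < come from the first file, lines starting with > from the second.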

Random numbers and reproducibility

Many bioinformatics programs rely on heuristics that draw random numbers. If you want to work reproducibly, it’s worth knowing a few things about how this works. Let’s play with that.

Print a random number between 1 and 1000 to the screen:

user40@ip-172-31-4-141:~$ echo "$((1 + RANDOM % 1000))"
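To see why this expression lands in the range 1 to 1000: in bash, RANDOM expands to an integer between 0 and 32767, % 1000 reduces it to 0..999, and adding 1 shifts the result to 1..1000. The sketch below replaces RANDOM with fixed stand-in values (the variable names are illustrative) so the arithmetic can be checked by hand.

```shell
# Fixed stand-ins for values that RANDOM could return.
lowest=$((1 + 0 % 1000))       # smallest possible draw  -> 1
top=$((1 + 999 % 1000))        # remainder 999           -> 1000
wrapped=$((1 + 1000 % 1000))   # remainder wraps to 0    -> 1
largest=$((1 + 32767 % 1000))  # largest possible draw   -> 768

echo "$lowest $top $wrapped $largest"   # 1 1000 1 768
```

Because 32768 is not a multiple of 1000, the results 1 to 768 come up very slightly more often than 769 to 1000. For a warm-up exercise this does not matter.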

Task 4

Devise a for loop to generate 10 random numbers between 1 and 1000, consecutively. Repeat three times.

Task 5

Make the ‘random’ number generation reproducible by setting a seed - 42 seems to be a good choice.

Task 6

Write a bash script for the above task, and make it executable so you can execute it like so:

user40@ip-172-31-4-141:~$ ./linux-intro/bin/random_numbers.sh 10 42

Where the first number is the number of random integers between 1 and 1000 to generate and the second number is your seed.

Add the directory ./linux-intro/bin (where your script lives) to your user’s ${PATH} so that your script will be available globally.

Now you should be warmed up .. ;-)
