Exercise 1 - Refresh command line skills
GitHub page URL: https://github.com/reslp/reproducibility-workshop/blob/main/day-1/exercise-1-shell-intro.rst
For the following three days we assume that you can navigate a
UNIX-like file system from the command line; list, copy, rename and
remove files; and display the contents of files. We further assume
that you know how to execute scripts, can use shell programs such as
grep, sed and cut, and have a basic understanding of how
piping (|) works.
Let’s do a few small things to get you warmed up.
In case you need help
Solutions to the tasks below can be found here, if needed.
First, let’s connect to the server.
You’ve been provided with a *.pem file that contains your user’s
credentials for connecting to the server. You’ll also need to know
the IP address of the server, which will change every day. As an example,
I use 18.237.42.108. Finally, you’ll need to know your username, e.g. user1.
To sum up, you’ll need:
- c1.pem: make sure you know where it’s located on your computer
- IP address, e.g.: 18.237.42.108
- username, e.g.: user1
With that, if you have ssh set up on
your computer, connecting should be as easy as:
(user@host)-$ ssh -i path/to/your/c1.pem user1@18.237.42.108
You could also store the relevant information in variables and connect like so:
(user@host)-$ pem="biorepo.pem"   # your file may be called c1.pem, c2.pem, etc., depending on your user
(user@host)-$ IP="18.237.42.108"  # this will change every day
(user@host)-$ user="ubuntu"       # change to reflect your username: user1, user2, user3, etc.
(user@host)-$ ssh -i "$pem" "$user@$IP"  # connect; confirm with yes if you connect for the first time
On Windows, you may need third-party software to connect. We recommend MobaXterm <https://mobaxterm.mobatek.net/>.
If successful, you’ll find yourself connected, and your prompt will look something like this:
user40@ip-172-31-4-141:~$
Your home directory should only contain a single directory at this stage.
user40@ip-172-31-4-141:~$ pwd
/home/user40
user40@ip-172-31-4-141:~$ ls
Share
Let’s create a bit of directory structure and navigate through it.
user40@ip-172-31-4-141:~$ mkdir -p linux-intro/bin
user40@ip-172-31-4-141:~$ mkdir linux-intro/data
user40@ip-172-31-4-141:~$ mkdir linux-intro/results
user40@ip-172-31-4-141:~$ cd linux-intro/data
user40@ip-172-31-4-141:~/linux-intro/data$ pwd
/home/user40/linux-intro/data
user40@ip-172-31-4-141:~/linux-intro/data$ cd   # without an argument, cd takes you back to your home directory
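The three mkdir calls above can also be written as a single command using bash brace expansion. The -p option creates the parent linux-intro as needed and does not complain about directories that already exist. This sketch runs in a throwaway directory created with mktemp, so it does not touch your home directory; on the server you would simply run the mkdir line from ~.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the demo does not touch your real $HOME.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Brace expansion turns this into:
# mkdir -p linux-intro/bin linux-intro/data linux-intro/results
mkdir -p linux-intro/{bin,data,results}

# List the directories that were created.
find linux-intro -type d | sort
```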
Task 1
Copy a file called README.md from a directory called data in ~/Share/linux-intro to your directory linux-intro/data.
Make sure to retain the timestamp of the original file.
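If you want to check your answer to Task 1: one possible approach is cp -p (GNU coreutils, as on the Ubuntu server), which preserves the mode, ownership and timestamps of the source file. The self-contained sketch below builds a mock of the Share layout in a temporary directory (the file content and the date are made up). It then compares modification times with stat -c %Y, which prints seconds since the epoch. On the server, the corresponding command would be cp -p ~/Share/linux-intro/data/README.md ~/linux-intro/data/.

```shell
#!/usr/bin/env bash
# Mock of the server layout in a throwaway directory (hypothetical content and date).
demo=$(mktemp -d)
mkdir -p "$demo/Share/linux-intro/data" "$demo/linux-intro/data"
echo "Example README" > "$demo/Share/linux-intro/data/README.md"
# Give the original an old timestamp so that a fresh, non-preserving copy would stand out.
touch -d "2020-01-01 12:00:00" "$demo/Share/linux-intro/data/README.md"

# -p preserves mode, ownership and timestamps of the source file.
cp -p "$demo/Share/linux-intro/data/README.md" "$demo/linux-intro/data/"

# Compare modification times (seconds since the epoch).
orig_mtime=$(stat -c %Y "$demo/Share/linux-intro/data/README.md")
copy_mtime=$(stat -c %Y "$demo/linux-intro/data/README.md")
echo "original: $orig_mtime  copy: $copy_mtime"
```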
Task 2
Get an overview of the original directory structure with the tree command: tree ~/Share/linux-intro (example)
Copy the directory subfolder1 and all its content from ~/Share/linux-intro/data/Day1/ to your directory linux-intro/data, considering the following:
- make sure to also bring along the entire directory structure from Day1 onwards, so that you get linux-intro/data/Day1/subfolder1
- do not copy the subfolders subfolder2 and subfolder3 in Day1
- keep the original timestamps
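One possible solution to Task 2 uses GNU cp’s --parents option. Run from inside the source data directory, it recreates the relative path Day1/subfolder1 under the destination instead of copying only the last path component. Combine it with -r (recursive) and -p (keep timestamps). The sketch below uses a mock layout in a temporary directory with made-up files; on the server, the source directory would be ~/Share/linux-intro/data and the destination ~/linux-intro/data.

```shell
#!/usr/bin/env bash
# Mock of ~/Share/linux-intro/data/Day1 with three subfolders (hypothetical files).
demo=$(mktemp -d)
src="$demo/Share/linux-intro/data"
dest="$demo/linux-intro/data"
mkdir -p "$src"/Day1/subfolder{1,2,3} "$dest"
echo "sample" > "$src/Day1/subfolder1/sample.txt"
touch -d "2020-01-01 12:00:00" "$src/Day1/subfolder1/sample.txt"

# Run cp from inside the source directory (in a subshell, so our own cwd is unchanged).
# --parents recreates the relative path Day1/subfolder1 under $dest;
# -r copies recursively, -p keeps timestamps (and mode/ownership).
(cd "$src" && cp -rp --parents Day1/subfolder1 "$dest")

# Show what ended up in the destination: Day1/subfolder1 only.
find "$dest" | sort
```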
Now, let’s add a line of text to the file linux-intro/data/README.md:
user40@ip-172-31-4-141:~$ echo "Add some text" >> linux-intro/data/README.md
Task 3a
Fast forward 3 months into the future. You’ve been otherwise occupied and return to the current project. You vaguely remember that you made some change to the README.md file, or did you?
Check the md5sums of the original file ~/Share/linux-intro/data/README.md and your copy linux-intro/data/README.md.
Note: if you save the output of md5sum to a text file, you can always check against it later on.
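To follow up on that note, you can write the checksums to a file now and verify them months later with md5sum -c. This option recomputes each checksum listed in the file, reports OK or FAILED per file, and exits with a non-zero status if anything changed. The sketch uses a temporary directory and a made-up README; the file name checksums.md5 is just an example.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'line 1\nline 2\n' > README.md   # hypothetical file content

# Record the checksum now...
md5sum README.md > checksums.md5

# ...the file changes later (like the echo >> step earlier in this exercise)...
echo "Add some text" >> README.md

# ...and checking against the saved checksum reveals the change.
if md5sum -c checksums.md5; then
    status="unchanged"
else
    status="changed"
fi
echo "README.md is $status"
```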
Task 3b
Fast forward 3 months into the future. You’ve been otherwise occupied and return to the current project. You vaguely remember that you made some change to the README.md file, but what did you change?
Use the diff command to compare the two files ~/Share/linux-intro/data/README.md and linux-intro/data/README.md.
diff is very useful, but the output can be a bit tricky to interpret. A slightly more complex example can be found here.
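Here is how to read diff’s default ("normal") output. A line such as 2a3 means "after line 2 of the first file, add line 3 of the second file". Lines prefixed with > come from the second file, and lines prefixed with < come from the first. diff -u shows the same change in unified format, with + and - markers and a few lines of context. The sketch recreates the situation in a temporary directory with made-up file contents. Note that diff exits with status 1 when the files differ, which is why the calls are followed by || true.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'line 1\nline 2\n' > original.md   # stands in for ~/Share/linux-intro/data/README.md
cp original.md copy.md                    # stands in for linux-intro/data/README.md
echo "Add some text" >> copy.md

# Normal format. Expected output:
# 2a3
# > Add some text
diff original.md copy.md || true   # diff exits with 1 when the files differ

# Unified format, with context lines and +/- markers.
diff -u original.md copy.md || true

# Capture the normal output for later inspection.
changes=$(diff original.md copy.md || true)
```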
Random numbers and reproducibility
Random numbers are common in bioinformatics software that employs heuristics of various kinds. If you want to work reproducibly, it’s worth knowing a few things about how they are generated. Let’s play with that.
Print a random number between 1 and 1000 to the screen.
user40@ip-172-31-4-141:~$ echo "$((1 + RANDOM % 1000))"
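Why does this give numbers from 1 to 1000? bash’s RANDOM expands to an integer between 0 and 32767. The % 1000 operation maps that value to 0..999, and adding 1 shifts the result to 1..1000. Because 32768 is not a multiple of 1000, the values 1 to 768 come up very slightly more often than 769 to 1000 (33 versus 32 of the 32768 possible RANDOM values each). That does not matter for this warm-up, but it is worth knowing. The sketch draws a batch of numbers with the same expression and reports the smallest and largest.

```shell
#!/usr/bin/env bash
# Draw 10000 numbers with the expression above and track the minimum and maximum.
min=1000
max=1
for ((i = 0; i < 10000; i++)); do
    n=$((1 + RANDOM % 1000))
    if ((n < min)); then min=$n; fi
    if ((n > max)); then max=$n; fi
done
echo "smallest: $min  largest: $max"
```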
Task 4
Devise a for loop to generate 10 random numbers between 1 and 1000, consecutively. Repeat three times.
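One possible answer to Task 4 uses an inner loop over {1..10} to collect ten numbers and an outer loop to repeat that three times. Without a seed, each of the three runs gives a different set. Collecting the numbers in an array and printing each run on one line is just a presentation choice for this sketch.

```shell
#!/usr/bin/env bash
# Outer loop: three repetitions; inner loop: ten random numbers between 1 and 1000.
for run in 1 2 3; do
    numbers=()
    for i in {1..10}; do
        numbers+=("$((1 + RANDOM % 1000))")
    done
    echo "run $run: ${numbers[*]}"
done
```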
Task 5
Make the ‘random’ number generation reproducible by setting a seed; 42 seems to be a good choice.
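In bash, assigning a value to RANDOM seeds its generator, so setting RANDOM=42 before the loop makes every run print the same numbers. One pitfall: bash reseeds RANDOM the first time it is used in a subshell, such as a command substitution $( ... ), unless the seed is assigned inside that subshell. For that reason, the helper below sets the seed itself. The function name draw_ten is made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seed bash's generator with $1, then print ten numbers in 1..1000.
draw_ten() {
    RANDOM=$1                     # assigning to RANDOM seeds the generator
    local i
    for i in {1..10}; do
        echo "$((1 + RANDOM % 1000))"
    done
}

# Repeat three times with the same seed; all three lines should be identical.
for run in 1 2 3; do
    echo "run $run: $(draw_ten 42 | tr '\n' ' ')"
done

# Keep two runs for comparison.
first=$(draw_ten 42)
second=$(draw_ten 42)
```

The numbers you get for seed 42 may still differ between bash versions. According to the bash manual, bash 5.1 changed how RANDOM is generated (its shell-compatibility section explains how BASH_COMPAT=50 restores the 5.0 behaviour). For that reason, it is worth recording the output of bash --version alongside the seed.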
Task 6
Write a bash script for the above task, and make it executable so you can execute it like so:
user40@ip-172-31-4-141:~$ ./linux-intro/bin/random_numbers.sh 10 42
Here, the first number is the number of random integers (between 1 and 1000) to generate, and the second number is your seed.
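Below is a minimal sketch of linux-intro/bin/random_numbers.sh. It assumes both arguments are required and must be non-negative integers without leading zeros. The argument check, the usage message and the internal function name random_numbers are our own choices, not part of the task. Save the script under that name and make it executable with chmod +x linux-intro/bin/random_numbers.sh.

```shell
#!/usr/bin/env bash
# random_numbers.sh N SEED
# Print N random integers between 1 and 1000, seeded with SEED for reproducibility.

random_numbers() {
    local count=$1 seed=$2 i
    # Non-negative integers without leading zeros (bash would read "08" as invalid octal).
    local int_re='^(0|[1-9][0-9]*)$'
    if [[ ! $count =~ $int_re || ! $seed =~ $int_re ]]; then
        echo "usage: random_numbers.sh N SEED" >&2
        return 1
    fi
    RANDOM=$seed                  # seed bash's generator
    for ((i = 0; i < count; i++)); do
        echo "$((1 + RANDOM % 1000))"
    done
}

# The script's exit status is the function's return status.
random_numbers "$@"
```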
Add the directory ~/linux-intro/bin (where your script lives) to your user’s ${PATH}
so that your script is available from any directory.
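PATH is a colon-separated list of directories that the shell searches for commands. Prepending the directory that holds your script therefore makes the script callable by name from anywhere. Use an absolute path such as $HOME/linux-intro/bin, because a relative entry only works from one particular working directory. An export typed at the prompt lasts only for the current session; to make it permanent, append the same line to ~/.bashrc. So that it runs anywhere, the sketch uses a temporary directory and a stand-in script named demo_tool.sh (made up here).

```shell
#!/usr/bin/env bash
# Stand-in for ~/linux-intro/bin: a temporary directory with a tiny executable script.
bindir=$(mktemp -d)
printf '#!/usr/bin/env bash\necho "hello from demo_tool.sh"\n' > "$bindir/demo_tool.sh"
chmod +x "$bindir/demo_tool.sh"

# Prepend the (absolute) directory to PATH for the current shell session.
# On the server this would be: export PATH="$HOME/linux-intro/bin:$PATH"
export PATH="$bindir:$PATH"

# The script is now found by name, without a path.
command -v demo_tool.sh
demo_tool.sh

# To make the change permanent, append the export line to ~/.bashrc, e.g.:
# echo 'export PATH="$HOME/linux-intro/bin:$PATH"' >> ~/.bashrc
```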
Now you should be warmed up ;-)