========================================
Exercise 1 - Refresh command line skills
========================================

Github page URL: https://github.com/reslp/reproducibility-workshop/blob/main/day-1/exercise-1-shell-intro.rst

For the following three days we assume that you can navigate in a UNIX-like file system using the command line, list, copy, rename and remove files, and also display the contents of files. Further, we assume that you know how to execute scripts and use shell programs like ``grep``, ``sed``, ``cut``, etc. and have a basic understanding of how piping ``|`` works.

Let’s do a few small things to get you warmed up.

.. admonition:: In case you need help

   Solutions to the below tasks can be found `here `_, if need be.

First, let’s connect to the server. You’ve been provided with a ``*.pem`` file that contains your user credentials for connecting to the server. Further, you'll need to know the IP address of the server. This will change every day. As an example I use ``18.237.42.108``. Thirdly, you'll need to know your username, e.g. ``user1``.

To sum up, you'll need:

- ``c1.pem`` - make sure you know where it's located on your computer
- IP address, e.g.: ``18.237.42.108``
- username, e.g.: ``user1``

With that, if you have ``ssh`` set up on your computer, connecting should be as easy as:

.. code:: bash

   (user@host)-$ ssh -i path/to/your/c1.pem user1@18.237.42.108

You could also store the relevant info in variables, and connect like so:

.. code:: bash

   (user@host)-$ pem="biorepo.pem" #your file may be called c1.pem, c2.pem, etc. depending on your user
   (user@host)-$ IP="18.237.42.108" #this will change every day
   (user@host)-$ user="ubuntu" #change to reflect your username, user1, user2, user3, etc.
   (user@host)-$ ssh -i "$pem" "$user@$IP" #connect - confirm with yes if you connect for the first time

On Windows, you may need third-party software to connect. We recommend `MobaXterm `.

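
On Linux and macOS, a common stumbling block at this step is that ``ssh`` refuses to use a private key file that other users can read, and aborts with an ``UNPROTECTED PRIVATE KEY FILE`` warning. The following is a minimal sketch of how one might restrict the permissions before connecting. It operates on a throwaway stand-in file (``demo.pem``, a hypothetical name created only for this illustration), so it should be safe to run as is; for the real connection you would apply ``chmod 400`` to your own ``c1.pem`` instead.

```bash
#!/usr/bin/env bash
# Sketch only: demo.pem is a hypothetical stand-in for your real key file.
workdir="$(mktemp -d)"            # throwaway directory, removed at the end
pem="$workdir/demo.pem"
touch "$pem"

chmod 644 "$pem"                  # permissions a freshly downloaded file often has
before="$(ls -l "$pem" | cut -c1-10)"   # keep only the permission string

chmod 400 "$pem"                  # read-only for the owner, no access for anyone else
after="$(ls -l "$pem" | cut -c1-10)"

echo "before: $before"
echo "after:  $after"

rm -rf "$workdir"                 # clean up the stand-in file
```

With the permissions on your actual key file restricted in the same way, the ``ssh -i`` command shown above should accept the key.
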
If successful, you’ll find yourself connected and your prompt will look something like this:

.. code:: bash

   user40@ip-172-31-4-141:~$

Your home directory should only contain a single directory at this stage.

.. code:: bash

   user40@ip-172-31-4-141:~$ pwd
   /home/user40
   user40@ip-172-31-4-141:~$ ls
   Share

Let’s create a bit of directory structure and navigate through it.

.. code:: bash

   user40@ip-172-31-4-141:~$ mkdir -p linux-intro/bin
   user40@ip-172-31-4-141:~$ mkdir linux-intro/data
   user40@ip-172-31-4-141:~$ mkdir linux-intro/results

.. code:: bash

   user40@ip-172-31-4-141:~$ cd linux-intro/data
   user40@ip-172-31-4-141:~/linux-intro/data$ pwd
   /home/user40/linux-intro/data
   user40@ip-172-31-4-141:~/linux-intro/data$ cd

.. admonition:: Task 1

   Copy a file called ``README.md`` from a directory called ``data`` in ``~/Share/linux-intro`` to your directory ``linux-intro/data``.

   - Make sure to retain the timestamp of the original file.

.. admonition:: Task 2

   Get an overview of what the original directory structure looks like with the tree command ``tree ~/Share/linux-intro`` (`example `_).

   Copy the directory ``subfolder1`` and all its content from ``~/Share/linux-intro/data/Day1/`` to your directory ``linux-intro/data``, considering the following:

   - make sure to also bring along the entire directory structure from ``Day1`` onwards, so that you get ``linux-intro/data/Day1/subfolder1``
   - do not copy subfolders ``subfolder2`` and ``subfolder3`` in ``Day1``
   - keep original timestamps

Now, let's add a line of text to the file ``linux-intro/data/README.md``.

.. code:: bash

   user40@ip-172-31-4-141:~$ echo "Add some text" >> linux-intro/data/README.md

.. admonition:: Task 3a

   Fast forward 3 months into the future. You've been otherwise occupied and return to the current project. You vaguely remember that you made some change to the ``README.md`` file, or did you?

   Check the md5sums of the original file ``~/Share/linux-intro/data/README.md`` and your copy ``linux-intro/data/README.md``.
   Note that if you save the output of ``md5sum`` in a text file, you can always check again later on.

.. admonition:: Task 3b

   Fast forward 3 months into the future. You've been otherwise occupied and return to the current project. You vaguely remember that you made some change to the ``README.md`` file, but what did you change?

   Use the ``diff`` command to compare the two files ``~/Share/linux-intro/data/README.md`` and ``linux-intro/data/README.md``.

``diff`` is very useful, but the output can be a bit tricky to interpret. A slightly more complex example can be found `here `_.

Random numbers and reproducibility
==================================

Random numbers are common in bioinformatics software that employs different kinds of heuristics. If you want to work reproducibly, it's worth knowing a few things in this context. Let's play with that.

Print a random number between 1 and 1000 to the screen.

.. code:: bash

   user40@ip-172-31-4-141:~$ echo "$((1 + RANDOM % 1000))"

.. admonition:: Task 4

   Devise a for loop to generate 10 random numbers between 1 and 1000, consecutively. Repeat three times.

.. admonition:: Task 5

   Make the ‘random’ number generation reproducible by setting a seed - **42** seems to be a good choice.

.. admonition:: Task 6

   Write a bash script for the above task, and make it executable so you can execute it like so:

   .. code:: bash

      user40@ip-172-31-4-141:~$ ./linux-intro/bin/random_numbers.sh 10 42

   Here, the first number is the number of random integers between 1 and 1000 to generate and the second number is your seed.

   Add the directory ``./linux-intro/bin`` to your user's ``${PATH}`` so that your script will be available globally.

Now you should be warmed up ... ;-)