About
In bioinformatics, the computer is our lab. We perform experiments by trying different software and tweaking analysis parameters, and we write code to combine individual analyses into large workflows. We also develop new tools, which then need to be tested extensively to identify and remove problematic behaviour and bugs. With all of that going on, it is easy to forget that an analysis may have to be repeated at a later time, just like an experiment in the wet lab. After some time has passed (e.g. when results need to be brought into a publishable form), it is often unclear which software was used, how the data was treated or which settings were applied. This has led to a tremendous decline in the reproducibility of bioinformatics-focused studies.
In this course we will introduce you to concepts and tools for improving the reproducibility of your work. We will start with data organization and documentation, then work our way through version control systems, virtual environments and software encapsulation to workflow management. By applying these tools, your bioinformatics workflows can become more transparent, easier to share and easier to reproduce.
Syllabus
| | Day 1 - Monday | Day 2 - Tuesday | Day 3 - Wednesday |
|---|---|---|---|
| Topic | Theory: Reproducibility in Bioinformatics | Theory: Software encapsulation and its limits | Theory: Workflow management systems |
| Exercise 1 | Refresh commandline skills | Virtual environments: Conda | GNU Make, Snakemake and Nextflow |
| Exercise 2 | Installing software reproducibly | Containerization basics | |
| Topic | Theory: Data organization, documentation and version control | Theory: Creating own containers | Theory: Distributing workflows to different systems |
| Exercise 1 | Data organization and documentation: Markdown, YAML | Advanced containerization | Creating a self-sustained pipeline transferable to other systems |
| Exercise 2 | Version control with Git | Container pitfalls | |
Instructors
Christoph Hahn (University of Graz, Austria) – GitHub Twitter