About

In bioinformatics, the computer is our lab. We perform experiments by trying different software and tweaking parameters, and we write code to combine individual analyses into large workflows. We also develop new tools, which then need to be tested extensively to identify and remove bugs and other problematic behaviour. With all of that going on, it is easy to forget that analyses may have to be repeated later, just like experiments in the wet lab. After some time has passed (e.g. when results need to be brought into a publishable form), it is often unclear which software was used, how the data were treated or which settings were applied. This has led to a marked decline in the reproducibility of bioinformatics-focused studies.

In this course we will introduce concepts and tools that improve the reproducibility of your work. We will start with data organization and documentation, then work our way through version control systems, virtual environments and software encapsulation to workflow management. By applying these tools, you can make your bioinformatics workflows more transparent, easier to share and easier to reproduce.
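As a small illustration of the documentation idea, the sketch below writes the settings and software versions of an analysis run to a JSON file, so that they can still be looked up when the analysis has to be repeated. It is not part of the course material: the function names, the tool name `example_aligner`, its version, the parameter names and the output file name are all hypothetical and chosen only for this example.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def build_run_record(tool_versions, parameters):
    """Collect the information needed to repeat an analysis later."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command_line": list(sys.argv),
        "tool_versions": dict(tool_versions),
        "parameters": dict(parameters),
    }


def write_run_record(record, path):
    """Store the record as human-readable JSON next to the results."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical tool name, version and settings, for illustration only.
    record = build_run_record(
        tool_versions={"example_aligner": "1.2.3"},
        parameters={"min_quality": 20, "threads": 4},
    )
    write_run_record(record, "run_record.json")
```

Keeping such a file under version control alongside the analysis code is one simple way to answer, months later, which software and which settings produced a given result.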

Syllabus

| | Day 1 - Monday | Day 2 - Tuesday | Day 3 - Wednesday |
| Topic | Theory: Reproducibility in Bioinformatics | Theory: Software encapsulation and its limits | Theory: Workflow management systems |
| Exercise 1 | Refresh commandline skills | Virtual environments: Conda | GNU Make, Snakemake and Nextflow |
| Exercise 2 | Installing software reproducibly | Containerization basics | |
| Topic | Theory: Data organization, documentation and version control | Theory: Creating own containers | Theory: Distributing workflows to different systems |
| Exercise 1 | Data organization and documentation: Markdown, YAML | Advanced containerization | Creating a self-sustained pipeline transferable to other systems |
| Exercise 2 | Version control with Git | Container pitfalls | |

Instructors

Christoph Hahn (University of Graz, Austria)

Philipp Resl (University of Graz, Austria)