1 Intro

Bioinformatic analysis can get quite messy at times: complex data wrangling operations, multiple versions of the same analysis (Supervisor: ‘Hey - we should add this bit!’; later: ‘Hey - we should drop that bit…’), and parallel working environments (local/cluster) are all out to get your sanity.

The following is a presentation of how I (currently) organize my stuff so that I do not get lost. Of course, there are probably a zillion other ways to organize your workflow, and there are some aspects that might be helpful but that I have not yet had the time to implement or get to know. This tutorial is therefore surely biased towards my personal preferences and experience, and is merely meant as a template that you should tweak to your own liking.

It will cover the following topics:

Basic Setup: Organization of your files & how to get to the command line
Bash: The native Linux language (basics)
Cluster: How to get to the (GEOMAR) high performance computing cluster
Git: version control, connectivity & collaboration
R: RStudio, RStudio projects & my favorite packages
Nextflow: organizing your analysis

Getting Stuff Done
