1 Intro

Bioinformatic analysis can be quite messy at times: complex data wrangling operations, multiple versions of the same analysis (Supervisor: ‘Hey - we should add this bit!’; later: ‘Hey - we should drop that bit…’) and parallel working environments (local/cluster) are all out to get your sanity.

The following is a presentation of how I (currently) organize my stuff so that I do not get lost. Of course there are probably a zillion other ways to organize your workflow, and there are some aspects that might be helpful but that I have not yet had the time to implement or get to know. This tutorial is therefore surely biased towards my personal preferences and experience, and is merely meant as a template that you should tweak to your own liking.

It will cover the following topics:

  • Basic Setup: Organization of your files & how to get to the command line
  • Bash: The native Linux language (basics)
  • Cluster: How to get to the (GEOMAR) high performance computing cluster
  • Git: version control, connectivity & collaboration
  • R: RStudio, RStudio projects & my favorite packages
  • Nextflow: organizing your analysis