Getting Stuff Done
2019-11-11
1 Intro
Bioinformatic analysis can be quite messy at times: complex data wrangling operations, multiple versions of the same analysis (Supervisor: ‘Hey - we should add this bit!’; later: ‘Hey - we should drop that bit…’) and parallel working environments (local/cluster) are all out to get your sanity.
The following is a presentation of how I (currently) organize my stuff so that I do not get lost. Of course there are probably a zillion other ways to organize your workflow, and there are some aspects that might be helpful but that I have not yet had the time to implement or get to know. This tutorial is therefore surely biased towards my personal preferences and experience, and is merely meant as a template that you should tweak to your own liking.
It will cover the following topics:
- Basic Setup: Organization of your files & how to get to the command line
- Bash: The native Linux language (basics)
- Cluster: How to get to the (GEOMAR) high performance computing cluster
- Git: version control, connectivity & collaboration
- R: RStudio, RStudio projects & my favorite packages
- Nextflow: organizing your analysis