1 Intro

This repository contains the complete workflow used in the paper “Rapid radiation in a highly diverse marine environment.” The chapters of this documentation follow the main steps of the workflow, so each chapter corresponds to one prefix of the git x.x references in the paper's Methods section. Some of the steps depend on each other: in particular, git 1 to git 3 should be executed in order and before the other steps.

1.2 Prerequisites

All scripts assume two variables to be set within the bash environment:

  • $BASE_DIR is assumed to point to the base folder of this repository
  • $SFTWR is a folder that contains all the software dependencies that are used within the scripts
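
As a minimal sketch of how the two variables could be set and checked before running anything, assuming placeholder paths (`$HOME/hamlet_radiation` and `$HOME/software` are illustrative only, and `check_dir_var` is a hypothetical helper that is not part of the repository):

```shell
# Hypothetical helper: succeeds only if the given value is an existing directory.
# $1: variable name (used for messages), $2: the variable's current value
check_dir_var() {
  if [ -z "$2" ]; then
    echo "$1 is not set" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "$1 does not point to a directory: $2" >&2
    return 1
  fi
  echo "$1=$2"
}

# Placeholder paths (assumptions); values already set in the environment are kept.
export BASE_DIR="${BASE_DIR:-$HOME/hamlet_radiation}"
export SFTWR="${SFTWR:-$HOME/software}"

check_dir_var BASE_DIR "$BASE_DIR" || echo "adjust BASE_DIR before running the scripts" >&2
check_dir_var SFTWR "$SFTWR" || echo "adjust SFTWR before running the scripts" >&2
```

Placing the two `export` lines in `~/.bashrc` (with the real paths) would make the variables available in every new shell.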

The analysis is controlled by the workflow manager nextflow and uses slightly different configurations across the individual pipelines. The exact commands used to execute the analysis during the development of the publication are stored as aliases in sh/nextflow_alias.sh.
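
To look up the command behind a given pipeline, the alias definitions can be printed directly from that file. The sketch below assumes the file uses ordinary `alias name=...` lines; `list_aliases` is a hypothetical helper, not part of the repository:

```shell
# Hypothetical helper: print every line of a shell file that defines an alias.
# $1: path to the shell file
list_aliases() {
  grep -E '^[[:space:]]*alias[[:space:]]+[A-Za-z0-9_.-]+=' "$1"
}

alias_file="${BASE_DIR:-.}/sh/nextflow_alias.sh"
if [ -f "$alias_file" ]; then
  list_aliases "$alias_file"
else
  echo "alias file not found: $alias_file" >&2
fi
```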

Furthermore, external dependencies need to be downloaded and placed at the expected locations (see README.md in the ressources folder).

1.3 Figures

The creation of the figures is bundled in a single script (git 20) which can be executed once all nextflow scripts have successfully run.

cd $BASE_DIR
bash sh/create_figures.sh

This is a wrapper script that runs all scripts located under $BASE_DIR/R/fig, where you will find one R script per figure (including the supplementary figures). If you are only interested in a single figure, that is the place to start looking.
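
A quick way to see which figure scripts are available is to list the R files in that folder. This is a minimal sketch; `list_fig_scripts` is a hypothetical helper and the `.R` file extension is an assumption:

```shell
# Hypothetical helper: print the name of every R script in a directory.
# $1: directory to search
list_fig_scripts() {
  for script in "$1"/*.R; do
    # Skip the unexpanded pattern when the directory holds no R scripts
    [ -e "$script" ] || continue
    basename "$script"
  done
}

list_fig_scripts "${BASE_DIR:-.}/R/fig"
```

The individual scripts may expect command-line arguments, so the calls inside sh/create_figures.sh are the safest reference for running one of them on its own.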

Furthermore, more detailed documentation exists for all the figure scripts used for the manuscript:

F1, F2, F3, F4, F5 and F6

as well as for all the supplementary figures:

SF1, SF2, SF3, SF4, SF5, SF6, SF7, SF8, SF9, SF10, SF11, SF12, SF13, SF14, SF15, SF16, SF17, SF18, SF19, SF20 and SF21.

1.4 R setup

The plotting scripts for the figures need an additional R package ({GenomicOriginsScripts}). It depends on several non-CRAN R packages, so to install it successfully, the following packages also need to be installed:

# installing non-CRAN dependencies
install.packages("remotes")
remotes::install_bioc("rtracklayer")
remotes::install_github("YuLab-SMU/ggtree")
remotes::install_github("k-hench/hypogen")
remotes::install_github("k-hench/hypoimg")
# installing GenomicOriginsScripts
remotes::install_github("k-hench/GenomicOriginsScripts")

Once these non-CRAN packages are installed, it should be possible to re-create the R environment used for the analysis with the {renv} package. After opening the RStudio project (hamlet_radiation.Rproj), call:

# restoring R environment
install.packages("renv")
renv::restore()

Apart from the specific R packages that can be retrieved via {renv} from the renv.lock file, the R setup at the time of compilation was as follows:

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.3    magrittr_2.0.1    bookdown_0.19     htmltools_0.5.1.1
##  [5] tools_4.0.3       yaml_2.2.1        stringi_1.5.3     rmarkdown_2.7.6  
##  [9] knitr_1.31        stringr_1.4.0     digest_0.6.27     xfun_0.22        
## [13] rlang_0.4.10      evaluate_0.14