Script repository
(Hench et al. supplement)
2022-01-14
1 Intro
This repository contains the complete workflow used in the paper “Rapid radiation in a highly diverse marine environment.” The individual chapters of this documentation follow the main steps of the workflow. Each chapter thus corresponds to one prefix in the git x.x references of the paper's methods section. The individual steps partly depend on each other: in particular, git 1 to git 3 should be executed in order and before the other steps.
1.1 Analysis
Documentation of the data preparation and the data analysis (git 1.x - 19.x) can be found at:
- git 1.x: Genotyping
- git 2.x: Genotyping all base pairs
- git 3.x: Analysis (FST & GxP)
- git 4.x: Analysis (dXY & \(\pi\))
- git 5.x: Analysis (topology weighting)
- git 6.x: Analysis (\(\rho\))
- git 7.x: Analysis (PCA)
- git 8.x: Analysis (demographic history)
- git 9.x: Analysis (hybridization)
- git 10.x: Analysis (admixture)
- git 11.x: Analysis (allele age)
- git 12.x: Analysis (FST permutation)
- git 13.x: Analysis (whg phylogeny)
- git 14.x: Analysis (outlier region phylogeny)
- git 15.x: Analysis (\(\pi\) with/without outlier regions)
- git 16.x: Analysis (IBD)
- git 17.x: Analysis (dstats)
- git 18.x: Genotyping all base pairs (mtDNA and unplaced Contigs)
- git 19.x: Analysis (Serraninae phylogeny)
1.2 Prerequisites
All scripts assume two variables to be set within the bash environment:
- $BASE_DIR is assumed to point to the base folder of this repository
- $SFTWR is a folder that contains all the software dependencies that are used within the scripts
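As a minimal sketch, the two variables could be set as follows before running any of the scripts. Both paths are hypothetical placeholders and not locations prescribed by the repository; the `:?` guards only abort with a message if a variable turns out to be unset or empty.

```shell
# Hypothetical locations (assumptions): adjust both paths to your own system.
export BASE_DIR="$HOME/hamlet_radiation"   # base folder of the cloned repository
export SFTWR="$HOME/software"              # folder holding the software dependencies

# Abort early with an error message if either variable is unset or empty.
: "${BASE_DIR:?BASE_DIR is not set}"
: "${SFTWR:?SFTWR is not set}"

echo "BASE_DIR=$BASE_DIR"
echo "SFTWR=$SFTWR"
```

Adding the two `export` lines to a shell startup file such as `~/.bashrc` would make them available in every new session.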
The analysis is controlled using the workflow manager nextflow and uses slightly different configurations across the individual pipelines. The exact commands used to execute the analysis during the development of the publication are stored as aliases in sh/nextflow_alias.sh.
Furthermore, external dependencies need to be downloaded and deployed at the expected places (see README.md in the ressources folder).
1.3 Figures
The creation of the figures is bundled in a single script (git 20), which can be executed once all nextflow scripts have run successfully.
cd $BASE_DIR
bash sh/create_figures.sh
This is essentially a wrapper script that runs all scripts located under $BASE_DIR/R/fig. Under this location, you will find one R script per figure (and per supplementary figure). So if you are only interested in a single figure, that is the place to start looking.
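To find the script for a single figure, it can help to list what is available in that folder first. The helper below is only a sketch: the function name `list_fig_scripts` is made up for this example and is not part of the repository, and it assumes that the figure scripts carry the `.R` file extension.

```shell
# Hypothetical helper: print all R scripts below a folder, one per line, sorted by name.
list_fig_scripts() {
  fig_dir="$1"
  if [ ! -d "$fig_dir" ]; then
    echo "error: '$fig_dir' is not a directory" >&2
    return 1
  fi
  # Assumption: figure scripts use the .R extension.
  find "$fig_dir" -type f -name '*.R' | sort
}

# Usage: only list the scripts if $BASE_DIR is set and the figure folder exists.
if [ -n "${BASE_DIR:-}" ] && [ -d "$BASE_DIR/R/fig" ]; then
  list_fig_scripts "$BASE_DIR/R/fig"
fi
```

Individual scripts may expect command-line arguments, so it is worth checking the header of a script before running it on its own.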
Furthermore, more detailed documentation exists for all the figure scripts used for the manuscript, as well as for all the supplementary figures:
SF1, SF2, SF3, SF4, SF5, SF6, SF7, SF8, SF9, SF10, SF11, SF12, SF13, SF14, SF15, SF16, SF17, SF18, SF19, SF20 and SF21.
1.4 R setup
An additional R package ({GenomicOriginsScripts}) is needed to run the plotting scripts for the figures. It depends on several non-CRAN R packages, so to install it successfully, the following packages also need to be installed:
# installing non-CRAN dependencies
install.packages("remotes")
remotes::install_bioc("rtracklayer")
remotes::install_github("YuLab-SMU/ggtree")
remotes::install_github("k-hench/hypogen")
remotes::install_github("k-hench/hypoimg")
# installing GenomicOriginsScripts
remotes::install_github("k-hench/GenomicOriginsScripts")
Once these non-CRAN packages are installed, it should be possible to re-create the R environment used for the analysis with the {renv} package.
After opening the RStudio project (hamlet_radiation.Rproj), call:
# restoring R environment
install.packages("renv")
renv::restore()
Apart from the specific R packages that can be retrieved via {renv} from the renv.lock file, the R setup at the time of compilation was as follows:
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_4.0.3 magrittr_2.0.1 bookdown_0.19 htmltools_0.5.1.1
## [5] tools_4.0.3 yaml_2.2.1 stringi_1.5.3 rmarkdown_2.7.6
## [9] knitr_1.31 stringr_1.4.0 digest_0.6.27 xfun_0.22
## [13] rlang_0.4.10 evaluate_0.14