1 Intro

This repository contains a cleaned-up version of the scripts used in the paper “Association between vision and pigmentation genes during genomic divergence”. It documents the entire progression from raw data to the final manuscript figures.

A visual overview of the process is given in Workflow.

1.1 Data

The raw data used in the study are stored at the European Nucleotide Archive (ENA) and can be retrieved using the project accession number PRJEB27858. This includes the raw data used for the genome assembly, the resequencing data used for the population genetic analysis, and the RNA sequencing data.
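
As a rough sketch of how the retrieval could be started, the snippet below builds a file report URL for the project accession. The endpoint and field names are assumptions based on the public ENA Portal API and are not part of this repository, so they should be checked against the current ENA documentation. The helper name ena_filereport_url is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build the URL of an ENA file report for a project accession.
# Assumption: the Portal API endpoint and the field names (run_accession,
# fastq_ftp) match the current ENA documentation.
ena_filereport_url() {
  accession="$1"
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$accession"
}

# Print the report URL for the project of this study.
ena_filereport_url PRJEB27858

# To download the table of runs and FASTQ locations (requires network access):
# curl -s "$(ena_filereport_url PRJEB27858)" > ena_runs.tsv
```

The resulting table would list one sequencing run per line, which can then be used to fetch the individual files.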

External data used within the scripts (e.g. the stickleback reference genome) cannot be provided here and needs to be accessed independently.

1.2 Figures

A more detailed documentation exists for all the figures of the manuscript:

F1, F2, F3 & F4

as well as for all the supplementary figures:

S01, S02, S03, S05, S06, S07, S08, S09, S10, S11, S12, S13, S14, S15, S16 & S17

The only exception is supplementary figure S04. This figure is a byproduct of the anchoring step during the assembly and was produced by the Allmaps software. Afterwards, Inkscape was used to adjust the coloration and labels of the linkage maps.

1.3 Background

All scripts assume two variables to be set within the bash environment:

  • $WORK is assumed to point to the base folder of this repository
  • $SFTWR is a folder that contains all the software dependencies that are used within the scripts

The dependencies need to be downloaded and installed separately.
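
A minimal sketch of how the two variables might be set and checked before running any script is given below. The paths are placeholders and the helper check_env is hypothetical (it is not part of the repository).

```shell
#!/usr/bin/env bash
# Sketch only: the two paths are placeholders, adjust them to the local setup.
export WORK="$HOME/path/to/this/repository"
export SFTWR="$HOME/path/to/software"

# Hypothetical helper: return non-zero if one of the variables is unset or empty.
check_env() {
  missing=0
  if [ -z "${WORK:-}" ]; then
    echo "ERROR: \$WORK is not set" >&2
    missing=1
  fi
  if [ -z "${SFTWR:-}" ]; then
    echo "ERROR: \$SFTWR is not set" >&2
    missing=1
  fi
  return "$missing"
}

check_env && echo "environment ok"
```

Placing the two export lines in ~/.bashrc (or in the job environment of the cluster) would make them available to every script.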

The scripts are organized and numbered in chronological order. Multiple scripts with equal numbers (e.g. 2.2.4.pca_bel.sh, 2.2.4.pca_hon.sh & 2.2.4.pca_pan.sh) usually work on parallel branches of the process and can be executed in parallel. In contrast, scripts with higher numbers usually depend on the output of scripts with lower numbers and should therefore be executed afterwards.
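
The numbering scheme can be illustrated with a short sketch that extracts the numeric prefix of each script name and lists the distinct stages in execution order. The helpers script_stage and list_stages and the file name 2.2.10.example.sh are hypothetical, and sort -V (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: derive the execution order from the numeric script prefixes.

# Print the numeric prefix of a script name, e.g. 2.2.4.pca_bel.sh -> 2.2.4
script_stage() {
  printf '%s\n' "$1" | sed -E 's/^([0-9]+(\.[0-9]+)*)\..*$/\1/'
}

# Print the distinct stages of all given scripts in version order.
# Scripts sharing a stage can run in parallel, later stages have to wait.
list_stages() {
  for f in "$@"; do
    script_stage "$f"
  done | sort -V | uniq
}

# 2.2.10.example.sh is a made-up name that shows why a version sort is needed:
# a plain alphabetical sort would place 2.2.10 before 2.2.4.
list_stages 2.2.4.pca_hon.sh 2.2.10.example.sh 2.2.4.pca_bel.sh 2.2.4.pca_pan.sh
```

Here the three PCA scripts collapse into the single stage 2.2.4, which is listed before the hypothetical stage 2.2.10.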

Most of the scripts start with a comment block that defines the resources requested from the computer cluster on which they were run:

#PBS -l elapstim_req=<runtime>
#PBS -l memsz_job=<memory>
#PBS -b <threads>
#PBS -l cpunum_job=<cores>
#PBS -N <job-name>
#PBS -q <job-que>
#PBS -o <stdout-log>.stdout
#PBS -e <stderr-log>.stderr