19 (git 18) Genotyping III (all callable sites for mtDNA and unplaced contigs)
This pipeline can be executed as follows:
cd $BASE_DIR/nf/18_genotyping_all_basepairs_mt
source ../../sh/nextflow_alias.sh
nf_run_allbp_mt1
19.1 Summary
The genotyping procedure is controlled by the nextflow script genotyping_all_basepairs_mt.nf
(located under $BASE_DIR/nf/18_genotyping_all_basepairs_mt
).
Based on an intermediate step from genotyping.nf
(git 1.10), this script produces a data set that includes all callable sites - that is SNPs as well a invariant sites that are covered by sequence (for mtDNA and unplaced contigs).
The genotypes produced by this script are then used in the Serraninae phylogeny.
19.2 Details of genotyping_all_basepairs_mt.nf
19.2.1 Data preparation
The nextflow script starts with a small header and then imports the joint genotyping likelihoods for all samples produced by genotyping.nf
.
Furthermore a channel is created to call mtDNA and unplaced contigs seperately.
#!/usr/bin/env nextflow
// git 18.1
// open genotype likelyhoods
Channel
.fromFilePairs("../../1_genotyping/1_gvcfs/cohort.g.vcf.{gz,gz.tbi}")
.set{ vcf_cohort }
Channel
.from(["LG_M", "unplaced"])
.set{ lg_mode }
The samples are jointly genotyped, independently for mtDNA and unplaced contigs and including invariant sites.
// git 18.2
// actual genotyping step (including invariant sites)
process joint_genotype_snps {
label "L_88g48h_LGs_genotype"
publishDir "../../1_genotyping/2_raw_vcfs/", mode: 'copy'
input:
set vcfId, file( vcf ), val( mode ) from vcf_cohort.combine( lg_mode )
output:
set file( "all_sites.${mode}.vcf.gz" ), file( "all_sites.${mode}.vcf.gz.tbi" ), val( mode ) into ( all_bp_non_lg_1, all_bp_non_lg_2 )
script:
if( mode == 'unplaced' )
"""
gatk --java-options "-Xmx85g" \
GenotypeGVCFs \
-R=\$BASE_DIR/ressources/HP_genome_unmasked_01.fa \
-XL=LG01 \
-XL=LG02 \
-XL=LG03 \
-XL=LG04 \
-XL=LG05 \
-XL=LG06 \
-XL=LG07 \
-XL=LG08 \
-XL=LG09 \
-XL=LG10 \
-XL=LG11 \
-XL=LG12 \
-XL=LG13 \
-XL=LG14 \
-XL=LG15 \
-XL=LG16 \
-XL=LG17 \
-XL=LG18 \
-XL=LG19 \
-XL=LG20 \
-XL=LG21 \
-XL=LG22 \
-XL=LG23 \
-XL=LG24 \
-XL=LG_M \
-V=${vcf[0]} \
-O=intermediate.vcf.gz \
--include-non-variant-sites=true \
--allow-old-rms-mapping-quality-annotation-data
gatk --java-options "-Xmx85G" \
SelectVariants \
-R=\$BASE_DIR/ressources/HP_genome_unmasked_01.fa \
-V=intermediate.vcf.gz \
--select-type-to-exclude=INDEL \
-O=all_sites.${mode}.vcf.gz
rm intermediate.*
"""
else if( mode == 'LG_M' )
"""
gatk --java-options "-Xmx85g" \
GenotypeGVCFs \
-R=\$BASE_DIR/ressources/HP_genome_unmasked_01.fa \
-L=${mode} \
-V=${vcf[0]} \
-O=intermediate.vcf.gz \
--include-non-variant-sites=true \
--allow-old-rms-mapping-quality-annotation-data
gatk --java-options "-Xmx85G" \
SelectVariants \
-R=\$BASE_DIR/ressources/HP_genome_unmasked_01.fa \
-V=intermediate.vcf.gz \
--select-type-to-exclude=INDEL \
-O=all_sites.${mode}.vcf.gz
rm intermediate.*
"""
}