9 Supplementary Figure 03
9.1 Summary
This is the accessory documentation of Supplementary Figure 03.
The Figure can be recreated by running the R script S03.R
:
cd $WORK/3_figures/F_scripts
Rscript --vanilla S03.R
rm Rplots.pdf
9.2 Details of S03.R
In principal the S03.R
script is a reduced versions of the F2.R
script. Is an executable R script that depends on a variety of image manipulation and data managing packages.
In the following the differences to F3.R
are highlighted. Obvious changes (different data input files, axis labels, y range) are omitted.
Apart from these trivial changes, the genome wide summary of the statistic is omitted.
Therefore, the following lines of the original Script (F2.R
) were removed from S03.R
:
gwFST <- read.csv('../../2_output/08_popGen/05_fst/genome_wide_weighted_mean_fst.txt',sep='\t')
gwFST$RUN <- factor(gwFST$run,levels=levels(data$RUN))
#....#
p1 <- ggplot()+
#....#
geom_vline(data = data.frame(x=567000000),aes(xintercept=x),col=annoclr,lwd=.2)+
geom_rect(data=gwFST,aes(xmin=570*10^6,xmax=573*10^6,ymin=0,ymax=gwFST*secScale))+
#....#
Furthermore, the generation of the outlier annotation changed. While in the FST script, the outlier where determined from the loaded data and later exported, within the G x P script the the highlights don’t refer to G x P outliers, but the old FST outlier are imported and plotted for reference.
data2 <- read.csv('../../0_data/0_resources/fst_outlier_windows.txt', sep='\t')
musks <- read.csv('../../0_data/0_resources/fst_outlier_ID.txt', sep='\t')
This replaced the original section that computes the outlier windows within F2.R
:
threshs <- data %>% group_by(RUN) %>% summarise(thresh=quantile(WEIGHTED_FST,.9998))
data2 <- data %>% filter(RUN %in% levels(data$RUN)[1:3]) %>% merge(.,threshs,by='RUN',all.x=T) %>%
mutate(OUTL = (WEIGHTED_FST>thresh)) %>% filter(OUTL) %>% group_by(RUN) %>%
mutate(CHECK=cumsum(1-(BIN_START-lag(BIN_START,default = 0)==5000)),ID=paste(RUN,'-',CHECK,sep='')) %>%
ungroup() %>% group_by(ID) %>%
summarise(CHROM=CHROM[1],
xmin = min(BIN_START+GSTART),
xmax=max(BIN_END+GSTART),
xmean=mean(GPOS),
LGmin=min(BIN_START),
LGmax=min(BIN_END),
LGmean=min(POS),
RUN=RUN[1],COL=COL[1]) %>%
ungroup() %>% mutate(muskS = letters[as.numeric(as.factor(xmin))],
musk=LETTERS[c(1,3,4,4,1,2,2,3)])
musks <- data2 %>% group_by(musk) %>% summarise(MUSKmin=min(xmin),
MUSKmax=max(xmax),
MUSKminLG=min(LGmin),
MUSKmaxLG=max(LGmax)) %>%
merge(data2,.,by='musk') %>% mutate(muskX = ifelse(musk %in% c("A","B"),
MUSKmin-1e+07,MUSKmax+1e+07))
The final plot, as produced by the S03.R
script: