fast sequential Markov coalescent simulation of genomic data under complex evolutionary models

Additional scrpts for the preparation of input files and the analysis of results

We list below a series of R and bash scripts that have been developed by several people from our group to facilitate the analysis of the results of fsc27 as well as for preparing input files.

visualizing scenarios modeled in par files

A new R script called ParFileViewer.r reads par files and plots the modeled scenario. parFileViewer has been adapter to reflect the latest chages in fsc27. this script can be useful to check that the modeled scenario corresponds to what was intended. It can also be useful to visualize the scenario obtained after some parameter estimation (use the *_maxL.par file). More information on the use of the this program can be found here.

shiny app

Alexandre Gouy has written a web application based on the R/Shiny framework h to facilitate the interpretation of fastsimcoal results. It features various visualisations: the demographic model, observed and expected marginal and joint SFS, as well as likelihood comparisons and parameter estimates dispersions if multiple runs are provided. Source code can be found gitHub. It can be used locally or directly from our CMPG web site.

running multiple instances of fsc

Laurent Excoffier and Nina Marchi have written a bash script to launch several instances of fsc under the SLURM worload manager. It requires .tpl, .est and .obs SFS files plus fastsimcoal program.
Vitor Sousa has written a similar bash script to run it on a single multi-core machine, with a an associated readme file here

processing demographic inference output

Several R scripts have been written by Vitor Sousa, Nina Marchi, and Laurent Excoffier to process the output of demographic inferences. They are found in the zip files "processFscOutput.zip" and "block bootstrap and model choice.zip"

file converters

David Alexander Marques (DAM) , Vitor Sousa (VS), Nina Marchi (NM) and Laurent Excoffier (LE) have written several file conversion scripts:

An Rscript (by DAM) SFStool.R to
1) convert: site frequency spectra (SFS), either multidimensional or 2D-SFS to marginal 2D-SFS or 1D-SFS
2) visualize: 2D-SFS and 1D-SFS, including comparison between observed SFS (ending with .obs) and expected SFS (ending with .txt) if both are available in a folder
A Python script (sampleKgenotypesPerPop.py, by DAM) to prepare genotypes in a VCF for conversion into an SFS, in which no missing data is allowed. The script randomly draws a user-defined number (K) of non-missing genotypes per population from a VCF file and thereby subsamples genotypes from each population into a new VCF with pseudo-individuals. Sites with less than k genotypes per population are discarded. The script uses a population file with format Individual<TAB>Population, one per line, and allows to specify different K's for different populations.
A Python script (vcf2sfs.py, by DAM) which converts VCF files into site frequency spectra (SFS) files of the respective dimensions in fastsimcoal2 format (1D, 2D or multi-D, depending on the number of populations in the VCF file) by simply counting alleles from genotypes for sites without missing data. It can also compute SFS for non-overlapping windows in the genome (defined by distance along the chromosome or number of sequenced sites in the VCF file), compute bootstrap replicates of the SFS (by resampling single sites) or block-bootstrap replicates of the SFS (by resampling non-overlapping windows in the genome). IMPORTANT: Sites with missing data are discarded, so use of sampleKgenotypesPerPop.py (see above) recommended prior to get rid of missing data.
A Python script (foldSFS.py, by DAM) that converts unfolded SFS of any dimension (*_DAF*, *_jointDAF*, *_DSFS*) into their folded equivalent (*_MAF*, *_jointMAF*, *_MSFS*). Currently supports only SFS files containing a single SFS.
A series of bash and R scripts to convert VCF files to SFS by VS. They are contained in the following zip file buildSFSFromVCF.zip
An R script (by NM and LE) to transform a genotype table, as newly produced by fsc27, into a multidimensional SFS. This script computes from a genotype table the allele frequencies in user-defined populations and then build a multidimensional SFS with populations listed in the order defined by the user. The script uses a genotype table file with genomic positions as rows, diploid individuals as columns, no missing data and genotype entries coded as 0 (homozygote ancestral), 1 (heterozygote), and 2 (homozygote derived). It also computes the number of sites, monomorphic and polymorphic, to be used to compute the SFS (the monomorphic sites do not have to be present in the genotype table).

Last updated by L. Excoffier on 15.04.2021