fast sequential Markov coalescent simulation of genomic data under complex evolutionary models
While preserving all the simulation flexibility of simcoal2,
fastsimcoal is now implemented under a faster continuous-time sequential
Markovian coalescent approximation, allowing it to efficiently generate
genetic diversity for different types of markers along large genomic
regions, for both present and ancient samples. It includes a parameter
sampler allowing its integration into Bayesian or likelihood parameter
estimation procedures.
fastsimcoal can handle very complex evolutionary scenarios including an
arbitrary migration matrix between samples, historical events allowing
for population resize, population fusion and fission, admixture events,
changes in migration matrix, or changes in population growth rates. The
time of sampling can be specified independently for each sample,
allowing for serial sampling in the same or in different populations.
Different markers, such as DNA sequences, SNPs, STRs (microsatellites) or
multi-locus allelic data, can be generated under a variety of mutation
models (e.g. finite- and infinite-site models for DNA sequences,
stepwise or generalized stepwise mutation models for STR data,
infinite-allele model for standard multi-allelic data).
fastsimcoal can simulate data in genomic regions with arbitrary
recombination rates, thus allowing for recombination hotspots of
different intensities at any position. fastsimcoal implements a new
approximation to the ancestral recombination graph in the form of a
sequential Markov coalescent, allowing it to very quickly generate
genetic diversity for >100 Mb genomic segments.
fastsimcoal2 now allows one to estimate demographic parameters from
the (joint) site frequency spectrum (SFS), using
simulations to compute the expected SFS and a robust method for the
maximization of the composite likelihood.
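As an illustration of the quantity being maximized, here is a minimal Python sketch of a multinomial composite log-likelihood computed from an observed SFS and an expected SFS. It is not the fastsimcoal2 implementation: the function name, the toy numbers and the floor parameter used to avoid log(0) are assumptions made for this example only.

```python
import math

def composite_log_likelihood(observed_counts, expected_probs, floor=1e-12):
    """Multinomial composite log-likelihood of an observed SFS.

    observed_counts: number of sites observed in each SFS entry.
    expected_probs:  expected proportion of sites in each entry
                     (e.g. estimated from coalescent simulations).
    floor:           hypothetical lower bound that avoids log(0) for
                     entries never produced by the simulations.
    """
    if len(observed_counts) != len(expected_probs):
        raise ValueError("SFS vectors must have the same length")
    total = 0.0
    for count, prob in zip(observed_counts, expected_probs):
        if count > 0:  # empty entries contribute nothing to the sum
            total += count * math.log(max(prob, floor))
    return total

# Toy data: counts of polymorphic sites in three SFS entries.
observed = [50, 30, 20]
model_a = [0.5, 0.3, 0.2]  # expected SFS matching the observed proportions
model_b = [0.2, 0.3, 0.5]  # expected SFS that fits poorly

ll_a = composite_log_likelihood(observed, model_a)
ll_b = composite_log_likelihood(observed, model_b)
```

With these toy values, ll_a is larger than ll_b, so an optimizer comparing the two sets of expected proportions would prefer the first one.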
New version of fastsimcoal2: fsc2603 (October
14th 2017 release, fixes a bug in ver 2.6.0.2)
fsc26 introduces several novelties, such as the possibility to
model inbreeding in some populations, compatibility with angsd, removal
of singletons, faster computations and several syntax changes, and it
corrects some bugs.
The average inbreeding coefficient of individuals in a population can now
be specified as a third optional parameter in the sample size
definition. In this case, the sample age also needs to be defined (set to
zero in most applications), as:
<sample size> <sample age> <inbreeding coefficient>
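To make the three-field layout concrete, here is a small Python sketch that reads one such sample definition line. It is only an illustration: the names SampleDef and parse_sample_line are hypothetical, and the defaults (age 0, inbreeding 0) as well as the check that the coefficient lies between 0 and 1 are assumptions, not documented fsc26 behavior.

```python
from typing import NamedTuple

class SampleDef(NamedTuple):
    size: int                 # sample size
    age: int = 0              # sample age (assumed 0 when omitted)
    inbreeding: float = 0.0   # average inbreeding coefficient (assumed 0 when omitted)

def parse_sample_line(line: str) -> SampleDef:
    """Parse '<sample size> [<sample age> [<inbreeding coefficient>]]'."""
    fields = line.split()
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"expected 1 to 3 fields, got {len(fields)}")
    size = int(fields[0])
    age = int(fields[1]) if len(fields) > 1 else 0
    inbreeding = float(fields[2]) if len(fields) > 2 else 0.0
    # Assumption for this sketch: the coefficient is a proportion in [0, 1].
    if not 0.0 <= inbreeding <= 1.0:
        raise ValueError("inbreeding coefficient must lie between 0 and 1")
    return SampleDef(size, age, inbreeding)

modern_inbred = parse_sample_line("20 0 0.1")
plain = parse_sample_line("10")
```

Note that a line such as "20 0.1" would be rejected by this sketch, which mirrors the rule above: the inbreeding coefficient is the third field, so the sample age has to be given first.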
Possibility to define initial parameter values for demographic inference
Option -initvalues file.pv, where file.pv
lists the initial non-complex parameter values to use. This option is
mainly useful when computing bootstrap confidence intervals, as it
allows one to use fewer replicates for each bootstrap data set. A *.pv
file is now automatically generated after each parameter estimation by
fsc26.
Computation of MAF 1D and 2D SFS with option --foldedSFS
by simply folding the corresponding unfolded SFS (for compatibility
with angsd, where the minor allele is computed separately for each SFS)
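As an illustration of what folding means in the one-dimensional case, here is a minimal Python sketch that turns an unfolded (derived allele) SFS into a minor allele frequency SFS. The function name is hypothetical and this is not the fsc26 code; keeping the output at the same length as the input, with zeros in the upper half, is a choice made for this example.

```python
def fold_sfs_1d(unfolded):
    """Fold a 1D unfolded SFS into a minor allele frequency (MAF) SFS.

    unfolded[i] is the number of sites where the derived allele is seen
    i times among n = len(unfolded) - 1 sampled gene copies.
    The returned list has the same length; entries above n // 2 are zero.
    """
    n = len(unfolded) - 1
    folded = [0] * len(unfolded)
    for i, count in enumerate(unfolded):
        minor = min(i, n - i)  # count of the less frequent allele
        folded[minor] += count
    return folded

# Toy unfolded SFS for n = 4 gene copies (entries for 0..4 derived copies).
unfolded_sfs = [100, 10, 5, 3, 2]
folded_sfs = fold_sfs_1d(unfolded_sfs)  # [102, 13, 5, 0, 0]
```

Entries i and n - i are pooled because, once the ancestral state is ignored, a site with i derived copies cannot be told apart from a site with n - i derived copies.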
Optional faster but approximate log computations with option --logprecision n,
where n is a number between 10 and 23 specifying the precision of the
computation of logarithms. 23 means full precision and is the default
value.
Optional parameter optimization without taking singletons into account specified with option --nosingleton
Syntax changes
For parameter optimization, the
-N option has been removed, and the maximum number of iterations is now equal to that set by the -n option.
The number of cycles to perform is now fixed and only specified with option -L.
The -l
option is now optional and means something different. It is now used to
specify the number of cycles where information on monomorphic sites is
used. After these initial cycles, likelihood will only be computed (and
optimized) on the polymorphic sites. This option needs to be used
together with the “reference” keyword in the .est file (see section on est file).
The -M
option is now just a flag indicating that we want to perform parameter
estimation from the observed SFS. It should therefore not be followed
by any number.
Removed -D option to produce output in dadi format.
Implementation of instantaneous bottlenecks with keyword instbot added to historical event definition. Only works in the absence of recombination for the moment.
No more warnings if deme size tends to zero or infinity when the deme is empty (introduced in ver 2.6.0.3)
Bug corrections
Expected marginal SFS were not computed when computing
expected SFS with FREQ data
Wrong likelihoods were computed with option -0
No more program crash when using large recombination rates
ver 2.6.0.3 corrects a bug in ver 2.6.0.2 (released on October 9th) preventing correct parameter optimization under some scenarios
See this page for a complete list of changes since the first fastsimcoal release
benchmarks
Comparisons with other coalescent simulation programs such as ms,
simcoal2 or MaCS can be found here
getting started
A quick overview of how to get started with fastsimcoal can be found here (but it is better to
read the manual
first)
visualizing scenarios modeled in par files
With Vitor Sousa, we have written an R script called ParFileInterpreter-v6.3.1.r
that reads par files and plots the modeled scenario, which can be
useful to check that your modeled scenario corresponds to what you
had in mind. It can also be useful to visualize the scenario obtained
after some parameter estimation (use the *_maxL.par file).
More information
on the use of this program can be found here.
running fsc26 on a mac
I have realized (thanks to Melissa Wilson Sayre) that the plain version
of fsc26 will not run on Mac OS X unless you have installed a recent
version of gcc.
This is because fsc26 is multithreaded and uses Intel's libraries
based on OpenMP, which are no longer distributed with recent versions
of Mac OS X.
So to be able to run fsc26 on your Mac, you first need to install a
recent version of gcc.
Decompress the gzipped tar archive with the command
gunzip gcc-7.1-bin.tar.gz
Install gcc ver 7.1 in /usr/local with the command
sudo tar -xvf gcc-7.1-bin.tar -C /
problems running fsc26 on linux (kernel too old)
It seems that fsc26 is not able to run on old Linux versions with an old
kernel, potentially because it needs OpenMP libraries that must be
dynamically linked to the program.
A Google
group on fastsimcoal
(https://groups.google.com/forum/#!forum/fastsimcoal) has been created
to promote discussion or allow queries on any aspect of fastsimcoal.
Please use it!
citation
fastsimcoal2 and higher:
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C., and M.
Foll
(2013) Robust demographic inference from genomic and SNP data. PLOS
Genetics, 9(10):e1003905.
fastsimcoal:
Excoffier, L. and Foll, M. (2011) fastsimcoal: a continuous-time
coalescent simulator of genomic diversity under arbitrarily complex
evolutionary scenarios. Bioinformatics 27: 1332-1334.