fast sequential Markov coalescent
simulation of genomic data under complex evolutionary models
While preserving all the simulation flexibility of simcoal2,
fastsimcoal is now implemented under a faster continuous-time sequential
Markovian coalescent approximation, allowing it to efficiently generate
genetic diversity for different types of markers along large genomic
regions, for both present and ancient samples. It includes a parameter
sampler allowing its integration into Bayesian or likelihood parameter
estimation procedures.
fastsimcoal can handle very complex evolutionary scenarios including an
arbitrary migration matrix between samples, historical events allowing
for population resize, population fusion and fission, admixture events,
changes in migration matrix, or changes in population growth rates. The
time of sampling can be specified independently for each sample,
allowing for serial sampling in the same or in different populations.
Different markers, such as DNA sequences, SNPs, STRs (microsatellites) or
multi-locus allelic data can be generated under a variety of mutation
models (e.g. finite- and infinite-site models for DNA sequences,
stepwise or generalized stepwise mutation models for STR data,
infinite-allele model for standard multi-allelic data).
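As an illustration of the strict stepwise mutation model mentioned above, here is a minimal Python sketch in which each mutation changes an STR allele by exactly one repeat unit. The function name, the lower bound on the repeat count, and the starting values are assumptions made for this example; this is not fastsimcoal's own code.

```python
import random


def stepwise_mutate(repeat_count, n_mutations, rng, min_repeats=1):
    """Apply n_mutations single-step changes (+1 or -1 repeat) to an STR allele.

    min_repeats is a hypothetical lower bound used only in this sketch.
    """
    for _ in range(n_mutations):
        step = rng.choice((-1, 1))
        # Reflect at the lower bound so the allele never drops below min_repeats.
        if repeat_count + step < min_repeats:
            step = 1
        repeat_count += step
    return repeat_count


rng = random.Random(42)  # fixed seed so the run is reproducible
allele = stepwise_mutate(repeat_count=20, n_mutations=5, rng=rng)
print(allele)
```

A generalized stepwise variant would draw the step size from a distribution instead of always using a single repeat unit.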
fastsimcoal can simulate data in genomic regions with arbitrary
recombination rates, thus allowing for recombination hotspots of
different intensities at any position. fastsimcoal implements a new
approximation to the ancestral recombination graph in the form of the
sequential Markov coalescent, allowing it to very quickly generate
genetic diversity for >100 Mb genomic segments.
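To make the idea of a recombination hotspot concrete, the following Python sketch describes a region as a list of segments, each with its own per-base-pair recombination rate, and sums the expected number of crossovers per generation. The segment lengths and rates are made-up values, and the representation is an assumption for illustration only; it is not the fastsimcoal input format.

```python
def expected_crossovers(segments):
    """Expected crossovers per generation over a region.

    segments: list of (length_in_bp, recombination_rate_per_bp_per_generation).
    """
    return sum(length * rate for length, rate in segments)


# Hypothetical 1 Mb region: background rate 1e-8 with a 10 kb hotspot
# that recombines ten times faster than the background.
segments = [
    (495_000, 1e-8),  # left flank
    (10_000, 1e-7),   # hotspot
    (495_000, 1e-8),  # right flank
]

total = expected_crossovers(segments)
hotspot_share = (10_000 * 1e-7) / total
print(total, hotspot_share)
```

In this made-up example the hotspot covers 1% of the sequence but accounts for roughly 9% of the expected crossovers.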
fastsimcoal2 now allows one to estimate demographic parameters from
the (joint) site frequency spectrum (SFS), using simulations to compute
the expected SFS and a robust method for the maximization of the
composite likelihood.
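The Python sketch below shows one simple way a composite log-likelihood can be computed from an observed and an expected SFS, using a multinomial form in which each SFS entry contributes its observed count times the log of its expected proportion. This is a simplified illustration under stated assumptions: it ignores monomorphic sites, takes the expected SFS as given rather than estimating it by simulation, and the function name is hypothetical. It is not fastsimcoal2's actual implementation.

```python
import math


def composite_log_likelihood(observed_sfs, expected_sfs):
    """Multinomial composite log-likelihood, up to an additive constant.

    observed_sfs: observed counts per SFS entry (non-integers are allowed).
    expected_sfs: expected counts or probabilities for the same entries.
    """
    if len(observed_sfs) != len(expected_sfs):
        raise ValueError("observed and expected SFS must have the same length")
    total = sum(expected_sfs)
    loglik = 0.0
    for m, e in zip(observed_sfs, expected_sfs):
        if m == 0:
            continue  # entries with no observations contribute nothing
        p = e / total  # normalize expected entries to proportions
        if p <= 0:
            return float("-inf")  # observed data impossible under this model
        loglik += m * math.log(p)
    return loglik


observed = [6, 3, 1]
well_fitting = composite_log_likelihood(observed, [0.6, 0.3, 0.1])
poorly_fitting = composite_log_likelihood(observed, [1 / 3, 1 / 3, 1 / 3])
print(well_fitting > poorly_fitting)
```

An optimizer would then search for the parameter values whose expected SFS maximizes this quantity.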
new version of fastsimcoal2: fsc25 ver 2.5.2.21 (November 2015 release)
fsc25 ver 2.5.2.21 mainly corrects a bug introduced in
ver 2.5.2.8 that prevented the implementation of exponential growth
in the first simulated tree.
In addition to overall polishing and bug corrections, the main innovation
of ver 2.5 of fastsimcoal2 is the introduction of multithreading
(with the -c option). This option makes it possible to do
parameter optimization on desktop machines, as most modern
machines have multiple cores. Note that performance does not increase
strictly linearly with the number of threads (cores), so it is not
recommended to use more than one thread on a Linux cluster.
Bug corrections and modifications in ver 2.5.2.21 (November 2015)
1. Bug corrections:
   Exponential growth at time zero was not implemented for the first simulated tree, so the initial population size did not change for that tree. Note that exponential growth rates specified in historical events were correctly implemented even in the first tree, and that exponential growth was correctly implemented in the subsequent simulated trees (thanks to Anand Bhaskar).
   Crash in case of very large sample sizes (e.g. 60,000) (thanks to Anand Bhaskar).
   Incorrect computation of the maximum likelihood when non-integers are used in the observed SFS (thanks to Andi Knautt).
   The reported expected SFS was that of the last iteration and not that associated with the maximum likelihood parameter estimates.
   In case of a crash due to a bad tpl file, the parameters reported in the file called <generic name>_bad.par were not those leading to the crash.
2. Speed optimization: up to 30% speed gain.
3. Output of the time to MRCA in file <generic name>_mrca.txt with the new --recordMRCA option. Beware that this option really slows down computations. Note that we also output the deme in which the MRCA occurred.
See this page for a complete
list of changes since the first fastsimcoal release.
benchmarks
Comparisons with other coalescent simulation programs such as ms,
simcoal2 or MaCS can be found here.
getting started
A quick overview of how to get started with fastsimcoal can be found here (but it is better to read the manual first)
visualizing scenarios modeled in par files (new)
With Vitor Sousa, we have written an R script called ParFileInterpreter-v6.3.r
that reads par files and plots the modeled scenario, which can be
useful to check that your modeled scenario corresponds to what you
had in mind. It can also be useful to visualize the scenario obtained
after some parameter estimation (use the *_maxL.par file). More information
on the use of this program can be found here.
running fsc25 on a mac
I have realized (thanks to Melissa Wilson Sayre) that the plain version
of fsc25 will not run on Mac OS X unless you have installed a recent
version of gcc.
This is because fsc25 is multithreaded and uses Intel's libraries
based on OpenMP, which are no longer distributed with recent versions
of Mac OS X.
So to be able to run fsc25 on your Mac, you first need to install a
recent version of gcc.
Decompress the archive with the command
gunzip gcc-5.1-bin.tar.gz
Install gcc ver 5.1 in /usr/local with the command
sudo tar -xvf gcc-5.1-bin.tar -C /
problems running fsc25 on linux (kernel too old)
It seems that fsc25 is not able to run on old Linux versions with an old
kernel, potentially because it needs OpenMP libraries that must be
dynamically linked to the program.
A Google group on fastsimcoal
(https://groups.google.com/forum/#!forum/fastsimcoal) has been created
to promote discussion or allow queries on any aspect of fastsimcoal.
Please use it!
citation
fastsimcoal2 and fsc22:
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C., and Foll, M.
(2013) Robust demographic inference from genomic and SNP data. PLOS
Genetics 9(10): e1003905.
fastsimcoal:
Excoffier, L. and Foll, M. (2011) fastsimcoal: a continuous-time
coalescent simulator of genomic diversity under arbitrarily complex
evolutionary scenarios. Bioinformatics 27: 1332-1334.