fast sequential Markov coalescent
simulation of genomic data under complex evolutionary models
Changes in fastsimcoal25 relative to fastsimcoal21 (August 2014)
In addition to overall polishing and bug corrections, the main
innovation of ver 2.5 of fastsimcoal2 is the introduction of
multithreading (with the -c option). This option is meant to make
parameter optimization practical on desktop machines, as most modern
machines have multiple cores. Note that performance does not increase
strictly linearly with the number of threads (cores), so using more
than one thread on a Linux cluster is not recommended.
New features
The fastsimcoal2 program ver 2.5 has been renamed fsc25
(shorter name is better)
Use of a different random number
generator (same seed will produce different results than in
fastsimcoal21)
Code optimization resulting in a 1-75% speed gain for the
single-threaded version (see benchmark)
Multithreading (64 bit only),
for more speed gain on a multicore processor desktop
machine (see benchmark)
Result files for parameter estimation
now output in separate result directory
More options to generate SNP data
New specification for MAF SFS
Added a Mac OS X build that runs on earlier OS versions
(e.g. from 10.6 upwards) (thanks to Iain Mathieson)
More tolerant
reading of input files (thanks to Allan Strand)
Rules in est files can now be used for parameter estimations
Bug corrections in fsc25 relative to fsc21
Corrected bug where maximum number of simulations was set to
lower number of simulations during Brent optimization
Program crashed when trying to compute the SFS when too many
polymorphic sites needed to be kept in memory (this number can be
changed with the -k option)
Solved problems when multiple parameter definitions are listed in
def files
Corrected bug preventing recombination from being simulated when
several block structures were defined in par file (thanks to Thomas
Willems)
Incorrect computation of likelihood when using fractional numbers
in observed SFS (thanks to Andreas Kautt)
Bug corrections in ver 2.5.0.2 (August 2014)
Growth rates were inactivated in ver 2.5.0, and all simulations
were performed with a constant size population (thanks to Melissa
Wilson Sayres for pointing this out)
Bug corrections and modifications in ver 2.5.1 (October 2014)
Example files are back in zip files (thanks to Alfredo)
Description of the exact multiSFS format has been modified in the
manual (thanks to Vitor Sousa and Raphael Leblois)
Problem in implementing
recombination with multiple runs (option -nx where x>1) (thanks to
Vitor Sousa and Yang)
More precision on branch lengths when outputting trees in NEXUS
format (thanks to Shuo Yang)
New faster way to implement
recombination under the SMC' algorithm and its extension to multiple
recombinations between sites
Bug corrections and modifications in ver 2.5.2 (March 2015)
1. Bug corrections:
fsc251 asked for a joint SFS when two population samples were listed
in tpl file but only one contained active lineages. Bug found by
Charleston Chiang.
TMRCA was not found in case of recombination and demes with some
inactive lineages. Bug found by Ryan Bohlender.
fsc251 was not generating output files when a path was provided before
input file names (par or tpl). Note that fsc25 should always be run
from the directory containing the input files, even though the program
can be physically located elsewhere. Bug found by Greer Dolby.
fsc251 was not taking
into account growth rate changes specified in historical events (bug
introduced in 2.5.1, and it was not present in ver 2.5.0).
-k option has no upper limit anymore, and its default value is 100,000
Added new -P command line option, which outputs the global pooled SFS
obtained by pooling all lineages as if they belonged to a single
population
Added two new operators
in est file for complex parameters: %min% and %max%
Added new functions in
est files for complex parameters: abs(), exp(), log(), log10(), pow10()
Added a new "bounded"
keyword in est file to specify that the upper range of a simple
parameter is bounded. Needs to be listed after the "output" or "hide"
keywords.
Added two new keywords
for historical events: "keep" and "nomig".
Expected joint SFS is now rescaled such that the sum of SFS entries
for polymorphic sites is 1. This should lead to more exact likelihood
computation from multiple 2D SFS.
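The rescaling of the expected joint SFS mentioned above can be illustrated with a minimal Python sketch. It is not fsc25 code: the function name, the list-of-lists layout, and the assumption that the two corner entries are the only monomorphic classes (and are scaled by the same factor as the other entries) are hypothetical choices made for illustration.

```python
def rescale_polymorphic(joint_sfs):
    """Scale a 2D joint SFS so that its polymorphic entries sum to 1."""
    n_rows, n_cols = len(joint_sfs), len(joint_sfs[0])
    # Assumption: [0][0] and [n_rows-1][n_cols-1] are the monomorphic classes.
    monomorphic = {(0, 0), (n_rows - 1, n_cols - 1)}
    poly_total = sum(
        joint_sfs[i][j]
        for i in range(n_rows)
        for j in range(n_cols)
        if (i, j) not in monomorphic
    )
    if poly_total <= 0:
        raise ValueError("SFS contains no polymorphic sites")
    # Every entry is divided by the same factor; only the polymorphic
    # entries are guaranteed to sum to 1 afterwards.
    return [[value / poly_total for value in row] for row in joint_sfs]


# Toy 3x3 joint SFS (two diploid-sized samples of 2 lineages each).
expected = [[10.0, 2.0, 1.0],
            [3.0, 1.0, 1.0],
            [1.0, 1.0, 5.0]]
scaled = rescale_polymorphic(expected)
```

With several 2D SFS normalized in this way, each pairwise spectrum contributes probabilities on the same scale, which is presumably why the change helps the likelihood computation.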
Bug corrections and modifications in ver 2.5.2.8 (May 2015)
1. Bug corrections:
Incorrect simulation of mutations in case of high recombination
rates. There was a strong negative correlation between the
recombination rate and the number of polymorphic loci when
recombination occurred between adjacent sites. The number of mutations
was underestimated for recombination rates above, say, 1e-7. This bug
affected ALL previous fsc releases.
Possible overestimation of
TMRCA and overall tree size
in case of recombination. Bug present since early fsc2 release.
Crash of fsc2 in case of very high recombination rate with DNA data
Incorrect writing of recombination positions in output arp file when
simulating with several threads
maxObsLhood was not correctly computed when estimating parameters in
a scenario with a single population
Change
of migration matrix not implemented after first recombination (thanks
to Stefano Mona)
Computation of MAF SFS incorrect in case of multiple mutations per
site (when the -I option is not provided and mutation rates are high)
(thanks to Jason Weir).
Speed
optimization
Output of random DNA nucleotides instead of N for monomorphic loci
with the -S option.
Possibility to run fsc without command line options if a file
"fsc_run.txt" containing the run path and command line options is
present in the current working directory
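For readers unfamiliar with the MAF SFS mentioned in the bug list above, the following Python sketch shows how a minor allele frequency spectrum relates to a derived-allele spectrum for a single sample. It only illustrates the definition; the function name and list layout are hypothetical, and it does not reproduce how fsc handles sites with multiple mutations.

```python
def fold_sfs(derived_sfs):
    """Fold a 1D derived-allele SFS (entries 0..n) into a MAF SFS."""
    n = len(derived_sfs) - 1  # number of sampled gene copies
    folded = []
    for i in range(n // 2 + 1):
        j = n - i  # class with the complementary allele count
        if i == j:
            # Middle class of an even-sized sample is its own complement.
            folded.append(derived_sfs[i])
        else:
            folded.append(derived_sfs[i] + derived_sfs[j])
    return folded


# Derived SFS for n = 4 gene copies: counts of sites with 0..4 derived alleles.
maf = fold_sfs([10, 4, 3, 2, 1])
```

Here the class with one derived copy and the class with three derived copies both have a minor allele count of one, so they end up in the same MAF entry.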
Bug corrections and modifications in ver 2.5.2.21 (November 2015)
1. Bug corrections:
Exponential growth at time zero was not implemented for the first
simulated tree, so the initial population size did not change for that
tree. Note that exponential growth rates specified in historical
events were correctly implemented even in the first tree, and that
exponential growth at time zero was correctly implemented in the
subsequent simulated trees (thanks to Anand Bhaskar)
Crash in case of very large sample sizes
(e.g. 60,000) (thanks to Anand Bhaskar)
Incorrect computation of the max likelihood
when non-integers are used in the observed SFS (thanks to Andi Knautt)
Reported expected SFS was that of the
last iteration and not that associated with the max likelihood
parameter estimates
In case of crash due to bad tpl
file, parameters reported in file called <generic name>_bad.par were
not those leading to the crash
Speed
optimization. Up to 30% speed gain.
Output of time to MRCA in file <generic name>_mrca.txt with new
compiler directive --recordMRCA. Beware that this option really slows
down computations. Note that we also output the deme in which the MRCA
occurred.
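The fix concerning non-integer entries in the observed SFS can be made concrete with a small Python sketch of a multinomial-style composite log-likelihood. This is an assumed, simplified form (sum of observed count times log expected probability, natural log, polymorphic classes only), not necessarily the exact expression used by fsc25; the function and variable names are hypothetical.

```python
import math


def composite_log_lhood(observed, expected_probs):
    """Sum of m_i * log(p_i); observed counts m_i may be fractional."""
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected SFS differ in length")
    log_lhood = 0.0
    for count, prob in zip(observed, expected_probs):
        if count == 0:
            continue  # empty classes contribute nothing
        if prob <= 0.0:
            return float("-inf")  # observed class has zero expected probability
        log_lhood += count * math.log(prob)
    return log_lhood


# Fractional counts can arise, for example, when the SFS is averaged or projected.
value = composite_log_lhood([1.5, 0.5], [0.75, 0.25])
```

Because each term is simply a count multiplied by a log-probability, nothing in this form requires the counts to be integers.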
Changes in fastsimcoal21 relative to fastsimcoal2 (December 2013)
New features
64 bit windows version of fastsimcoal2 (20% speed gain compared
to 32 bit version!)
Modified output of monomorphic samples.
By default, fastsimcoal2 only outputs polymorphic sites for DNA data.
If the coalescent tree is too shallow, no mutation can occur on a given
tree. In that case, fastsimcoal2 now outputs a single locus with "N" for
all individuals instead of missing data in arlequin files (*.arp).
This change prevents a bug when analysing simulated arlequin files with
arlsumstat.
Optional use of a manual seed
for the random number generator (--seed xxx command line option)
Outputs
par file with estimated maximum likelihood parameters. This file can be
used to generate pseudo-observed SFS to estimate parametric bootstrap
confidence intervals around the ML parameters.
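As a rough illustration of the last step of a parametric bootstrap, the Python sketch below turns a list of parameter estimates, one per pseudo-observed SFS, into a percentile confidence interval. It assumes the re-estimation has already been done outside this code; the function name, the linear-interpolation quantile rule, and the example values are illustrative assumptions and not part of fastsimcoal2.

```python
import math


def percentile_interval(estimates, level=0.95):
    """Percentile bootstrap interval using linear interpolation."""
    values = sorted(estimates)
    if not values:
        raise ValueError("need at least one bootstrap estimate")
    n = len(values)
    alpha = (1.0 - level) / 2.0

    def quantile(q):
        pos = q * (n - 1)               # fractional index into sorted values
        low = int(math.floor(pos))
        high = min(low + 1, n - 1)
        frac = pos - low
        return values[low] * (1.0 - frac) + values[high] * frac

    return quantile(alpha), quantile(1.0 - alpha)


# Hypothetical estimates of one parameter from 101 bootstrap replicates.
boot_estimates = [float(i) for i in range(101)]
lower, upper = percentile_interval(boot_estimates)
```

In practice each entry of `boot_estimates` would come from re-running the parameter estimation on one SFS simulated from the output par file.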
Bug corrections
With -s0 option, the number of reported polymorphic sites in file
"<file_name>_numPolymSites.obs" was incorrectly set to zero and
the maximum likelihood reported in the file
"<file_name>.lhoodObs" was set to INF. These two problems are now
corrected.
Multiple whitespace or multiple tabs between parameters in
historical events caused erratic behavior. Multiple separators are now
allowed in historical events.
When
estimating a single parameter from the SFS, the number of performed ECM
loops could be smaller (usually 2) than the required minimum.
Changes in fastsimcoal2 relative to fastsimcoal
New features
Optional output of all
simulated sites (including monomorphic sites) (-S command line option)
Optional use of a manual seed
for the random number generator (--seed xxx command line option)
Simulation of ascertained SNP
data
Generation of the (joint) site
frequency spectrum (SFS) from DNA sequence data
Generation of multidimensional
(>2D) SFS
Generation of Nexus coalescent trees with branch lengths now
expressed in fractions of generations (e.g. 1205.123)
Ability to estimate
demographic
parameters from the site frequency spectrum inferred from DNA sequences
or
ascertained SNP chips
Need to specify number of SNPs to output with
-s option (specify 0 to output all SNPs)
Bug corrections
Potential crash when generating scenarios with historical events
and recombination
Crash when simulating samples of size zero with recombination
The pattern of polymorphisms obtained in a population for a given
past demography changed depending on whether other samples were
simulated as well, in presence of recombination
Non-convergence to the MRCA when simulating serial samples of age
zero
Note that bugs 1-3 were due to the same problem in the code.