Fast sequential Markov coalescent simulation of genomic data under complex evolutionary models
Changes in fastsimcoal25 relative to fastsimcoal21 (August 2014)
In addition to overall polishing and bug corrections, the main innovation of version 2.5 of fastsimcoal2 is the introduction of multithreading (with the -c option). This option is meant to make parameter optimization practical on desktop machines, since most modern machines have multiple cores. Note that performance does not increase linearly with the number of threads (cores), so it is not recommended to use more than one thread on a Linux cluster, where running several independent single-threaded jobs makes better use of the available cores.
New features
Version 2.5 of the fastsimcoal2 program has been renamed fsc25 (a shorter name is better)
Use of a different random number generator (the same seed will produce different results than in fastsimcoal21)
Code optimization, resulting in speed gains of 1% to 75% for the single-threaded version (see benchmark)
Multithreading (64-bit only), for further speed gains on multicore desktop machines (see benchmark)
Result files for parameter estimation are now written to a separate result directory
More options to generate SNP data
New specification for the minor allele frequency (MAF) SFS
Added a macOS X version that runs on earlier OS releases (from 10.6 upwards) (thanks to Iain Mathieson)
More tolerant reading of input files (thanks to Allan Strand)
Rules in est files can now be used for parameter estimation
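A MAF SFS is commonly obtained by folding the unfolded (derived-allele) SFS, so that each site is classified by the count of its rarer allele when the ancestral state is unknown. The minimal Python sketch below shows that standard folding, assuming the SFS is a list of counts indexed by derived-allele count 0..n. The function name `fold_sfs` is hypothetical; this is not fsc25's own code, nor necessarily the exact MAF specification introduced in this version.

```python
from typing import List


def fold_sfs(sfs: List[float]) -> List[float]:
    """Fold an unfolded 1D SFS into a minor allele frequency (MAF) SFS.

    Hypothetical helper: sfs[k] is the number (or proportion) of sites where
    the derived allele is carried by k of the n = len(sfs) - 1 gene copies.
    Returns a list of length n // 2 + 1, indexed by minor-allele count.
    """
    n = len(sfs) - 1
    if n < 0:
        raise ValueError("SFS must have at least one entry")
    folded = [0.0] * (n // 2 + 1)
    for k, count in enumerate(sfs):
        # The minor allele count is the smaller of k and n - k; for even n the
        # middle class k = n / 2 is visited only once, so it is not double counted.
        folded[min(k, n - k)] += count
    return folded


if __name__ == "__main__":
    # Sample of n = 4 gene copies: entries for derived-allele counts 0..4
    print(fold_sfs([10, 4, 3, 2, 1]))  # prints [11.0, 6.0, 3.0]
```

Folding maps k and n - k to the same class, which is why both monomorphic classes (k = 0 and k = n) end up in entry 0.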
Bug corrections in fsc25 relative to fsc21
Corrected a bug where the maximum number of simulations was set to the lower number of simulations during Brent optimization
The program crashed when computing the SFS if too many polymorphic sites had to be kept in memory (this limit can be changed with the -k option)
Solved problems occurring when multiple parameter definitions are listed in def files
Corrected a bug preventing recombination from being simulated when several block structures were defined in the par file (thanks to Thomas Willems)
Incorrect computation of the likelihood when the observed SFS contains fractional numbers (thanks to Andreas Kautt)
Bug corrections in ver 2.5.0.2 (August 2014)
Growth rates were inactivated in ver 2.5.0, and all simulations were performed with a constant-size population (thanks to Melissa Wilson Sayres for pointing this out)
Bug corrections and modifications in ver 2.5.1 (October 2014)
Example files are back in zip files (thanks to Alfredo)
The description of the exact multiSFS format has been modified in the manual (thanks to Vitor Sousa and Raphael Leblois)
Problem when implementing recombination with multiple runs (option -nx, where x > 1) (thanks to Vitor Sousa and Yang)
More precise branch lengths when outputting trees in NEXUS format (thanks to Shuo Yang)
New, faster implementation of recombination under the SMC' algorithm, and its extension to multiple recombination events between sites
Bug corrections and modifications in ver 2.5.2 (March 2015)
1. Bug corrections:
fsc251 asked for a joint SFS when two population samples were listed in the tpl file but only one contained active lineages. Bug found by Charleston Chiang.
The TMRCA was not found in the case of recombination with demes containing some inactive lineages. Bug found by Ryan Bohlender.
fsc251 was not generating output files when a path was provided before the input file names (par or tpl). Note that fsc25 should always be run from the directory containing the input files, even though the program itself can be physically located elsewhere. Bug found by Greer Dolby.
fsc251 was not taking into account growth rate changes specified in historical events (bug introduced in 2.5.1; it was not present in ver 2.5.0).
The -k option no longer has an upper limit, and its default value is 100,000
Added a new -P command line option that outputs the global pooled SFS, obtained by pooling all lineages as if they belonged to a single population
Added two new operators in est files for complex parameters: %min% and %max%
Added new functions in est files for complex parameters: abs(), exp(), log(), log10(), pow10()
Added a new "bounded" keyword in est files to specify that the upper range of a simple parameter is bounded. It must be listed after the "output" or "hide" keywords.
Added two new keywords for historical events: "keep" and "nomig".
The expected joint SFS is now rescaled so that the sum of SFS entries for polymorphic sites is 1. This should lead to a more exact likelihood computation from multiple 2D SFS.
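Two of the changes above concern how SFS entries are combined: the -P option pools all lineages into a single global SFS, and the expected joint SFS is rescaled so that its polymorphic entries sum to 1. The Python sketch below illustrates both ideas on a 2D joint SFS under stated assumptions: entry [i][j] counts sites with i derived copies in the first sample and j in the second; only the all-ancestral and all-derived cells are treated as monomorphic; and the likelihood is the usual composite multinomial log-likelihood, the sum of observed count times log expected probability over polymorphic cells (natural log here; fsc's reported units may differ). All function names are hypothetical and are not fsc25 internals.

```python
import math
from typing import List

Matrix = List[List[float]]


def pool_joint_sfs(joint: Matrix) -> List[float]:
    """Pool a 2D joint SFS into the 1D SFS of the combined sample.

    A site with i derived copies in sample 1 and j in sample 2 has
    i + j derived copies among the n1 + n2 pooled gene copies.
    """
    n1, n2 = len(joint) - 1, len(joint[0]) - 1
    pooled = [0.0] * (n1 + n2 + 1)
    for i, row in enumerate(joint):
        for j, count in enumerate(row):
            pooled[i + j] += count
    return pooled


def rescale_polymorphic(joint: Matrix) -> Matrix:
    """Rescale a joint SFS so that its polymorphic cells sum to 1.

    Assumption: only cells (0, 0) and (n1, n2) are monomorphic; both are set to 0.
    """
    n1, n2 = len(joint) - 1, len(joint[0]) - 1
    monomorphic = {(0, 0), (n1, n2)}
    total = sum(
        joint[i][j]
        for i in range(n1 + 1)
        for j in range(n2 + 1)
        if (i, j) not in monomorphic
    )
    if total <= 0:
        raise ValueError("expected SFS has no polymorphic sites")
    return [
        [0.0 if (i, j) in monomorphic else joint[i][j] / total for j in range(n2 + 1)]
        for i in range(n1 + 1)
    ]


def composite_log_likelihood(observed: Matrix, expected: Matrix) -> float:
    """Sum of observed[i][j] * log(p[i][j]) over polymorphic cells.

    Observed counts only act as weights, so non-integer values are allowed.
    """
    n1, n2 = len(expected) - 1, len(expected[0]) - 1
    probs = rescale_polymorphic(expected)
    loglik = 0.0
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if (i, j) in ((0, 0), (n1, n2)):
                continue  # monomorphic cells do not contribute
            obs = observed[i][j]
            if obs == 0:
                continue
            if probs[i][j] <= 0:
                return -math.inf  # observed class with zero expected probability
            loglik += obs * math.log(probs[i][j])
    return loglik


if __name__ == "__main__":
    print(pool_joint_sfs([[5, 2], [3, 1]]))  # prints [5.0, 5.0, 1.0]
    print(composite_log_likelihood([[10, 1.5], [2.5, 7]], [[0, 2], [2, 0]]))
```

Rescaling the polymorphic cells to sum to 1 turns each expected SFS into a probability distribution over polymorphic classes, so log-likelihoods from several 2D SFS are on a comparable scale before they are combined.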
Bug corrections and modifications in ver 2.5.2.8 (May 2015)
1. Bug corrections:
Incorrect simulation of mutations in case of high recombination rates. There was a strong negative correlation between the recombination rate and the number of polymorphic loci when adjacent sites were the object of recombination. The number of mutations was underestimated for recombination rates above roughly 1e-7. This bug affected ALL previous fsc releases.
Possible overestimation of the TMRCA and overall tree size in case of recombination. Bug present since early fsc2 releases.
Crash of fsc2 in case of very high recombination rates with DNA data
Incorrect writing of recombination positions in the output arp file when simulating with several threads
maxObsLhood was not correctly computed when estimating parameters in a scenario with a single population
Changes of the migration matrix were not implemented after the first recombination event (thanks to Stefano Mona)
Incorrect computation of the MAF SFS in case of multiple mutations per site (when the -I option is not provided and mutation rates are high) (thanks to Jason Weir)
Speed optimization
Output of random DNA nucleotides instead of N for monomorphic loci with the -S option
Possibility to run fsc without command line options if a file "fsc_run.txt", containing the run path and command line options, is present in the current working directory
Bug corrections in ver 2.5.2.11 (7 June 2015)
1. Bug corrections:
Potential crash of fsc2528 due to the -Ofast compiler option (bad comparison of floating-point variables) (thanks to Jason Weir)
Incorrect estimation of the likelihood in case of a non-integer observed SFS (thanks to Andi Knautt)
Changes in fastsimcoal21 relative to fastsimcoal2 (December 2013)
New features
64-bit Windows version of fastsimcoal2 (20% speed gain compared to the 32-bit version!)
Modified output of monomorphic samples.
By default, fastsimcoal2 only outputs polymorphic sites for DNA data. If the coalescent tree is too shallow, no mutation may occur on a given tree. In that case, fastsimcoal2 now outputs a single locus with "N" for all individuals instead of missing data in arlequin files (*.arp). This change prevents a bug when analysing simulated arlequin files with arlsumstat.
Optional use of a manual seed for the random number generator (--seed xxx command line option)
Outputs a par file with the estimated maximum likelihood parameters. This file can be used to generate pseudo-observed SFS for estimating parametric bootstrap confidence intervals around the ML parameters.
Bug corrections
With the -s0 option, the number of reported polymorphic sites in the file "<file_name>_numPolymSites.obs" was incorrectly set to zero, and the maximum likelihood reported in the file "<file_name>.lhoodObs" was set to INF. These two problems are now corrected.
Multiple spaces or tabs between parameters in historical events caused erratic behavior. Multiple separators are now allowed in historical events.
When estimating a single parameter from the SFS, the number of performed ECM loops could be smaller (usually 2) than the required minimum.
Changes in fastsimcoal2 relative to fastsimcoal
New features
Optional output of all simulated sites, including monomorphic sites (-S command line option)
Optional use of a manual seed for the random number generator (--seed xxx command line option)
Simulation of ascertained SNP data
Generation of the (joint) site frequency spectrum (SFS) from DNA sequence data
Generation of multidimensional (>2D) SFS
Generation of Nexus coalescent trees with branch lengths now expressed as non-integer numbers of generations (e.g. 1205.123)
Ability to estimate demographic parameters from the site frequency spectrum inferred from DNA sequences or ascertained SNP chips
The number of SNPs to output must be specified with the -s option (specify 0 to output all SNPs)
Bug corrections
Potential crash when generating scenarios with historical events and recombination
Crash when simulating samples of size zero with recombination
In the presence of recombination, the pattern of polymorphism obtained in a population for a given past demography changed depending on whether other samples were simulated as well
Non-convergence to the MRCA when simulating serial samples of age zero
Note that the first three bugs listed above were due to the same problem in the code.