fast sequential Markov coalescent simulation of genomic data under complex evolutionary models
Changes in fsc28 relative to fsc27
New features
New syntax in the .tpl files to deal with sample heterogeneity. We introduce the concept of SFS pools, where the SFS of different samples can be computed as a pool. This makes it possible to account for any spatial or temporal heterogeneity. New keyword “sfspool” in the deme size section.
Possibility to record the deme of origin of chromosome segments when implementing an admixture event, so that it is possible to simulate chromosome painting. New keyword “recordAdmOrigin” in historical events.
New command line options (-y and -z) to fine-tune the parameter estimation procedure.
Other changes and bug
corrections:
When simulating several data
files with definition files (.def)
the SFSs are written in different files, either in separate directories
with
the -j option, or in the same directory without the -j option
Program was crashing when
simulating exponential growth and
migration. Bug found by Jason Weir.
Optimisation of computations when estimating
data from
multidimensional SFS
Bad computation of lhood when estimated from
the maxL.par
files as compared to that computed during parameter
estimations, in case of
population growth. Bug found by Kyle Lewald
Incorrect simulations from par files when some demes are explicitly killed. Bug found by Kyle Lewald.
Changes in fsc27 relative to fsc26
New features
New syntax
in the .est files.
It is now possible to
include previously defined simple parameters as search range
delimiters. The
keyword paramInRange needs to be
specified at the end of
lines
containing such parameters.
New keyword
in .par or .tpl
file: absoluteResize.
It allows a given sink population to take a new absolute size,
independently of
its previous size. It eliminates the need to compute this resize as a
complex
parameter in the .est file
The [RULES]
section has been
suppressed from input
files. It is simply not read anymore. These rules have become obsolete
given
the new syntax described in point.
SNP data
types are not considered anymore, as they led to
biased simulations. Use short segments of DNA and the -sX
option to
generate X SNPs instead
Simulations of large and sparsely occupied structured populations have been optimized and can be up to 10 times faster than in the previous version. There is very little gain for simulations with a small number of migration-connected demes, though.
Simulations of large recombining chromosomes have been optimized when using large values of the -k option.
Generation
of genotype table (.gen
file) as an
alternative output to Arlequin (-G
option). The
additional -g
option allows one to generate diploid genotypes (coded as 0, 1 or 2)
instead of
haploid genotypes (coded as 0 or 1)
Possibility to “kill” demes, so as to make them inaccessible to migration. Setting a sink deme size to zero (using a sink resize of zero in a historical event) will now prevent further migration to this deme. This is useful as one can keep the same migration matrix after the disappearance of some demes (e.g. due to population fusion backward in time).
Comments are now possible at
the end of any line
of .est files.
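To illustrate the two genotype codings mentioned for the -G and -g options above, here is a minimal Python sketch that collapses haploid genotypes (coded 0 or 1) into diploid genotypes (coded 0, 1 or 2) by counting derived alleles per individual. The function name and the pairing of consecutive chromosomes into individuals are assumptions made for this example; they are not taken from the fsc27 code or from the .gen file specification.

```python
def haploid_to_diploid(haploid_row):
    """Collapse one site's haploid genotypes (0/1) into diploid genotypes (0/1/2).

    Assumption for illustration only: consecutive chromosomes
    (0-1, 2-3, ...) are paired to form one diploid individual.
    """
    if len(haploid_row) % 2 != 0:
        raise ValueError("need an even number of haploid genotypes")
    if any(g not in (0, 1) for g in haploid_row):
        raise ValueError("haploid genotypes must be coded as 0 or 1")
    # The diploid code is the number of derived alleles carried by the pair.
    return [haploid_row[i] + haploid_row[i + 1]
            for i in range(0, len(haploid_row), 2)]


site = [0, 1, 1, 1, 0, 0]        # six chromosomes at one site
print(haploid_to_diploid(site))  # prints [1, 2, 0]
```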
Other changes and bug
corrections:
When
a deme size goes to zero (e.g. due to negative growth),
a warning is only produced if the deme is occupied (thanks to David
Marques for
requesting this change).
Bug corrected
when computing likelihood with ghost
populations and a single sampled deme.
Corrected bug (found by David
Marques) with
options --noSingleton
and --foldedSFS in the presence of ghost populations
(the max
est lhood
was larger than the max obs lhood).
Corrected bug occurring when computing the position of the next recombination event in case of very small recombination rates (thanks to Silvert Martin).
Corrected
important bug (thanks to David Marques) in case of
the introduction of population growth at a given point in a population
of
initial constant size. The population size was adjusted as if there had
been
growth since generation zero.
Corrected
bug (thanks to Yu Sugihara) when generating
diversity based on random parameters and using -Ex
option when x
>1.
Corrected bug (thanks to Jason Weir) when simulating scenarios with both migration and exponential growth. It led to program crashes and incorrect migration patterns.
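The growth correction listed above (growth introduced at a given point in a population of initially constant size) can be pictured with a small Python sketch. It assumes that time is counted in generations backward from the present and that a deme changes size exponentially at rate r per generation once growth starts; the function names and this convention are assumptions for illustration, not the fsc27 implementation. The second function only mimics the kind of miscalculation described, where growth is applied as if it had started at generation zero.

```python
import math


def deme_size(t, n0, growth_start, r):
    """Deme size t generations back in time.

    The size stays at n0 until growth_start, then changes exponentially
    at rate r, with elapsed time measured from growth_start.
    """
    if t < growth_start:
        return n0
    return n0 * math.exp(r * (t - growth_start))


def deme_size_from_zero(t, n0, r):
    """Size obtained if growth were (wrongly) counted from generation zero."""
    return n0 * math.exp(r * t)


# At the moment growth is introduced, the size should still be n0.
print(deme_size(100, 1000, 100, 0.01))
# Counting growth from generation zero gives a different size at that time.
print(deme_size_from_zero(100, 1000, 0.01))
```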
Changes in fsc26 relative to fsc25
ver 2.21 (November 2015)
New features
Simple implementation of individual inbreeding
The
average inbreeding coefficient of individuals in a population can now
be specified as a third optional parameter in the sample size
definition. In this case, the sample age needs to be defined (set to
zero in most applications), as:
<sample size>
<sample age>
<inbreeding
coefficient>
Possibility to define initial parameter values for
demographic inference
Option -initvalues file.pv, where file.pv lists initial non-complex parameter values to use. This option is mainly useful when computing bootstrap confidence intervals, as it allows one to use fewer replicates for each bootstrap data set. A *.pv file is now automatically generated after each parameter estimation by fsc26.
Computation of MAF 1D and 2D SFS with option --foldedSFS
by simply folding the corresponding unfolded SFS (for compatibility
with angsd, where the minor allele is computed separately for each SFS)
Optional faster but approximate log computations with
option --logprecision
n,
where n is a number between 10 and 23 specifying the precision of the
computation of logarithms. 23 means full precision and is the default
value.
Optional parameter optimization without taking singletons
into account specified with option --nosingleton
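The idea of obtaining a MAF SFS "by simply folding the corresponding unfolded SFS", mentioned for the --foldedSFS option above, can be sketched for the 1D case as follows. This is a generic illustration in Python, assuming the unfolded SFS is a list whose entry i is the number of sites with i derived alleles in a sample of n chromosomes; the function name and the compact output layout (n//2 + 1 entries) are assumptions and do not describe the file format written by fsc26.

```python
def fold_sfs_1d(unfolded):
    """Fold a 1D unfolded SFS into a minor allele frequency (MAF) SFS.

    unfolded[i] = number of sites with i derived alleles, i = 0..n.
    Entry i of the result counts sites whose minor allele is seen i times.
    """
    n = len(unfolded) - 1
    folded = []
    for i in range(n // 2 + 1):
        j = n - i
        # When n is even, the middle class (i == j) must not be counted twice.
        folded.append(unfolded[i] if i == j else unfolded[i] + unfolded[j])
    return folded


unfolded = [100, 12, 7, 4, 2]   # n = 4 chromosomes
print(fold_sfs_1d(unfolded))    # prints [102, 16, 7]
```

Folding preserves the total number of sites, which is a quick sanity check when comparing a folded SFS with its unfolded source.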
Syntax changes
For parameter optimization, the -N option has been suppressed, and the maximum no. of iterations is now equal to that set by the -n option.
The number of cycles to be performed is now fixed and only specified with option -L.
The -l
option is now optional and means something different. It is now used to
specify the number of cycles where information on monomorphic sites is
used. After these initial cycles, likelihood will only be computed (and
optimized) on the polymorphic sites. This option needs to be used
together with the “reference”
keyword in the .est file (see section on est file).
The -M option is now just a flag indicating that we want to perform parameter estimation from the observed SFS. It should therefore not be followed by any number.
Removed -D
option to produce output in dadi format.
Implementation of instantaneous bottlenecks with keyword instbot added to historical event definition. Only works in absence of recombination for the moment.
Bug corrections
Expected marginal SFS were not computed when
computing
expected SFS with FREQ data
Wrong likelihoods were computed with option -0
Changes in fastsimcoal25 (fsc25) relative to fastsimcoal21 (August 2014)
In addition to overall polishing and bug corrections, the main innovation of ver 2.5 of fastsimcoal2 is the introduction of multithreading (with the -c option). This option aims at making parameter optimization feasible on desktop machines, as most modern machines have multiple cores. Note that performance does not increase strictly linearly with the number of threads (cores), so it is not recommended to use more than one thread on a Linux cluster.
New features
The
fastsimcoal2 program ver2.5 has been renamed fsc25
(shorter name is better)
Use of a different
random number
generator (same seed will produce different results than in
fastsimcoal21)
Code optimization resulting in a 1-75% speed gain for the single-threaded version (see benchmark)
Multithreading (64
bit only),
for more speed gain on a multicore processor
desktop
machine (see benchmark)
Result files for
parameter estimation
now output in separate result directory
More options to
generate SNP data
New specification
for MAF SFS
Added a version for macOSX that runs on earlier OS versions (e.g. from 10.6 upwards) (thanks to Iain Mathieson)
More
tolerant
reading of input files (thanks to Allan Strand)
Rules
in est files can now be used for
parameter estimations
Bug corrections in fsc25 relative to fsc21
Corrected bug where maximum number of simulations was set
to
lower number of simulations during Brent optimization
Program crashed when trying to compute SFS when too many
polymorphic sites need to be kept in memory (this number can be changed
with
-k option)
Solved problems when multiple parameter definitions are
listed in
def files
Corrected
bug preventing recombination to be simulated when several block
structures were defined in par file (thanks to Thomas Willems)
Incorrect computation of likelihood when using fractional numbers in observed SFS (thanks to Andreas Kautt)
Bug corrections in ver 2.5.0.2 (August 2014)
Growth rates were inactivated in ver 2.5.0, and all simulations were performed with a constant size population (thanks to Melissa Wilson Sayres for pointing this out)
Bug corrections and modifications in ver 2.5.1 (October 2014)
Example files are back in zip files (thanks to Alfredo)
Description of the exact multiSFS format has been modified in the manual (thanks to Vitor Sousa and Raphael Leblois)
Problem in
implementing
recombination with multiple runs (option -nx where x>1) (thanks
to
Vitor Sousa and Yang)
More precision on branch lengths when outputting trees in NEXUS format (thanks to Shuo Yang)
New faster
way to implement
recombination under the SMC' algorithm and its extension to multiple
recombinations between sites
Bug corrections and modifications in ver 2.5.2 (March 2015)
1. Bug corrections:
fsc251 asked for a joint SFS when two population samples were listed in the tpl file but only one contained active lineages. Bug found by Charleston Chiang.
TMRCA was not found in case of recombination and demes with some inactive lineages. Bug found by Ryan Bohlender.
fsc251 was not generating output files when a path was provided before input file names (par or tpl). Note that fsc25 should always be run from the directory containing the input files, even though the program can be physically located elsewhere. Bug found by Greer Dolby.
fsc251
was not taking
into account growth rate changes specified in historical events (bug
introduced in 2.5.1, and it was not present in ver 2.5.0).
The -k option has no upper limit anymore, and its default value is 100,000
Added new -P command line option, which outputs the global pooled SFS obtained by pooling all lineages as if in a single population
Added
two new operators
in est file for complex parameters: %min% and %max%
Added
new functions in
est files for complex parameters: abs(), exp(), log(), log10(), pow10()
Added
a new "bounded"
keyword in est file to specify that the upper range of a simple
parameter is bounded. Needs to be listed after the "output" or "hide"
keywords.
Added
two new keywords
for historical events: "keep" and "nomig".
Expected joint SFS is now rescaled such that the sum of SFS entries for polymorphic sites is 1. This should lead to more exact lhood computation from multiple 2D SFS.
Bug corrections and modifications in ver 2.5.2.8 (May 2015)
1. Bug corrections:
Incorrect simulation of mutations in case of high recombination rates. There was a strong negative correlation between the recombination rate and the number of polymorphic loci when adjacent sites were the object of recombination. The number of mutations was underestimated for recombination rates above, say, 1e-7. This bug affected ALL previous fsc releases.
Possible
overestimation of
TMRCA and overall tree size
in case of recombination. Bug present since early fsc2 release.
Crash of fsc2 in case of very high recombination rate with DNA data
Incorrect writing of recombination positions in output arp file when simulating with several threads
maxObsLhood was not correctly computed when estimating parameters in a scenario with a single population
Change
of migration matrix not implemented after first recombination (thanks
to Stefano Mona)
Computation of MAF SFS incorrect in case of multiple mutations per site (when -I option not provided and high mutation rates) (thanks to Jason Weir).
Speed
optimization
Output of random DNA nucleotides instead of N for monomorphic loci with the -S option.
Possibility
to run fsc without command line option if file "fsc_run.txt" is present
and contains run path and command line options in current working
directory
Bug corrections and modifications in ver 2.5.2.21 (November 2015)
1. Bug corrections:
Non
implementation of exponential growth at time zero
for the first simulated tree. Initial population size therefore does
not change
for that tree. Note that specifications of exponential growth rates in
historical
events are correctly implemented even in the first tree. Exponential
growth is
then correctly implemented in the next simulated trees (thanks to Anand
Bhaskar)
Crash in case of very large sample sizes (e.g. 60,000) (thanks to Anand Bhaskar)
Incorrect computation of the max lhood when non-integers are used in the observed SFS (thanks to Andi Knautt)
Reported expected
SFS was that of the
last iteration and not that associated to the max lhood parameter
estimates
In case of
crash due to bad tpl
file, parameters reported in file called <generic
name>_bad.par were
not those leading to the crash
Speed
optimization. Up to 30% speed gain.
Output of time to MRCA in file <generic name>_mrca.txt with new compiler directive --recordMRCA. Beware that this option really slows down computations. Note that we also output the deme in which the MRCA occurred.
Changes in fastsimcoal21 relative to fastsimcoal2 (December 2013)
New features
64 bit windows version of fastsimcoal2 (20% speed gain
compared
to 32 bit version!)
Modified output of monomorphic samples. By default, fastsimcoal2 only outputs polymorphic sites for DNA data. If the coalescent tree is too shallow, it is possible that no mutation occurs on a given tree. In that case, fastsimcoal2 now outputs a single locus with "N" for all individuals instead of missing data in Arlequin files (*.arp). This change prevents a bug when analysing simulated Arlequin files with arlsumstat.
Optional
use of a manual seed
for the random number generator (--seed xxx command line option)
Outputs
par file with estimated maximum likelihood parameters. This file can be
used to generate pseudo-observed SFS to estimate parametric bootstrap
confidence intervals around the ML parameters.
Bug corrections
With the -s0 option, the number of reported polymorphic sites in file "<file_name>_numPolymSites.obs" was incorrectly set to zero and the maximum likelihood reported in the file "<file_name>.lhoodObs" was set to INF. These two problems are now corrected.
Multiple whitespace or multiple tabs between parameters in
historical events caused erratic behavior. Multiple separators are now
allowed in historical events.
When
estimating a single parameter from the SFS, the number of performed ECM
loops could be smaller (usually 2) than the required minimum.
Changes in fastsimcoal2 relative to fastsimcoal
New features
Optional output of all
simulated sites (including monomorphic sites) (-S command line option)
Optional
use of a manual seed
for the random number generator (--seed xxx command line option)
Simulation
of ascertained SNP
data
Generation
of the (joint) site
frequency spectrum (SFS) from DNA sequence data
Generation
of multidimensional
(>2D) SFS
Generation of Nexus coalescent trees with branch lengths
now
expressed in fractions of generations (e.g. 1205.123)
Ability to
estimate
demographic
parameters from the site frequency spectrum inferred from DNA sequences
or
ascertained SNP chips
Need to specify number of
SNPs to output with
-s option (specify 0 to output all SNPs)
Bug corrections
Potential crash when generating scenarios with historical
events
and recombination
Crash when simulation of samples of size zero and
recombination
The pattern of polymorphisms obtained in a population for a given past demography changed depending on whether other samples were simulated as well, in presence of recombination
Non-convergence to the MRCA when simulating serial samples
of age
zero
Note that bugs 1-3 were due to the same problem in the code.