AQUASPLATCHE
A program to simulate genetic diversity based on a realistic vectorized environment
AQUASPLATCHE
Demographic and spatial expansion module |
|
Classical models of structured populations do not apply well to populations of freshwater fishes, since they evolve in complex networks of river systems that are intermediate between one-dimensional and two-dimensional stepping-stone models. In order to allow the simulation of the genetic diversity of populations drawn from such river systems, we have developed a new simulation program called AQUASPLATCHE. It starts by dividing a realistic vectorized network of river streams into segments of arbitrary length. The program then proceeds by simulating the colonization of the streams from an arbitrary source, recording the evolution of the segment densities and the migration events between adjacent segments over time. This demographic history is then used to generate genetic data of population samples located in various segments of the river system, using a backward coalescent framework.
Principles
The demographic and spatial expansion module allows to simulate a demographic
and spatial expansion from one or more initial populations. The simulation uses
discrete time and space. The unit of time is the generation, while the unit of
the space is a segment, also called a deme. Each segment has a certain length
and can be considered as a homogeneous subpopulation. Each segment undergoes an
independent population growth and can exchange emigrants with its direct
neighboring demes. Each segment is also considered as a sub-unit of the
environment. Variations through time of the range extension are also possible,
which is defined as a dynamic environment.
.
K is the carrying capacity for a segment, N is the population size of the segment and r is the growth rate.
Figure 1. Migration rate mD depending on the local density. mCol is the given migration rate at low density and mOcc is the given migration rate at high densities. In this figure mCol is bigger than mOcc implying that this species migrates faster during the colonization process compared to the equilibrium phase.
The corresponding equation is
,
where mCol is the migration rate during the colonization phase (un-colonization habitats), mOcc is the migration rate during the equilibrium phase (colonized habitats), D is the current local density defined as N/K (population size divided by the carrying capacity), A is an absolute term set to 1000, and L is L=2*ln(A). The bigger A is, the smoother the migration curve is. L is calculated in order that the mean value between the two migration rates corresponds to a density of 50%. If mCol is bigger than mOcc the migration rate is higher during colonization and vise versa if mCol is smaller than mOcc. If the two migration rates are equal the migration rate is constant for all densities. The number of emigrants M is then distributed among the neighboring segments taking into account the neighboring segment densities Di. The probability of sending emigrants is calculated as
,
whereas neighboring segments at low population densities are favored to obtain immigrants compared to neighboring segments at high population densities. The parameter f represents the directional migration and depends on the physical position the neighboring segment (neig) has in relation to the local segment (loc):
,
where “loc < neig” means that the local segment is physically below the neighboring segment (downstream) and consequently the water flows from the neighboring to the current segment. F is the probability of upstream migration compared to downstream migration (upstream migration/downstream migration), which has to be specified. If F>1 then upstream migration is more probable than downstream migration and the opposite is true for F<1. If F=1 then the species has no preferences for directional migration.
The effective numbers of emigrants send to neighboring segment i is
.
Principles
The genetic simulation procedure is implemented according to the program
SPLATCHE (Currat et al., 2004), with some modifications when generating
microsatellite data. Genetic simulations are always preceded by a demographic
simulation as they use the demographic information stored in the data base
generated during the demographic phase. The genetic phase is based on the
“coalescent theory”, initially described by Kingman (1982a; 1982b) and developed
in following papers (Ewens, 1990; Hudson, 1990; Donnelly & Tavaré, 1995). This
theory allows the reconstruction of the genealogy of a series of sampled genes
until their most recent common ancestor (MRCA). For neutral genes, the genealogy
essentially depends on the demographic factors that have influenced the history
of the populations from whom the genes are drawn. The implementation of the
coalescent theory is a modified version of SIMCOAL (Excoffier et al., 2000). The
principal difference with SIMCOAL is that the demographic information used by
genetic simulations does not come from the “migration matrix” and "historical
events" anymore, but from the data base generated during the demographic
simulation.
The genetic simulation itself follows the procedure described in Excoffier et al. (2000) and consists in two phases:
1°)
Reconstruction of the genealogy:
The reconstruction of the genealogy is independent on the mutational
process. Basically, a number n of genes is chosen. These genes are only
identified by their number and they have no genetic variability during this
first phase. All the n genes are associated with a geographic position in
the virtual river system where the demography is simulated. These genes could
belong to different segments in the river system. Then, going backward in time,
the genealogy of these genes is reconstructed until their most recent common
ancestor (MRCA) in the following way:
Going backward in time, at each generation, two events can occur:
Coalescent event: If at least two genes are in the same segment, they can potentially have a common ancestor at the preceding generation (a so-called coalescent event). This probability depends on the population size of the segment where the genes are located. Each pair of genes has a probability 1/Ni of coalescence (if Ni is the number of haploid individual in the segment i). If there are ni genes on the segment then the probability of one coalescent event becomes ni (ni -1)/2Ni. Only one coalescent event is allowed per segment and per generation (see Ray et al. (2003) for a discussion about this assumption).
Migration: Each gene could have arrived with an immigrant from a different segment. When going backward in time, it means that the gene could leave the current segment with the immigrant. So, the probability of migration from a segment i to a segment j for a gene depends on the number of individuals that have arrived from segment j to segment i at this generation. For each gene belonging to the segment i, the probability of migration from segment j is equal to mji/Ni where mji is the number of immigrants from segment j to segment i during the demographic phase. All the segment densities and the numbers of immigrant between segments are taken from the database generated during the demographic simulation.
2°)
Generation of the genetic diversity:
The
second phase of a genetic simulation consists in generating the genetic
diversity of the samples. This operation is done by adding independent mutations
over the branches of the genealogy assuming a uniform and constant Poisson
process. At the end of this process all sampled genes have a specific genetic
identity. The genetic process is entirely stochastic.
The coalescent backward approach does not generate the history of the whole
population, but only that of sampled genes and their ancestors. Thus this
approach is much less demanding in terms of memory and computation time. It
allows the simulation of complex demographic scenarios within a very broad
geographical and temporal framework.
Genetic data
Different types of molecular data can be generated (Microsatellites, RFLP, DNA,
Standard, and SNP), each with its own specificities:
Microsatellite data
A generalized stepwise mutation model (GSM, Zhivotovsky et al., 1997; Estoup
et al., 2002) is implemented, with or without constraint on the total size of
the microsatellite. Several fully linked microsatellite loci can be simulated
under the same mutation model constraints. The output for each loci is listed as
a number of repeat, having started arbitrarily at 5,000 repeats. The number of
repeats for each gene should thus be centered around that value of 5,000.
RFLP data
Only a pure 2-allele model is implemented. Several fully linked RFLP
loci can be simulated, assuming a homogeneous mutational process over all loci.
A finite-sites model is used, and mutations can hit the same site several times,
switching the RFLP site on and off. We thus assume that there is the same
probability for a site loss or for a site gain.
DNA sequence data
Several simple finite-sites mutational models are implemented. The user
can specify the percentage of substitutions that are transitions (the transition
bias), the amount of heterogeneity in mutation rates along a DNA sequence
according to either a discrete or continuous Gamma distribution. We can
therefore simulate DNA sequences under a Jukes and Cantor model (Jukes & Cantor,
1969) or under a Kimura-2-parameter model (Kimura, 1980), with or without Gamma
correction for heterogeneity of mutation rates (Jin & Nei, 1990). Other mutation
models that depend on the nucleotide composition of the sequence were not
considered here, because of their complexity and because they require specifying
many additional parameters, like the mutation transition matrix and the
equilibrium nucleotide composition.
Standard data
Following the definition given in ARLEQUIN User Manual (Schneider et
al., 2000) this type defines data for which the molecular basis is not
particularly defined, such as mere allele frequencies. The comparison between
alleles is done at each locus. For each locus, the alleles could be either
similar or different.
SNP data
SNP data consist in loci with two different states: ancestral (0) and
mutant (1). There is no information about the molecular difference between the 2
states. It is possible in AQUASPLATCHE
to specify a minimum frequency for the minor allele (the less frequent of the 2
states) over all samples or at least within one sample.
References
-Currat, M., N. Ray, et al. (2004).
Splatche: a program to simulate genetic diversity taking into account
environmental heterogeneity, Molecular Ecology
Notes, Volume 4, Issue 1, Page 139-142
-Donnelly, P. and S. Tavaré (1995). "Coalescents and
genealogical structure under neutrality." Annu. Rev. Genet. 29: 401-421.
-Estoup A., P. Jarne , J. M. Cornuet (2002). "Homoplasy and mutation model at
microsatellite loci and their consequences for population genetics analysis".
Molecular Ecology 11, 1591-1604.
-Ewens, W. J. (1990). Population Genetics Theory - The
Past and the Future. Mathematical and Statistical developments of Evolutionary
Theory. S. Lessar. Dordrecht, Kluwer Academic Publishers: 177-227.
-Excoffier, L., J. Novembre, et al. (2000). "SIMCOAL: A
general coalescent program for the simulation of molecular data in
interconnected populations with arbitrary demography." J. Heredity 91: 506-510.
-Hudson, R. (1990). Gene genealogies and the coalescent
process. oxford, Oxford University Press.
-Jin, L. and M. Nei (1990). "Limitations of the
evolutionary parsimony method of phylogenetic analysis." Mol. Biol. Evol. 7:
82-102.
-Jukes, T. and C. Cantor (1969). Evolution of protein
molecules. Mamalian Protein Metabolism. H. N. Munro. New York, Academic press:
21-132.
-Kimura, M. (1980). "A simple method for estimating
evolutionary rate of base substitution through comparative studies of nucleotide
sequences." J. Mol. Evol. 16: 111-120.
-Kingman, J. F. C. (1982). "The coalescent." Stoch. Proc.
Appl. 13: 235-248.
-Kingman, J. F. C. (1982). "On the genealogy of large
populations." J. Appl. Proba. 19A: 27-43.
-Ray, N., M. Currat, et al. (2003). "Intra-deme molecular
diversity in spatially expanding populations." Molecular Biology and Evolution
20(1): 76-86.
-Schneider, S., D. Roessli, et al. (2000). Arlequin: a
software for population genetics data analysis. User manual ver 2.000. Geneva,
Genetics and Biometry Lab, Dept. of Anthropology, University of Geneva.
-Zhivotovsky, L. A., M. W. Feldman , S. A. Grishechkin (1997). "Biased mutations
and microsatellite variation." Molecular Biology and Evolution 14,
926-933
There are several versions of AQUASPLATCHE available. The described installation procedure is for the graphical version for Windows.
Download AQUASPLATCHE to any temporary directory.
Extract all files contained in the compressed file to a directory of your choice.
Start AQUASPLATCHE by double-clicking on the file AquaSplatche.exe, which is the main executable file.
Reading the manual helps to get familiar with AQUASPLATCHE
Download AQUASPLATCHE for Windows (graphical version) | |
Download AQUASPLATCHE for Windows (consol version) | |
Download AQUASPLATCHE for Linux (consol version) | |
Download user manual (pdf) |
This software was
developed by
Samuel Neuenschwander and the reference to cite is:
Neuenschwander S (2006) AQUASPLATCHE: a program to
simulate genetic diversity in populations living in linear habitats.
Molecular Ecology Notes 6 583-585.
File & Image settings |
Network transformation |
Demographic simulation settings |
Genetic simulation settings |
Time Series |
I am grateful to Laurent Excoffier, Mathias Currat, and Nicolas Ray for sharing ideas and pieces of code with me. This work was supported by a Swiss NSF grant no. 3100A0-100800 to Laurent Excoffier.
Contact: Samuel Neuenschwander, Computational and Molecular Population Genetics Lab, University of Bern, Switzerland
Last edited on 16.08.06 (13:08)