SAMOVA 1.0
A program to define the genetic structure of populations
by a simulated annealing approach
The program SAMOVA 1.0 implements an approach to define
groups of populations that are geographically homogeneous and maximally
differentiated from each other. As a by-product, it also leads to the
identification of genetic barriers between these groups. The method is based on
a simulated annealing procedure that aims at maximizing the proportion of total
genetic variance due to differences between groups of populations (SAMOVA,
Spatial Analysis of MOlecular VAriance). The method is described in Dupanloup,
Schneider and Excoffier (2002).
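To make the idea of simulated annealing on group assignments more concrete, here is a minimal, hypothetical Python sketch. It is not SAMOVA's implementation. It assigns populations to K groups so as to maximize a simplified "proportion of variance between groups": one made-up number per population replaces the genetic data, an ordinary between-group sum-of-squares ratio replaces the AMOVA variance components, and the geographic-homogeneity constraint is ignored. The names `between_group_ratio`, `anneal` and `best_of_runs`, the cooling schedule and all parameter values are assumptions.

```python
import math
import random


def between_group_ratio(values, groups, k):
    """Share of the total sum of squares due to differences between group means.

    values: one number per population (a toy stand-in for genetic data).
    groups: groups[i] is the group (0..k-1) assigned to population i.
    """
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    ss_between = 0.0
    for g in range(k):
        members = [v for v, gi in zip(values, groups) if gi == g]
        if members:
            mean = sum(members) / len(members)
            ss_between += len(members) * (mean - grand) ** 2
    return ss_between / ss_total


def anneal(values, k, rng, steps=5000, t0=1.0, cooling=0.999):
    """One simulated annealing run; returns (best_score, best_groups)."""
    n = len(values)
    assert 2 <= k <= n, "need at least 2 groups and at most one group per population"
    # Random starting partition in which every group is non-empty.
    groups = list(range(k)) + [rng.randrange(k) for _ in range(n - k)]
    rng.shuffle(groups)
    score = between_group_ratio(values, groups, k)
    best_score, best_groups = score, groups[:]
    t = t0
    for _ in range(steps):
        t *= cooling  # geometric cooling schedule (an assumption)
        i = rng.randrange(n)
        old = groups[i]
        if groups.count(old) == 1:
            continue  # moving this population would leave its group empty
        new = rng.randrange(k - 1)
        if new >= old:
            new += 1  # uniform choice among the other k - 1 groups
        groups[i] = new
        candidate = between_group_ratio(values, groups, k)
        # Accept improvements; accept a worse move with probability exp(delta / t).
        if candidate >= score or rng.random() < math.exp((candidate - score) / t):
            score = candidate
            if score > best_score:
                best_score, best_groups = score, groups[:]
        else:
            groups[i] = old  # reject the move
    return best_score, best_groups


def best_of_runs(values, k, n_runs=100, seed=0):
    """Repeat independent annealing runs and keep the highest-scoring partition."""
    rng = random.Random(seed)
    return max((anneal(values, k, rng) for _ in range(n_runs)), key=lambda r: r[0])
```

With `values = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]` and `k = 2`, the best run places the first three and the last three populations in separate groups. Repeating the annealing several times (`n_runs`) and keeping the best result guards against a single run getting stuck in a poor partition.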
SAMOVA 1.0 runs on PC. An Apple version is not yet
available.
SAMOVA 1.0 takes two input files. The first (*.geo)
must contain the geographic coordinates of the sampling localities of your
populations. The second (*.arp) is an Arlequin input file (called an
Arlequin project file) containing the genetic data of your populations. The
Arlequin file must have the same name as the geographic file, with the
extension *.arp. The populations MUST BE LISTED IN THE SAME ORDER in the two
input files!
The file containing the geographic coordinates of the
sampling localities of your populations must have the *.geo extension.
Important notice: SAMOVA 1.0 does not work if two sampling localities have
identical geographic coordinates.
The geographic input file must be structured as
follows. Each line corresponds to one population and must contain
five fields separated by a tab character:
- an integer corresponding to the line number in the file
- the name of your population within quotes
- the longitude of your sampling point
- the latitude of your sampling point
- an integer (for example, 1).
Example of a geographic file, inputdata.geo:
1 "Egyptiens" 31.23 31.03 1
2 "Tunisiens" 10.13 36.5 1
3 "Albanais" 15.01 41.55 1
4 "Lithuaniens" 23.2 55.51 1
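Before running SAMOVA, the rules above can be checked with a small script. The following Python sketch parses the contents of a .geo file and reports lines that do not have five tab-separated fields, names that are not quoted, population numbers that are not consecutive from 1, and localities sharing identical coordinates. It assumes the fields are separated by real tab characters, as the format requires; the names `parse_geo` and `GeoRecord`, the consecutive-numbering check and the skipping of blank lines are illustrative assumptions, not part of SAMOVA.

```python
from dataclasses import dataclass


@dataclass
class GeoRecord:
    index: int
    name: str
    longitude: float
    latitude: float
    last_field: int


def parse_geo(text):
    """Parse the contents of a .geo file and check the rules described above."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped here (an assumption, not a documented rule)
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 tab-separated fields, got {len(fields)}")
        index, name, lon, lat, last = (f.strip() for f in fields)
        if len(name) < 2 or not (name.startswith('"') and name.endswith('"')):
            raise ValueError(f"line {lineno}: population name must be enclosed in double quotes")
        record = GeoRecord(int(index), name[1:-1], float(lon), float(lat), int(last))
        if record.index != len(records) + 1:
            raise ValueError(f"line {lineno}: expected population number {len(records) + 1}")
        records.append(record)
    # SAMOVA 1.0 does not work if two localities share the same coordinates.
    seen = {}
    for r in records:
        key = (r.longitude, r.latitude)
        if key in seen:
            raise ValueError(f"{seen[key]!r} and {r.name!r} have identical coordinates {key}")
        seen[key] = r.name
    return records
```

For example, `parse_geo(open("inputdata.geo").read())` returns one record per population, or raises `ValueError` pointing at the first problem found.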
In the Arlequin project file, the order of the
populations (that is, the order in which the genetic data of your samples are
defined) MUST BE THE SAME as in the file containing the geographic coordinates
of your sampling points. For more information on Arlequin project files, you
can download the Arlequin program (Schneider et al., 2000) and the Arlequin
help file from the Arlequin web site.
Example of an Arlequin project file, inputdata.arp, for the
same populations listed in the geographic input file:
#AMOVA analysis
[Profile]
Title="A New Sample File Designed To Compute AMOVA"
NbSamples=4
GenotypicData=0
DataType=DNA
LocusSeparator=WHITESPACE
MissingData='?'
[Data]
[[Samples]]
SampleName="Egyptiens"
SampleSize=2
SampleData= {
Egy1 1 AAAAAAAAAAAAAATTAAAA
Egy2 1 AAAAAACCAAAAAATTAAAA
}
SampleName="Tunisiens"
SampleSize=2
SampleData= {
Tun1 1 TTTTTTTAAAAAAATTAAAA
Tun2 1 AAAAAACCAAAAAATTAAAA
}
SampleName="Albanais"
SampleSize=2
SampleData= {
Alb1 1 AAAAAGGGAAAAAATTAAAA
Alb2 1 AAAAAACCAAAAGATTAAAA
}
SampleName="Lithuaniens"
SampleSize=2
SampleData= {
Lit1 1 AAAAAAAAAAGGGATTAAAA
Lit2 1 AAAAAACCAAAAAATTAAAA
}
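Because a mismatch in population order between the two files silently pairs genetic data with the wrong coordinates, it can be worth checking the order before a run. The Python sketch below reads the population names from the second field of each .geo line and from the `SampleName="..."` entries of the Arlequin project file, then compares them position by position. The function names are hypothetical, and the regular expression only covers `SampleName` lines written as in the example above.

```python
import re


def geo_names(geo_text):
    """Population names, in file order, from the second tab-separated field of a .geo file."""
    names = []
    for line in geo_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"not a tab-separated .geo line: {line!r}")
        names.append(fields[1].strip().strip('"'))
    return names


def arp_names(arp_text):
    """SampleName values, in file order, from an Arlequin project file."""
    return re.findall(r'^\s*SampleName\s*=\s*"([^"]*)"', arp_text, flags=re.MULTILINE)


def check_same_order(geo_text, arp_text):
    """Raise ValueError unless both files list the same populations in the same order."""
    geo, arp = geo_names(geo_text), arp_names(arp_text)
    if len(geo) != len(arp):
        raise ValueError(f"{len(geo)} populations in .geo but {len(arp)} in .arp")
    for pos, (g, a) in enumerate(zip(geo, arp), start=1):
        if g != a:
            raise ValueError(f"population {pos}: {g!r} in .geo but {a!r} in .arp")
```

For example, `check_same_order(open("inputdata.geo").read(), open("inputdata.arp").read())` returns silently when the two example files above are consistent.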
SAMOVA needs:
- the name of the input files, without extension (for example,
inputdata; in this case, the two input files used by SAMOVA MUST be in the
directory containing the program and MUST be called inputdata.geo and
inputdata.arp).
- the number K of groups of populations you wish to
define (the final structure defined by SAMOVA will contain K groups)
- the number of simulated annealing processes you wish
to perform (100 seems a good choice)
- the type of molecular distance between haplotypes you
want to compute (SAMOVA, like AMOVA, is based on a matrix of distances between
the haplotypes observed in the whole set of samples). With this option, you can
choose between the number of pairwise differences between haplotypes (for DNA
data) and the sum of squared size differences between haplotypes (for
microsatellite data).
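The two distance options can be illustrated with a short Python sketch. For DNA, the distance between two haplotypes is taken as the number of sites at which they differ; for microsatellites, as the sum over loci of the squared differences in allele size. This is a sketch under stated assumptions: sequences are aligned and of equal length, allele sizes are given as integers per locus, and missing data (the '?' symbol) are not handled, whereas Arlequin treats them in its own way. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_differences(h1, h2):
    """Number of sites at which two aligned DNA haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(a != b for a, b in zip(h1, h2))


def squared_size_difference(a1, a2):
    """Sum over loci of squared differences in allele size (e.g. repeat number)."""
    if len(a1) != len(a2):
        raise ValueError("haplotypes must have the same number of loci")
    return sum((x - y) ** 2 for x, y in zip(a1, a2))


def distance_matrix(haplotypes, distance):
    """Symmetric matrix of distances between all pairs of haplotypes."""
    n = len(haplotypes)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = distance(haplotypes[i], haplotypes[j])
    return d
```

With the example file above, the two Egyptiens haplotypes (AAAAAAAAAAAAAATTAAAA and AAAAAACCAAAAAATTAAAA) differ at two sites, so `pairwise_differences` returns 2 for that pair.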
When the SAMOVA window disappears from your
screen, the computations are finished. The run time depends on the number of
populations and on the number of simulated annealing processes you chose to
perform.
SAMOVA creates the following output files:
- SAMOVA_results_arlequin.txt: contains the genetic structure
defined by SAMOVA, the fixation indices corresponding to this group
structure, and their significance levels, evaluated by 1,000 permutations of
populations among groups.
- SAMOVA.log: records all the steps performed by SAMOVA and,
if problems occur, where they occurred.
- SAMOVA_finalstructure.arp: an Arlequin project file created by
appending the genetic structure defined by SAMOVA to the input Arlequin
project file.
- SAMOVA_results.ps: a PostScript (EPS) file that can be read
with GSview for Windows or Adobe Illustrator 7.0; it contains a map of the
sampling points and of the barriers between the groups of populations defined
by SAMOVA.
- Arlequin.log: generated during the computation of the
fixation indices corresponding to the genetic structure defined by SAMOVA. It
contains all the run-time WARNINGS and ERRORS encountered during these
computations.
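The significance levels reported in SAMOVA_results_arlequin.txt come from permuting populations among groups. The principle can be sketched in Python as follows, assuming a toy statistic (the share of the total sum of squares between group means, computed on one number per population) in place of the fixation indices that Arlequin actually computes. Each permutation shuffles the group labels over populations, so group sizes stay fixed. The function names, the fixed seed and the (count + 1) / (n_perm + 1) estimator are illustrative choices, not Arlequin's documented procedure.

```python
import random
from collections import defaultdict


def between_group_ratio(values, groups):
    """Toy statistic: share of the total sum of squares lying between group means."""
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    members = defaultdict(list)
    for v, g in zip(values, groups):
        members[g].append(v)
    ss_between = sum(len(m) * (sum(m) / len(m) - grand) ** 2 for m in members.values())
    return ss_between / ss_total if ss_total else 0.0


def permutation_p_value(values, groups, n_perm=1000, seed=0):
    """Observed statistic and permutation p-value for a given group structure.

    Each permutation shuffles the group labels over populations, so populations
    move among groups while group sizes stay fixed.
    """
    rng = random.Random(seed)
    observed = between_group_ratio(values, groups)
    labels = list(groups)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if between_group_ratio(values, labels) >= observed:
            at_least_as_large += 1
    # (count + 1) / (n_perm + 1) is a common convention; Arlequin's exact formula may differ.
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```

With only six populations split three and three, even a perfect separation gives a p-value near 0.1, because 2 of the 20 possible label arrangements reproduce it; with more populations per group, the same separation becomes highly significant.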
Reference:
- Dupanloup, I., Schneider, S., Excoffier, L. (2002) A
simulated annealing approach to define the genetic structure of populations.
Molecular Ecology 11: 2571-2581.
See also:
- Excoffier, L., Smouse, P., Quattro, J.M. (1992)
Analysis of molecular variance inferred from metric distances among DNA
haplotypes: application to human mitochondrial DNA restriction data. Genetics
131: 479-491.
- Schneider, S., Roessli, D., Excoffier, L. (2000)
Arlequin: A software for population genetics data analysis. Genetics and
Biometry Laboratory, University of Geneva, Switzerland.
Isabelle Dupanloup, Institute of Ecology and Evolution, University of Bern