SAMOVA 1.0

A program that uses a simulated annealing approach to define the genetic structure of populations


Introduction

The program SAMOVA 1.0 implements an approach to define groups of populations that are geographically homogeneous and maximally differentiated from each other. As a by-product, it also leads to the identification of genetic barriers between these groups. The method is based on a simulated annealing procedure that aims at maximizing the proportion of total genetic variance due to differences between groups of populations (SAMOVA, Spatial Analysis of MOlecular VAriance). The method is described in Dupanloup, Schneider and Excoffier (2002).
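The objective can be made concrete with a short sketch. The Python code below is a minimal illustration, not SAMOVA's implementation. It computes the proportion of genetic variance among groups (F_CT) from the usual AMOVA variance-component estimators, given a matrix of squared distances between haplotypes. It then runs a basic simulated annealing search over assignments of populations to K groups. The function names `fct` and `anneal`, the single-population moves, the exponential cooling schedule and its parameters are all assumptions made for illustration. The sketch also does not impose the geographic homogeneity of groups that SAMOVA aims for.

```python
import math
import random


def fct(dist, pop_of, group_of_pop):
    """Proportion of genetic variance among groups (F_CT), AMOVA-style.

    dist[i][j]      -- squared distance between haplotypes i and j
    pop_of[i]       -- population index (0..P-1) of haplotype i
    group_of_pop[p] -- group label of population p
    Assumes at least 2 groups, more populations than groups, no empty population.
    """
    N, P = len(dist), len(group_of_pop)
    pops = [[i for i in range(N) if pop_of[i] == p] for p in range(P)]
    groups = {}
    for p, g in enumerate(group_of_pop):
        groups.setdefault(g, []).append(p)
    G = len(groups)

    def ssd(members):
        # Sum of squared deviations: sum over ordered pairs / (2 * size)
        return sum(dist[i][j] for i in members for j in members) / (2 * len(members))

    ss_total = ssd(list(range(N)))
    ss_pops = sum(ssd(m) for m in pops)
    ss_groups = sum(ssd([i for p in ps for i in pops[p]]) for ps in groups.values())

    # Sample-size coefficients of the AMOVA variance-component estimators
    n_p = [len(m) for m in pops]
    n_g = {g: sum(n_p[p] for p in ps) for g, ps in groups.items()}
    a = sum(sum(n_p[p] ** 2 for p in ps) / n_g[g] for g, ps in groups.items())
    n = (N - a) / (P - G)
    n1 = (a - sum(x ** 2 for x in n_p) / N) / (G - 1)
    n2 = (N - sum(x ** 2 for x in n_g.values()) / N) / (G - 1)

    # Solve the expected mean squares for the three variance components
    var_c = ss_pops / (N - P)                                      # within populations
    var_b = ((ss_groups - ss_pops) / (P - G) - var_c) / n          # among pops within groups
    var_a = ((ss_total - ss_groups) / (G - 1) - var_c - n1 * var_b) / n2  # among groups
    return var_a / (var_a + var_b + var_c)


def anneal(dist, pop_of, n_pops, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a K-group partition with high F_CT (hypothetical schedule, k >= 2)."""
    rng = random.Random(seed)
    groups = [p % k for p in range(n_pops)]  # every group starts non-empty
    rng.shuffle(groups)
    current = fct(dist, pop_of, groups)
    best, best_f = list(groups), current
    t = t0
    for _ in range(steps):
        p = rng.randrange(n_pops)
        old = groups[p]
        if groups.count(old) == 1:
            continue  # moving p would leave its group empty
        new = rng.randrange(k - 1)
        groups[p] = new + 1 if new >= old else new  # always a different group
        f = fct(dist, pop_of, groups)
        # Always accept improvements; accept worse moves with prob. exp(delta / t)
        if f >= current or rng.random() < math.exp((f - current) / t):
            current = f
            if f > best_f:
                best, best_f = list(groups), f
        else:
            groups[p] = old  # reject: undo the move
        t *= cooling
    return best, best_f


# Haplotypes from the example Arlequin project file, two per population
seqs = ["AAAAAAAAAAAAAATTAAAA", "AAAAAACCAAAAAATTAAAA",  # Egyptiens
        "TTTTTTTAAAAAAATTAAAA", "AAAAAACCAAAAAATTAAAA",  # Tunisiens
        "AAAAAGGGAAAAAATTAAAA", "AAAAAACCAAAAGATTAAAA",  # Albanais
        "AAAAAAAAAAGGGATTAAAA", "AAAAAACCAAAAAATTAAAA"]  # Lithuaniens
dist = [[sum(x != y for x, y in zip(s, t)) for t in seqs] for s in seqs]
pop_of = [0, 0, 1, 1, 2, 2, 3, 3]
best, best_f = anneal(dist, pop_of, n_pops=4, k=2)
```

Worse partitions are accepted fairly often while the temperature is high, which lets the search leave local optima; as the temperature falls the search becomes greedy. Running several independent searches with different seeds and keeping the best result is one way to read the repeated simulated annealing processes requested when SAMOVA is run.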

SAMOVA 1.0 runs on PCs; an Apple version is not yet available.


Input files

SAMOVA 1.0 takes two input files. The first (*.geo) must contain the geographic coordinates of the sampling localities of your populations. The second (*.arp) is an Arlequin input file (an Arlequin project file) containing the genetic data for your populations. The Arlequin file must have the same base name as the geographic file, with the extension .arp. The order of the populations in the two input files MUST BE THE SAME!

The file containing the geographic coordinates of the sampling localities of your populations must have the *.geo extension. Important notice: SAMOVA 1.0 does not work if two sampling localities have the same geographical coordinates.
The geographic input file is structured as follows: each line corresponds to one population and must contain five fields separated by tab characters:

  1. an integer giving the line number in the file
  2. the name of the population, enclosed in double quotes
  3. the longitude of your sampling point
  4. the latitude of your sampling point
  5. an integer (for example, 1).
Example of a geographic file, inputdata.geo:
1 "Egyptiens" 31.23 31.03 1
2 "Tunisiens" 10.13 36.5 1
3 "Albanais" 15.01 41.55 1
4 "Lithuaniens" 23.2 55.51 1
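A small reader for the .geo format can catch two of the problems described above before SAMOVA is run: a wrong number of fields and duplicated coordinates. This Python sketch is hypothetical (`read_geo` is not part of SAMOVA). It splits lines with `shlex.split`, which treats tabs and spaces alike as separators and strips the double quotes around names, so it is more lenient than the tab-only layout the format asks for. Blank lines are skipped, and the first field is compared with the count of non-blank lines read so far.

```python
import shlex


def read_geo(text):
    """Parse .geo content into a list of (name, longitude, latitude) tuples."""
    pops = []
    seen = {}  # (longitude, latitude) -> population name
    lines = [line for line in text.splitlines() if line.strip()]
    for expected, line in enumerate(lines, start=1):
        fields = shlex.split(line)  # splits on tabs/spaces and removes the quotes
        if len(fields) != 5:
            raise ValueError(f"line {expected}: expected 5 fields, got {len(fields)}")
        index, name, lon, lat, last = fields
        if int(index) != expected:
            raise ValueError(f"line {expected}: first field is {index}")
        int(last)  # the fifth field must be an integer; raises ValueError otherwise
        coords = (float(lon), float(lat))
        if coords in seen:
            # SAMOVA 1.0 does not work with two localities at the same coordinates
            raise ValueError(f"{name!r} and {seen[coords]!r} share coordinates {coords}")
        seen[coords] = name
        pops.append((name, coords[0], coords[1]))
    return pops


geo = (
    '1\t"Egyptiens"\t31.23\t31.03\t1\n'
    '2\t"Tunisiens"\t10.13\t36.5\t1\n'
    '3\t"Albanais"\t15.01\t41.55\t1\n'
    '4\t"Lithuaniens"\t23.2\t55.51\t1\n'
)
pops = read_geo(geo)
```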

In the Arlequin project file, the order of the populations (that is, the order in which the genetic data of your samples is defined) MUST BE THE SAME as in the file containing the geographic coordinates of your sampling points. For more information on Arlequin project files, you can download the Arlequin program (Schneider et al., 2000) and its help file from the Arlequin web site.

Example of an Arlequin project file, inputdata.arp, for the same populations listed in the geographic input file:
#AMOVA analysis

[Profile]
Title="A New Sample File Designed To Compute AMOVA"
NbSamples=4
GenotypicData=0
DataType=DNA
LocusSeparator=WHITESPACE
MissingData='?'

[Data]
[[Samples]]
SampleName="Egyptiens"
SampleSize=2
SampleData= {
Egy1 1 AAAAAAAAAAAAAATTAAAA
Egy2 1 AAAAAACCAAAAAATTAAAA
}
SampleName="Tunisiens"
SampleSize=2
SampleData= {
Tun1 1 TTTTTTTAAAAAAATTAAAA
Tun2 1 AAAAAACCAAAAAATTAAAA
}
SampleName="Albanais"
SampleSize=2
SampleData= {
Alb1 1 AAAAAGGGAAAAAATTAAAA
Alb2 1 AAAAAACCAAAAGATTAAAA
}
SampleName="Lithuaniens"
SampleSize=2
SampleData= {
Lit1 1 AAAAAAAAAAGGGATTAAAA
Lit2 1 AAAAAACCAAAAAATTAAAA
}
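Because the population order must match between the two files, it can be worth checking it mechanically. The hypothetical helper below extracts the `SampleName` entries from an Arlequin project file with a regular expression, checks their count against `NbSamples`, and compares them, in order, with the population names from the .geo file. It assumes the names are spelled identically in both files, as in the examples above. SAMOVA only requires the same order, so a name mismatch reported here may be harmless.

```python
import re

SAMPLE_NAME = re.compile(r'^\s*SampleName\s*=\s*"([^"]*)"', re.MULTILINE)
NB_SAMPLES = re.compile(r'^\s*NbSamples\s*=\s*(\d+)', re.MULTILINE)


def check_population_order(geo_names, arp_text):
    """Raise ValueError if the .arp samples do not match geo_names, in order."""
    arp_names = SAMPLE_NAME.findall(arp_text)
    declared = NB_SAMPLES.search(arp_text)
    if declared and int(declared.group(1)) != len(arp_names):
        raise ValueError(f"NbSamples={declared.group(1)} but {len(arp_names)} samples found")
    if len(arp_names) != len(geo_names):
        raise ValueError(f"{len(geo_names)} populations in .geo, {len(arp_names)} in .arp")
    for i, (g, a) in enumerate(zip(geo_names, arp_names), start=1):
        if g != a:
            raise ValueError(f"population {i}: {g!r} in .geo but {a!r} in .arp")


arp = """[Profile]
NbSamples=2
[Data]
[[Samples]]
SampleName="Egyptiens"
SampleSize=2
SampleData= {
}
SampleName="Tunisiens"
SampleSize=2
SampleData= {
}
"""
check_population_order(["Egyptiens", "Tunisiens"], arp)  # passes silently
```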


Running

SAMOVA requires the following information:

  1. the base name of the input files, without extension (for example, inputdata; in this case the two input files used by SAMOVA MUST be in the directory containing the program and MUST be called inputdata.geo and inputdata.arp).
  2. the number K of groups of populations you wish to define (the final structure defined by SAMOVA will contain K groups)
  3. the number of simulated annealing processes you wish to perform (100 seems a good choice)
  4. the type of molecular distance between haplotypes you want to compute (SAMOVA, like AMOVA, is based on a matrix of distances between the haplotypes observed in the whole set of samples). You can choose between the number of pairwise differences between haplotypes (for DNA data) and the sum of squared size differences between haplotypes (for microsatellite data).
The computations are finished when the SAMOVA window disappears from your screen. They can take some time, depending on the number of populations and on the number of simulated annealing processes requested.
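The two distance options in step 4 can be sketched as follows. This is a minimal Python illustration, assuming aligned DNA haplotypes of equal length and microsatellite haplotypes given as tuples of allele sizes, one per locus. The function names are hypothetical, and missing data (such as the '?' symbol declared in the example project file) is not handled.

```python
def pairwise_differences(h1, h2):
    """Number of sites at which two aligned DNA haplotypes differ."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(a != b for a, b in zip(h1, h2))


def squared_size_differences(h1, h2):
    """Sum over loci of the squared difference in microsatellite allele size."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of loci")
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def distance_matrix(haplotypes, metric):
    """Full symmetric matrix of distances between all haplotypes."""
    return [[metric(a, b) for b in haplotypes] for a in haplotypes]


# Egy1 vs Egy2 from the example project file differ at two sites
d_dna = pairwise_differences("AAAAAAAAAAAAAATTAAAA", "AAAAAACCAAAAAATTAAAA")
# Two hypothetical two-locus microsatellite haplotypes: 4**2 + 2**2
d_msat = squared_size_differences((112, 98), (116, 100))
```

Both functions give zero on the diagonal and are symmetric, which is what an AMOVA-type analysis of a distance matrix expects.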

Output files

SAMOVA creates the following output files:

  1. SAMOVA_results_arlequin.txt: the genetic structure defined by SAMOVA, together with the fixation indices corresponding to this group structure and their significance levels, evaluated by 1,000 permutations of populations among groups.
  2. SAMOVA.log: records every step performed by SAMOVA and, if problems occur, where they occurred.
  3. SAMOVA_finalstructure.arp: an Arlequin project file created by appending the genetic structure defined by SAMOVA to the input Arlequin project file.
  4. SAMOVA_results.ps: this file (in EPS format) can be opened with GSview for Windows or Adobe Illustrator 7.0; it contains a map of the sampling points and of the barriers between the groups of populations defined by SAMOVA.
  5. Arlequin.log: this file is generated during the computation of the fixation indices corresponding to the genetic structure defined by SAMOVA. It contains all the run-time WARNINGS and ERRORS encountered during these computations.
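The significance of the group structure is assessed by permuting populations among groups. A minimal Python sketch of that idea follows. `permutation_p_value` is a hypothetical name, and `statistic` stands for any function returning F_CT for a grouping (for instance an AMOVA routine). The (hits + 1) / (n_perm + 1) convention for the p-value is an assumption that may differ from the one Arlequin uses. Shuffling the list of group labels keeps the group sizes fixed while reassigning whole populations to groups. The toy statistic in the example (the gap between group means of one value per population) only stands in for F_CT.

```python
import random


def permutation_p_value(statistic, groups, n_perm=1000, seed=0):
    """One-sided permutation p-value for statistic(groups).

    groups[p] is the group label of population p. Shuffling this list
    reassigns whole populations to groups while keeping group sizes fixed.
    """
    rng = random.Random(seed)
    observed = statistic(groups)
    perm = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if statistic(perm) >= observed:
            hits += 1
    # Count the observed arrangement as one of the possible arrangements
    return (hits + 1) / (n_perm + 1), observed


# Toy stand-in for F_CT: one value per population, gap between group means
values = [0.0, 0.1, 5.0, 5.1]


def mean_gap(groups):
    means = []
    for g in sorted(set(groups)):
        xs = [v for v, label in zip(values, groups) if label == g]
        means.append(sum(xs) / len(xs))
    return max(means) - min(means)


p, observed = permutation_p_value(mean_gap, [0, 0, 1, 1])
```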

References


Isabelle Dupanloup, Institute of Ecology and Evolution, University of Bern