Arlequin 3.11

Arlequin ver 3.11 (released 19 February 2007)

NEW: Arlequin ver 3.5 is now available

An Integrated Software for Population Genetics Data Analysis

Arlequin ver 3.11

Why Arlequin?

Philosophy

Implemented methods

System requirements

Installation

What's new in ver. 3.11

How to cite Arlequin

Discussion forum - FAQ

Downloads

Screenshots

Links

Why is it called Arlequin?

Arlequin is the French translation of "Arlecchino", a famous character of the Italian "Commedia dell'Arte". As a character he has many aspects, and he can switch among them very easily according to his needs and to necessities. This polymorphic ability is symbolized by his colorful costume, from which the Arlequin icon was designed.

Arlequin philosophy

The goal of Arlequin is to provide the average user in population genetics with quite a large set of basic methods and statistical tests, in order to extract information on genetic and demographic features of a collection of population samples.

The graphical interface is designed to allow users to rapidly select the different analyses they want to perform on their data. We felt it was important to be able to explore the data and to analyze the same data set several times from different perspectives, with different options selected.

The statistical tests implemented in Arlequin have been chosen so as to minimize hidden assumptions and to be as powerful as possible. Thus, they often take the form of either permutation tests or exact tests, with some exceptions.
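
To give a rough idea of what a permutation test does, here is a minimal, generic sketch in Python. It is not Arlequin's code: the function name, the test statistic (absolute difference in means between two samples) and the default number of permutations are all assumptions made for illustration only.

```python
import random


def permutation_p_value(sample_a, sample_b, n_perm=1000, seed=0):
    """Permutation test on the absolute difference in means (illustrative only).

    Hypothetical helper: observations are shuffled between the two samples
    to build the null distribution of the statistic.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    observed = abs(sum(sample_a) / n_a - sum(sample_b) / len(sample_b))

    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to samples
        a, b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(a) / len(a) - sum(b) / len(b))
        if stat >= observed:
            hits += 1

    # Count the observed configuration itself, so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)
```

The only assumption made by such a test is that observations are exchangeable under the null hypothesis, which is why this family of tests has few hidden assumptions.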

Finally, we wanted Arlequin to be able to handle genetic data in many different forms, and to carry out the same types of analyses irrespective of the format of the data.

Because Arlequin has a rich set of features and many options, the user has to spend some time learning them. However, we hope that the learning curve will not be too steep.

Arlequin is made available free of charge, as long as we have enough local resources to support the development of the program.

Implemented methods

The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods. In the first category statistical information is extracted independently from each population, whereas in the second category, samples are compared to each other.

*Intra-population methods:*	*Short description:*
Standard indices	Some diversity measures like the number of polymorphic sites, gene diversity.
Molecular diversity	Calculates several diversity indices like nucleotide diversity, different estimators of the population parameter θ.
Mismatch distribution	The distribution of the number of pairwise differences between haplotypes, from which parameters of a demographic (NEW in ver 3.x) or spatial population expansion can be estimated
Haplotype frequency estimation	Estimates the frequency of haplotypes present in the population by maximum likelihood methods.
Gametic phase estimation (NEW in ver 3.x)	Estimates the most likely gametic phase of multi-locus genotypes using a pseudo-Bayesian approach (ELB algorithm).
Linkage disequilibrium	Test of non-random association of alleles at different loci.
Hardy-Weinberg equilibrium	Test of non-random association of alleles within diploid individuals.
Tajima’s neutrality test	Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model.
Fu's F_S neutrality test	Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model.
Ewens-Watterson neutrality test	Tests of selective neutrality based on Ewens sampling theory under the infinite alleles model.
Chakraborty’s amalgamation test	A test of selective neutrality and population homogeneity. This test can be used when sample heterogeneity is suspected.
Minimum Spanning Network (MSN)	Computes a Minimum Spanning Tree (MST) and Network (MSN) among haplotypes. This tree can also be computed for all the haplotypes found in different populations if activated under the AMOVA section.
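
As an illustration of two of the intra-population quantities listed above, the following Python sketch computes gene diversity and a mismatch distribution from a small set of aligned haplotypes. It assumes the usual unbiased estimator H = n/(n-1) * (1 - sum of squared haplotype frequencies) and plain counts of pairwise differences. The function names are hypothetical, and the sketch ignores missing data and the corrections that Arlequin can apply.

```python
from collections import Counter
from itertools import combinations


def gene_diversity(haplotypes):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def mismatch_distribution(sequences):
    """Map each number of pairwise differences to the number of sequence pairs.

    Assumes all sequences are aligned and have the same length.
    """
    dist = Counter()
    for s, t in combinations(sequences, 2):
        dist[sum(a != b for a, b in zip(s, t))] += 1
    return dict(sorted(dist.items()))


sample = ["AAAA", "AAAT", "AATT", "AAAA"]  # toy data, not from a real project
print(gene_diversity(sample))
print(mismatch_distribution(sample))
```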


*Inter-population methods:*	*Short description:*
Search for shared haplotypes between populations	Comparison of population samples for their haplotypic content. All the results are then summarized in a table.
AMOVA	Different hierarchical Analyses of Molecular Variance to evaluate the amount of population genetic structure.
Pairwise genetic distances	F_ST based genetic distances for short divergence time.
Exact test of population differentiation	Test of non-random distribution of haplotypes into population samples under the hypothesis of panmixia.
Assignment test of genotypes	Assignment of individual genotypes to particular populations according to estimated allele frequencies.

*Mantel test:*	*Short description:*
Correlations or partial correlations between a set of 2 or 3 matrices	Can be used to test for the presence of isolation-by-distance
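
The simple (two-matrix) form of the Mantel test can be sketched as follows in Python. This is an illustrative sketch under stated assumptions, not Arlequin's implementation: the function names are hypothetical, both inputs are assumed to be full symmetric distance matrices given as lists of lists, the statistic is the Pearson correlation between off-diagonal elements, and the p-value is one-sided. Partial correlations with a third matrix are not covered.

```python
import random
from itertools import combinations


def _pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]
    r_obs = _pearson(x, [d2[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        # Permute rows and columns of the second matrix together, so that
        # the dependence between elements of a distance matrix is preserved.
        rng.shuffle(order)
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

To test for isolation-by-distance, d1 would typically hold pairwise genetic distances and d2 the corresponding geographic distances between the same populations, in the same order.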

System requirements

Windows 95/98/NT/2000/XP.
A minimum of 128 MB RAM, and more to avoid swapping.
At least 10 MB of free hard disk space.

Installation

Download Arlequin31.zip to any temporary directory.
Extract all files contained in Arlequin31.zip into the directory of your choice.
Start Arlequin by double-clicking on the file WinArl3.exe, which is the main executable file.
Configure Arlequin: Choose which Text Editor to use when editing project files in the "Arlequin Configuration" tab.

The first thing to do before running Arlequin for the first time is to read the manual. It will provide you with most of the information you are looking for. So, take some time to read it before you seriously start analyzing your data.

What's new?

Version 3.0: Compared to version 2, Arlequin version 3 integrates the core computational routines and the interface in a single program written in C++, so Arlequin no longer relies on Java. This has two consequences: the new graphical interface is nicer and faster, but it is less portable than before. At the moment we release a Windows version (2000, XP, and above), and we shall probably release a Linux version later. Support for the Mac has been discontinued.

Other main changes include:

Correction of many small bugs
Incorporation of two new methods to estimate gametic phase and haplotype frequencies
1. EM zipper algorithm: An extension of the EM algorithm allowing one to handle a larger number of polymorphic sites than the plain EM algorithm.
2. ELB algorithm: a pseudo-Bayesian approach to specifically estimate gametic phase in recombining sequences.
Incorporation of a least-square approach to estimate the parameters of an instantaneous spatial expansion from DNA sequence diversity within samples, and computations of bootstrap confidence intervals using coalescent simulations.
Estimation of confidence intervals for F-statistics, using a bootstrap approach when genetic data on more than 8 loci are available.
Update of the java-script routines in the output html files, making them fully compatible with Firefox 1.X.
A completely rewritten and more robust input file parsing procedure, giving more precise information on the location of potential syntax and format mistakes.
Use of the ELB algorithm described above to generate samples of phased multi-locus genotypes, which allows one to analyse unphased multi-locus genotype data as if the phase was known. The phased data sets are output in Arlequin projects that can be analysed in a batch mode to obtain the distribution of statistics taking phase uncertainty into account.
No need to define a web browser for consulting the results. Arlequin will automatically present the results in your default web browser (we recommend the use of Firefox, freely available on http://www.mozilla.org/products/firefox/central.html).

Version 3.01: Compared to version 3.0, Arlequin 3.01 includes some bug corrections and some additional features:

Bug corrections:

Minimum Spanning Tree Checkbox was not available for Genotypic data with known Gametic Phase.
Choice of how FST is computed was not available when computing pairwise distances. Now, it is synchronized with the choice of distance in the AMOVA panel.
"Search for shared haplotypes" did not work for Genotypic Data with known Gametic Phase. This has been corrected and Arlequin now outputs a list of haplotypes before the table of frequencies.
[[Mantel]] section was not recognized if located after a [[Structure]] section.
Improved conversion between GenePop and Arlequin formats.
"Diploid Data" option is now present when converting from Genepop to Arlequin.
Output of s.d. of the number of alleles (k) was sometimes zero in output of Fu's FS test. This is now corrected, and annoying warning messages about "No molecular diversity within a sample while performing Fu's test" have been suppressed in the output file.

Additions:

New editor of genetic structure allowing one to modify the current Genetic Structure directly in the graphical interface.
Computation of population-specific F_ST indices, when a single group is defined in the Genetic Structure. This may be useful to recognize populations contributing particularly to the global F_ST measure. This is also available in the locus-by-locus AMOVA section.

Version 3.1: Compared to version 3.01, Arlequin 3.1 includes cosmetic and speed improvements, several bug corrections and additional features:

Bug corrections:

Locus-by-locus AMOVA failed on DNA sequences when corrections for multiple hits were selected.
File conversion towards the Phylip format could not be done.
It was impossible to change the default significance level of 0.05 for highlighting significant genetic distances in output file.
Missing data identifier other than "?" was not accepted.
If a project file (or the path to it) contained the letters "arb", then it was erroneously considered as a Batch file.
Confidence intervals around FST were incorrectly reported by the bootstrap procedure in the case of a single group with no Individual Level taken into account.
Estimation of haplotype frequencies from distance matrix was not performed when "Conventional FST" option was selected.
Locus-by-locus AMOVA reported incorrect results for Genotypic data when individuals had missing data for only one of their gene copies at a given locus.
Reported number of indels differed according to the weight given to indels in the Option panel. This bug did not affect AMOVA computations.
For sequence data, a mixture of N's and missing data led to problems in identifying distinct DNA sequences from distance matrix, leading to slightly incorrect FST computations.
Exact test of population differentiation could not be performed when gametic phase was unknown. Now, this option has been restored, as in ver. 2.
Arlequin hung when a given population was entered several times in the definition of a group for the computation of genetic structure. Now, the error is simply flagged and the program does not hang.
For frequency data, it was impossible to use a predefined distance matrix.
Beta approximation of the significance of Tajima's D gave wrong results. This approximation has been suppressed and now we only report the significance level obtained from coalescent simulations.
Bad computation of inbreeding coefficients under the locus-by-locus AMOVA approach for genotypic data when phenotype frequencies were larger than one. The bug caused an overestimation of the local (FIS) and total (FIT) inbreeding level. For samples where phenotype frequencies were all set to 1, the inbreeding coefficients were correctly estimated.
Expected heterozygosity reported under HWE exact test section was inaccurately computed. This inaccuracy however did not affect the results of the HWE exact test, which does not use information on observed and expected heterozygosity.

Improvements:

Locus-by-locus AMOVA can now be performed independently from conventional AMOVA. This can lead to faster computations for large sample sizes and large number of population samples.
Faster routines to handle long DNA sequences or large number of microsatellites.
Faster reading of input file
Faster computation of demographic parameters from mismatch distribution. Improved convergence of least-square fitting algorithm.

Additions:

Computations of population specific inbreeding coefficients and computations of their significance level.
Computation of the number of alleles as well as observed and expected heterozygosity per locus
Computation of the Garza-Williamson statistic for MICROSAT data.
In batch mode, the summary file (*.sum) now reports the name of the analyzed file as well as the name of the analyzed population sample.
When saving current settings, users are now asked to choose a file name. Default is "project file name".ars.
New sections are provided at the end of the result file, in order to report summary statistics computed over all populations:
- Basic properties of the samples (size, no. of loci, etc...)
- Heterozygosity per locus
- Number of alleles + total no. of alleles over all pops
- Allelic range + total allelic range over all pops (for microsatellite data)
- Garza-Williamson index (for microsatellite data)
- Number of segregating sites, + total over all pops
- Molecular diversity indices (theta values)
- Neutrality tests summary statistics and p-values
- Demographic parameters estimated from the mismatch distribution and p-values.
New shortcuts are provided in the left pane of the html result file for F-statistics bootstrap confidence intervals, population specific FIS, and summary of intra-population statistics.
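
For readers unfamiliar with the Garza-Williamson statistic mentioned above, here is a small Python sketch. It assumes the commonly cited definition G-W = k / (R + 1), where k is the number of distinct alleles at a microsatellite locus and R is the allelic range in repeat units. Please check the user manual for the exact definition used by Arlequin. The function name and the input format (one repeat count per gene copy) are hypothetical.

```python
def garza_williamson(repeat_counts):
    """Garza-Williamson-type statistic for one locus, assuming k / (R + 1).

    repeat_counts: one integer repeat number per sampled gene copy.
    """
    alleles = set(repeat_counts)
    k = len(alleles)                          # number of distinct alleles
    allelic_range = max(alleles) - min(alleles)
    return k / (allelic_range + 1)


# Three distinct alleles (10, 11, 14) spanning a range of 4 repeats.
print(garza_williamson([10, 11, 14, 10, 11]))
```

Under this definition a monomorphic locus gives a value of 1, and values well below 1 indicate gaps in the allele size distribution.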

Version 3.11: Arlequin 3.11 is mainly an update of version 3.1, and there is no new manual.

Bug corrections:

Significance level of F_SC and Var(b). The p-value associated with the variance component due to differences between populations within groups was erroneously computed when the number of samples in the genetic structure to test was identical to the total number of samples defined in the Samples section but the order of the samples in the Genetic Structure section was different from that in the Samples section. This bug has been around since the first release of Arlequin 2.0... Thanks to Romina Piccinali for finding it.

The expected homozygosity reported in the Ewens-Watterson test in the samples summary section was that of the last simulated sample. The correct value was reported only if no permutations were done.

Total number of alleles reported in the statistics summary section also included the missing data allele.

The population labels were incorrectly reported when computing population-specific F_IS statistics. The reported order corresponded to that of the last permutation. The population labels were only correct when the significance of the F_IS statistic was not tested. Thanks to Jeff Lozier for finding this bad bug.

Modifications:

Mean expected heterozygosity and mean allele number are reported over polymorphic sites in the Sample section, while they are reported over all loci in the statistics summaries at the end of the result file.

Additions:

Sample allele frequencies can now be output in locus-specific files, if this option is selected in the Molecular Diversity tab. Locus-specific files are output in the Arlequin project result directory.

How to cite Arlequin

Excoffier, L., G. Laval, and S. Schneider (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.

Arlequin discussion forum and FAQ

Problems can be reported on the Arlequin Discussion Forum, located on the Genetic Software Forum (GSF: http://www.rannala.org/gsf) and hosted by Bruce Rannala.

This Arlequin forum will also be used as a Frequently Asked Questions (FAQ) page.

Downloads

Download Arlequin ver 3.11 for Windows (posted on 19.02.2007)

Download Arlequin 3.1 User Manual

Screenshots

Configuration

Project wizard

Import-Export

Genetic structure Editor

Arlequin settings

Arlequin configuration

ELB settings

AMOVA settings

LD settings

Neutrality tests settings

Tabbed Output File in Firefox

Links

Arlequin 2.0 web site

Arlequin 3.1 user manual

Arlequin Discussion Forum

Arlequin FAQ topic

File conversion programs

Name	Input format	Output format
Convert	Excel	WhichRun, GeneClass, GDA, Microsat, Arlequin, Cervus, Fstat, Structure, Phylip
Formatomatic	Raw (csv), GenePop	Raw (csv), GenePop, Arlequin, Immanc
GenePop on the web	GenePop	Arlequin, Biosys, Fstat, Structure, Phylip

Other free population genetics software programs

Program	Short description
Batwing	Bayesian Analysis of Trees With Internal Node Generation
BayesAss	Bayesian estimation of recent migration rates using multilocus genotypes
DnaSP	A software package for the analysis of nucleotide polymorphism from aligned DNA sequence data
Fstat	Computer package which estimates and tests gene diversities and differentiation statistics from codominant genetic markers
GDA	Computes linkage and Hardy-Weinberg disequilibrium, some genetic distances, and provides method-of-moments estimators for hierarchical F-statistics
GeneClass	GeneClass is a program for assignation and exclusion using molecular markers
GenePop, GenePop on the web	Software package for the computations of various population genetics parameters
Genetix	Software package for the computations of various population genetics parameters
Hickory	Bayesian estimation of F-statistics from dominant marker data
Immanc	Detection of immigrants using multilocus genotypes
Jody Hey's programs
Lamarc	Suite of programs which estimate population-genetic parameters, based on McMC likelihood methods
Mark Beaumont's programs
Mega	Integrated tool for automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses
PAML	A package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood
Phase	Program for haplotype reconstruction, and recombination rate estimation from population data
PopGene	Analysis of genetic variation among and within populations using co-dominant and dominant markers
Populations	Computation of various genetic distances
Structure	Software package for using multi-locus genotype data to investigate population structure
WhichRun	A computer program for population assignment of individuals based on multilocus genotype data

A more extensive list of population genetics programs can be found here, here, here, and here.

We have also written a review on available programs in population genetics, which contains the description of about 25 different programs, with links to many more. It can be found here.

Laurent Excoffier, CMPG, Institute of Ecology and Evolution, University of Bern

Last edited on 17.02.07 (18:21)