
Arlequin ver 3.11   

(released 19 February 2007)

NEW: Arlequin ver 3.5 is now available

An Integrated Software for Population Genetics Data Analysis

 


 

Why Arlequin?
 

Philosophy

 
Implemented methods
 
System requirements
 
Installation
 
What's new in ver. 3.11
 
How to cite Arlequin
 
Discussion forum - FAQ
 
Downloads
 
Screenshots
 
Links
 

 

   

Why is it called Arlequin?

Arlequin is the French translation of "Arlecchino", a famous character of the Italian "Commedia dell'Arte". As a character he has many facets, and he can switch among them very easily according to his needs and circumstances. This polymorphic ability is symbolized by his colorful costume, from which the Arlequin icon was designed.


Arlequin philosophy

The goal of Arlequin is to provide the average user in population genetics with quite a large set of basic methods and statistical tests, in order to extract information on genetic and demographic features of a collection of population samples.

The graphical interface is designed to allow users to rapidly select the different analyses they want to perform on their data. We felt it was important to be able to explore the data and to analyze the same data set several times from different perspectives, with different options selected.

The statistical tests implemented in Arlequin have been chosen so as to minimize hidden assumptions and to be as powerful as possible. Thus, with some exceptions, they take the form of either permutation tests or exact tests.

Finally, we wanted Arlequin to be able to handle genetic data in many different forms, and to carry out the same types of analyses irrespective of the format of the data.

Because Arlequin has a rich set of features and many options, users will need to spend some time learning them. However, we hope that the learning curve will not be too steep.

Arlequin is made available free of charge, as long as we have enough local resources to support the development of the program.


Implemented methods

The analyses Arlequin can perform on the data fall into two main categories: intra-population and inter-population methods. In the first category statistical information is extracted independently from each population, whereas in the second category, samples are compared to each other.

Intra-population methods:

Short description:

Standard indices

Basic diversity measures such as the number of polymorphic sites and gene diversity.
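As a rough illustration of two of these standard indices, the Python sketch below counts polymorphic sites and computes the unbiased gene (haplotype) diversity n/(n-1) (1 - Σ p_i²) from a list of aligned haplotypes. It assumes complete data without missing sites, and the function names are hypothetical rather than part of Arlequin.

```python
from collections import Counter

# Hypothetical helpers; assumes complete, aligned data (no missing sites).


def gene_diversity(sample):
    """Unbiased gene (haplotype) diversity: n / (n - 1) * (1 - sum of p_i^2)."""
    n = len(sample)
    counts = Counter(sample)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def polymorphic_sites(sequences):
    """Number of aligned positions that carry more than one distinct state."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


haplotypes = ["ACGT", "ACGA", "ACGT", "TCGA"]
print(polymorphic_sites(haplotypes))         # 2
print(round(gene_diversity(haplotypes), 4))  # 0.8333
```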

Molecular diversity

Calculates several diversity indices such as nucleotide diversity and different estimators of the population parameter θ.
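A minimal sketch of two of these quantities, assuming aligned sequences without missing data: θ_π, the mean number of pairwise differences, and Watterson's θ_S = S / a1 with a1 = Σ_{i=1}^{n-1} 1/i. The other estimators are not shown, and the function names are illustrative only.

```python
from itertools import combinations

# Illustrative helpers; assumes aligned sequences of equal length, no missing data.


def pairwise_differences(seqs):
    """Number of differing sites for every unordered pair of aligned sequences."""
    return [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]


def theta_pi(seqs):
    """Theta estimated from the mean number of pairwise differences."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs)


def nucleotide_diversity(seqs):
    """Mean pairwise differences divided by sequence length."""
    return theta_pi(seqs) / len(seqs[0])


def theta_s(seqs):
    """Watterson's estimator: segregating sites S divided by a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    return s / a1


haplotypes = ["ACGT", "ACGA", "ACGT", "TCGA"]
print(round(theta_pi(haplotypes), 4))  # 1.1667
print(round(theta_s(haplotypes), 4))   # 1.0909
```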

Mismatch distribution

The distribution of the number of pairwise differences between haplotypes, from which the parameters of a demographic (NEW in ver 3.x) or spatial population expansion can be estimated.
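Tabulating the mismatch distribution itself is straightforward. The sketch below, which assumes aligned haplotypes without missing data, returns the proportion of sequence pairs differing at 0, 1, 2, ... sites. Fitting demographic or spatial expansion parameters to this distribution is a separate step that is not shown, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of sequence pairs differing at 0, 1, 2, ... sites (hypothetical helper)."""
    counts = Counter(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    total = sum(counts.values())
    # Include zero-frequency classes up to the largest observed number of differences
    return [counts.get(d, 0) / total for d in range(max(counts) + 1)]


haplotypes = ["ACGT", "ACGA", "ACGT", "TCGA"]
print(mismatch_distribution(haplotypes))  # pairs differing at 0, 1, 2 sites: 1/6, 1/2, 1/3
```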

Haplotype frequency estimation

Estimates the frequency of haplotypes present in the population by maximum likelihood methods.
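Maximum-likelihood haplotype frequencies are classically obtained with an expectation-maximization (EM) algorithm. The sketch below is a plain EM under Hardy-Weinberg proportions, assuming unphased diploid genotypes given as one allele pair per locus. It enumerates every phase resolution, which is exponential in the number of loci, so it is not the EM-zipper or ELB approach of Arlequin 3. Function names and the input format are hypothetical.

```python
from collections import defaultdict
from itertools import product


def phase_resolutions(genotype):
    """Distinct unordered haplotype pairs compatible with an unphased genotype.

    genotype: list of (allele1, allele2) tuples, one per locus (hypothetical format).
    """
    pairs = set()
    for flips in product((False, True), repeat=len(genotype)):
        h1 = tuple(b if f else a for (a, b), f in zip(genotype, flips))
        h2 = tuple(a if f else b for (a, b), f in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=100):
    """Plain EM estimate of haplotype frequencies assuming Hardy-Weinberg proportions."""
    resolutions = [phase_resolutions(g) for g in genotypes]
    haplotypes = {h for res in resolutions for pair in res for h in pair}
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting point
    n_genes = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for res in resolutions:
            # E-step: probability of each phase resolution given current frequencies
            weights = [freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2) for h1, h2 in res]
            total = sum(weights)
            for (h1, h2), w in zip(res, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of gene copies
        freqs = {h: counts[h] / n_genes for h in haplotypes}
    return freqs


genotypes = [
    [("A", "A"), ("B", "B")],  # homozygous: phase known, AB/AB
    [("a", "a"), ("b", "b")],  # homozygous: phase known, ab/ab
    [("A", "a"), ("B", "b")],  # double heterozygote: AB/ab or Ab/aB
]
freqs = em_haplotype_frequencies(genotypes)
print({"".join(h): round(f, 3) for h, f in sorted(freqs.items())})
# {'AB': 0.5, 'Ab': 0.0, 'aB': 0.0, 'ab': 0.5}
```

Because the double heterozygote is resolved using the frequencies contributed by the other individuals, its AB/ab resolution quickly absorbs all the weight.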

Gametic phase estimation
(NEW in ver 3.x)

Estimates the most likely gametic phase of multi-locus genotypes using a pseudo-Bayesian approach (ELB algorithm).

Linkage disequilibrium

Test of non-random association of alleles at different loci.
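For two biallelic loci with known phase, one simple way to quantify and test non-random association is to compute D and r² and compare the observed r² with values obtained after shuffling the alleles of one locus among haplotypes. This is a sketch under those assumptions (phased, biallelic data, r² as test statistic); it is not necessarily the statistic or procedure Arlequin uses, and the function names are hypothetical.

```python
import random


def ld_stats(haplotypes):
    """D and r^2 between two biallelic loci from phased two-locus haplotypes."""
    n = len(haplotypes)
    a = sorted({h[0] for h in haplotypes})[0]  # reference allele at locus 1
    b = sorted({h[1] for h in haplotypes})[0]  # reference allele at locus 2
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # monomorphic locus: no association
    return d, r2


def ld_permutation_test(haplotypes, n_perm=1000, seed=1):
    """One-sided permutation p-value for the observed r^2 (hypothetical helper)."""
    rng = random.Random(seed)
    _, r2_obs = ld_stats(haplotypes)
    locus1 = [h[0] for h in haplotypes]
    locus2 = [h[1] for h in haplotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(locus2)  # break any association while keeping allele counts
        _, r2 = ld_stats(list(zip(locus1, locus2)))
        if r2 >= r2_obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


haps = ["AB"] * 10 + ["ab"] * 10  # hypothetical phased two-locus haplotypes
print(ld_stats(haps))  # (0.25, 1.0)
print(ld_permutation_test(haps))
```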

Hardy-Weinberg equilibrium

Test of non-random association of alleles within diploid individuals.
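For a single biallelic locus, the exact distribution of the number of heterozygotes given the allele counts can be enumerated directly (Levene's formula), and the p-value is the total probability of all tables no more probable than the observed one. This sketch is limited to two alleles; exact tests for loci with many alleles usually explore the possible tables with a Markov chain instead. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE p-value for one biallelic locus by full enumeration (hypothetical helper)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def prob(het):
        # Probability of `het` heterozygotes given n individuals and the allele counts
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        num = factorial(n) * 2 ** het * factorial(n_a) * factorial(n_b)
        den = factorial(hom_a) * factorial(het) * factorial(hom_b) * factorial(2 * n)
        return Fraction(num, den)

    # Heterozygote counts must have the same parity as n_a
    probs = {h: prob(h) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    observed = probs[n_ab]
    return float(sum(p for p in probs.values() if p <= observed))


print(round(hwe_exact_p(1, 0, 1), 4))  # 0.3333
```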

Tajima’s neutrality test

Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model.
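Tajima's D compares the mean number of pairwise differences (θ_π) with Watterson's estimate S / a1, standardized by the square root of the variance of their difference under the standard neutral model. The sketch below computes only the statistic from the sample size n, the number of segregating sites S, and the mean pairwise difference; its significance level is not computed, and the function name is hypothetical.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, and mean pairwise differences k."""
    # For n < 4 the variance term is zero, so D is undefined
    if n < 4 or s == 0:
        raise ValueError("Tajima's D needs n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Positive D: more pairwise differences than expected from S (e.g. balancing selection
# or population subdivision); negative D: an excess of rare variants (e.g. expansion).
print(tajimas_d(4, 3, 3.0))
print(tajimas_d(4, 3, 1.0))
```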

Fu's FS neutrality test 

Test of the selective neutrality of a random sample of DNA sequences or RFLP haplotypes under the infinite site model, based on the probability of observing at least as many distinct haplotypes as in the sample.
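Fu's FS is computed from S', the probability of observing at least as many distinct haplotypes as in the sample under the Ewens sampling distribution, with θ set to the mean number of pairwise differences: FS = ln(S' / (1 - S')). A sketch, assuming the observed haplotype count k is at least 2 (otherwise S' = 1 and FS is undefined) and moderate sample sizes, since the Stirling numbers are converted to floats. Function names are hypothetical, and the significance test is not shown.

```python
import math


def unsigned_stirling_first(n):
    """Unsigned Stirling numbers of the first kind |s(n, k)| for k = 0..n."""
    row = [1]  # n = 0
    for m in range(n):
        # Recurrence: |s(m+1, k)| = m * |s(m, k)| + |s(m, k-1)|
        row = [(m * row[k] if k <= m else 0) + (row[k - 1] if k >= 1 else 0)
               for k in range(m + 2)]
    return row


def fu_fs(n, k_obs, theta_pi):
    """FS = ln(S' / (1 - S')), with S' = P(K >= k_obs | n, theta_pi) under the Ewens distribution."""
    stirling = unsigned_stirling_first(n)
    # Rising factorial theta (theta + 1) ... (theta + n - 1)
    denom = math.prod(theta_pi + i for i in range(n))
    s_prime = sum(stirling[k] * theta_pi ** k for k in range(k_obs, n + 1)) / denom
    return math.log(s_prime / (1.0 - s_prime))


print(round(fu_fs(5, 5, 1.0), 3))  # -4.779
```

Strongly negative values indicate an excess of distinct haplotypes relative to the neutral expectation.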

Ewens-Watterson neutrality test

Tests of selective neutrality based on Ewens sampling theory under the infinite alleles model.

Chakraborty’s amalgamation test

A test of selective neutrality and population homogeneity. This test can be used when sample heterogeneity is suspected.

Minimum Spanning Network (MSN)

Computes a Minimum Spanning Tree (MST) and Network (MSN) among haplotypes. This tree can also be computed for all the haplotypes found in different populations if activated under the AMOVA section.
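A minimum spanning tree over haplotypes can be built with Prim's algorithm on the matrix of pairwise differences. This sketch assumes aligned haplotypes and uses the number of differing sites as distance; it returns a single tree and does not add the alternative equal-length links that turn the MST into a minimum spanning network. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing sites between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm; returns edges as (parent_index, child_index, distance)."""
    n = len(haplotypes)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from the tree to each haplotype
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = hamming(haplotypes[u], haplotypes[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


print(minimum_spanning_tree(["AAAA", "AAAT", "AATT", "TTTT"]))
# [(0, 1, 1), (1, 2, 1), (2, 3, 2)]
```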

   

Inter-population methods:

Short description:

Search for shared haplotypes between populations

Comparison of population samples for their haplotypic content. All the results are then summarized in a table.

AMOVA

Different hierarchical Analyses of Molecular Variance to evaluate the amount of population genetic structure.
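A minimal sketch of a single-level AMOVA (populations only, no groups), assuming aligned haplotypes and taking the squared distance between two haplotypes as their number of differing sites. It partitions the sum of squared deviations into among- and within-population parts and returns Φ_ST, but omits the hierarchical levels and permutation tests; `amova_phi_st` is a hypothetical name.

```python
from itertools import combinations


def n_differences(a, b):
    return sum(x != y for x, y in zip(a, b))


def amova_phi_st(populations):
    """Single-level AMOVA: Phi_ST from sums of squared deviations.

    populations: list of lists of aligned haplotype strings (hypothetical input format).
    The squared distance between two haplotypes is taken as their number of differences.
    """
    all_haps = [h for pop in populations for h in pop]
    n_total, n_pops = len(all_haps), len(populations)

    def ssd(haps):
        # Sum of squared deviations = (sum over pairs of squared distances) / sample size
        return sum(n_differences(a, b) for a, b in combinations(haps, 2)) / len(haps)

    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(all_haps) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average sample size coefficient, accounting for unequal population sizes
    n_coef = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)
    sigma2_among = (msd_among - msd_within) / n_coef
    sigma2_within = msd_within
    return sigma2_among / (sigma2_among + sigma2_within)


print(amova_phi_st([["AA", "AA"], ["TT", "TT"]]))  # 1.0
```

The sketch assumes at least some molecular variation in the data; with none, the final ratio is undefined.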

Pairwise genetic distances

FST-based genetic distances for short divergence times.
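Pairwise FST values are often transformed into distances that increase roughly linearly with divergence time when that time is short. Two such transformations, Reynolds' -ln(1 - FST) and Slatkin's linearized FST / (1 - FST), are sketched below, assuming the pairwise FST values have already been estimated; which transformations a given Arlequin run reports depends on its options, and the function names are hypothetical.

```python
import math


def reynolds_distance(fst):
    """Reynolds' coancestry distance: -ln(1 - FST)."""
    return -math.log(1.0 - fst)


def slatkin_linearized_fst(fst):
    """Slatkin's linearized FST: FST / (1 - FST)."""
    return fst / (1.0 - fst)


for fst in (0.05, 0.2, 0.5):
    print(fst, round(reynolds_distance(fst), 4), round(slatkin_linearized_fst(fst), 4))
```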

Exact test of population differentiation

Test of non-random distribution of haplotypes into population samples under the hypothesis of panmixia.
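An exact test of differentiation compares the probability of the observed population-by-haplotype contingency table with that of other tables sharing its margins. Arlequin explores these tables with a Markov chain; this sketch instead swaps in simple random permutations of haplotypes among samples, which keep both margins fixed. The helper names and the 1000-permutation default are assumptions.

```python
import math
import random
from collections import Counter


def _sum_log_cell_factorials(labels, haps):
    # With both table margins fixed, a table's probability only depends on sum(log n_ij!)
    return sum(math.lgamma(c + 1) for c in Counter(zip(labels, haps)).values())


def differentiation_test(populations, n_perm=1000, seed=1):
    """Monte Carlo p-value: share of permuted tables no more probable than the observed one."""
    labels = [i for i, pop in enumerate(populations) for _ in pop]
    haps = [h for pop in populations for h in pop]
    observed = _sum_log_cell_factorials(labels, haps)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(haps)  # reassign haplotypes to samples, keeping sample sizes
        # A larger sum of log n_ij! means a less probable table
        if _sum_log_cell_factorials(labels, haps) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


print(differentiation_test([["A"] * 20, ["B"] * 20]))
print(differentiation_test([["A", "B"] * 10, ["A", "B"] * 10]))
```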

Assignment test of genotypes

Assignment of individual genotypes to particular populations according to estimated allele frequencies.
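Assignment according to estimated allele frequencies can be sketched as computing, for each candidate population, the log-likelihood of an individual's multilocus genotype under Hardy-Weinberg and linkage equilibrium, then picking the population with the highest value. The floor `min_freq` for alleles absent from a population is a hypothetical choice made for this sketch, as are the function names and input format.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs, min_freq=0.001):
    """Log-likelihood of a diploid multilocus genotype under Hardy-Weinberg.

    genotype: list of (allele1, allele2) per locus
    allele_freqs: list of dicts (allele -> frequency), one per locus
    min_freq: hypothetical floor for alleles unseen in the population
    """
    log_l = 0.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        p = max(freqs.get(a, 0.0), min_freq)
        q = max(freqs.get(b, 0.0), min_freq)
        log_l += math.log(p * q * (1 if a == b else 2))
    return log_l


def assign(genotype, populations):
    """Name of the population under which the genotype is most likely."""
    return max(populations, key=lambda name: genotype_log_likelihood(genotype, populations[name]))


# Hypothetical single-locus allele frequencies for two populations
pops = {"pop1": [{"A": 0.9, "a": 0.1}], "pop2": [{"A": 0.1, "a": 0.9}]}
print(assign([("A", "A")], pops))  # pop1
```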

   

Mantel test:

Short description:

Correlations or partial correlations between sets of two or three matrices.

Can be used to test for the presence of isolation-by-distance.
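A simple two-matrix Mantel test can be sketched as the Pearson correlation between the upper triangles of two distance matrices, with significance assessed by permuting rows and columns of one matrix together. This sketch tests only for a positive correlation, does not cover the partial (three-matrix) case, and uses hypothetical helper names and example coordinates.

```python
import random


def _upper_triangle(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(dist_a, dist_b, n_perm=999, seed=1):
    """Correlation between two distance matrices and a one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(dist_a)
    x = _upper_triangle(dist_a)
    r_obs = _pearson(x, _upper_triangle(dist_b))
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of the second matrix together
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper_triangle(permuted)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


points = [0, 1, 3, 7, 12, 20]  # hypothetical 1-D sampling locations
geo = [[abs(a - b) for b in points] for a in points]
gen = [[2 * abs(a - b) for b in points] for a in points]
r, p = mantel_test(geo, gen)
print(r, p)
```

For isolation-by-distance, one matrix would hold geographic distances and the other pairwise genetic distances between the same populations.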

   

System requirements


Installation

  1. Download Arlequin31.zip to any temporary directory.

  2. Extract all files contained in Arlequin31.zip in the directory of your choice.

  3. Start Arlequin by double-clicking on the file WinArl3.exe, which is the main executable file.

  4. Configure Arlequin: Choose which Text Editor to use when editing project files in the "Arlequin Configuration" tab.

Before running Arlequin for the first time, you should read the manual. It will provide most of the information you are looking for, so take some time to read it before you start seriously analyzing your data.


What's new?

Version 3.0: Compared to version 2, Arlequin version 3 integrates the core computational routines and the interface in a single program written in C++, so Arlequin no longer relies on Java. This has two consequences: the new graphical interface is nicer and faster, but it is less portable than before. At the moment we are releasing a Windows version (2000, XP, and above), and we will probably release a Linux version later. Support for the Mac has been discontinued.

Other main changes include:

  1. Correction of many small bugs

  2. Incorporation of two new methods to estimate gametic phase and haplotype frequencies

    1. EM zipper algorithm: An extension of the EM algorithm allowing one to handle a larger number of polymorphic sites than the plain EM algorithm.

    2. ELB algorithm: a pseudo-Bayesian approach to specifically estimate gametic phase in recombining sequences.

  3. Incorporation of a least-squares approach to estimate the parameters of an instantaneous spatial expansion from DNA sequence diversity within samples, and computation of bootstrap confidence intervals using coalescent simulations.

  4. Estimation of confidence intervals for F-statistics, using a bootstrap approach when genetic data on more than 8 loci are available.

  5. Update of the JavaScript routines in the output HTML files, making them fully compatible with Firefox 1.X.

  6. A completely rewritten and more robust input file parsing procedure, giving more precise information on the location of potential syntax and format mistakes.

  7. Use of the ELB algorithm described above to generate samples of phased multi-locus genotypes, which allows one to analyse unphased multi-locus genotype data as if the phase were known. The phased data sets are output in Arlequin projects that can be analysed in batch mode to obtain the distribution of statistics taking phase uncertainty into account.

  8. There is no need to define a web browser for consulting the results. Arlequin automatically presents the results in your default web browser (we recommend Firefox, freely available at http://www.mozilla.org/products/firefox/central.html).
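Bootstrapping over loci, as in item 4 above, can be sketched as resampling loci with replacement and recomputing a multilocus F-statistic as the ratio of summed variance components. In this sketch each locus is summarized by a hypothetical pair (among-population component, total variance), and a simple percentile interval is taken; this is an illustration of the general technique, not Arlequin's exact procedure.

```python
import random


def multilocus_f(components):
    """Ratio of summed among-population components to summed total variances."""
    among = sum(a for a, _ in components)
    total = sum(t for _, t in components)
    return among / total


def bootstrap_ci(components, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval obtained by resampling loci with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        multilocus_f([rng.choice(components) for _ in components])
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-locus (among-population component, total variance) pairs
components = [(0.05, 1.0), (0.12, 0.9), (0.08, 1.1), (0.02, 0.8), (0.10, 1.2),
              (0.07, 1.0), (0.09, 0.95), (0.04, 1.05), (0.11, 1.0), (0.06, 0.9)]
ci = bootstrap_ci(components)
print(round(multilocus_f(components), 4), ci)
```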

Version 3.01: Compared to version 3.0, Arlequin 3.01 includes some bug corrections and some additional features:

Bug corrections:

  1. Minimum Spanning Tree Checkbox was not available for Genotypic data with known Gametic Phase.

  2. Choice of how FST is computed was not available when computing pairwise distances. Now, it is synchronized with the choice of distance in the AMOVA panel.

  3. "Search for shared haplotypes" did not work for Genotypic Data with known Gametic Phase. This has been corrected, and Arlequin now outputs a list of haplotypes before the table of frequencies.

  4. [[Mantel]] section was not recognized if located after a [[Structure]] section.

  5. Improved conversion between GenePop and Arlequin formats.

  6. "Diploid Data" option is now present when converting from Genepop to Arlequin.

  7. The s.d. of the number of alleles (k) was sometimes reported as zero in the output of Fu's FS test. This is now corrected, and annoying warning messages about "No molecular diversity within a sample while performing Fu's test" have been suppressed in the output file.

Additions:

Version 3.1: Compared to version 3.01, Arlequin 3.1 includes cosmetic and speed improvements, several bug corrections and additional features:

Bug corrections:

  1. Locus-by-locus AMOVA failed for DNA sequences when corrections for multiple hits were selected.
  2. File conversion towards the Phylip format could not be done.
  3. It was impossible to change the default significance level of 0.05 for highlighting significant genetic distances in the output file.
  4. Missing data identifier other than "?" was not accepted.
  5. If a project file (or the path to it) contained the letters "arb", it was erroneously considered a batch file.
  6. Confidence intervals around FST were incorrectly reported by the bootstrap procedure in the case of a single group with no individual level taken into account.
  7. Estimation of haplotype frequencies from distance matrix was not performed when "Conventional FST" option was selected.
  8. Locus-by-locus AMOVA reported incorrect results for genotypic data when individuals had missing data for only one of their gene copies at a given locus.
  9. Reported number of indels differed according to the weight given to indels in the Option panel. This bug did not affect AMOVA computations.
  10. For sequence data, a mixture of N's and missing data led to problems in identifying distinct DNA sequences from distance matrix, leading to slightly incorrect FST computations.
  11. The exact test of population differentiation could not be performed when gametic phase was unknown. This option has now been restored, as in ver. 2.
  12. Arlequin hung when a given population was entered several times in the definition of groups for the computation of genetic structure. Now the error is simply flagged, and the program does not hang.
  13. For frequency data, it was impossible to use a predefined distance matrix.
  14. The beta approximation of the significance of Tajima's D gave wrong results. This approximation has been removed, and we now only report the significance level obtained from coalescent simulations.
  15. Inbreeding coefficients were computed incorrectly under the locus-by-locus AMOVA approach for genotypic data when phenotype frequencies were larger than one. The bug caused an overestimation of the local (FIS) and total (FIT) inbreeding levels. For samples where all phenotype frequencies were set to 1, the inbreeding coefficients were correctly estimated.
  16. Expected heterozygosity reported in the HWE exact test section was inaccurately computed. This inaccuracy did not, however, affect the results of the HWE exact test, which does not use information on observed and expected heterozygosity.

Improvements

Additions:

Version 3.11: Compared to version 3.1, Arlequin 3.11 is mainly an update of ver 3.1, and there is no new manual.

Bug corrections:

  1. Significance level of FSC and Var(b). The p-value associated with the variance component due to differences between populations within groups was erroneously computed when the number of samples in the genetic structure to test was identical to the total number of samples defined in the Samples section but the order of the samples in the Genetic Structure section differed from that in the Samples section. This bug has been around since the first release of Arlequin 2.0... Thanks to Romina Piccinali for finding it.
  2. The expected homozygosity reported in the Ewens-Watterson test in the samples summary section was that of the last simulated sample. The correct value was reported only if no permutations were done.
  3. Total number of alleles reported in the statistics summary section also included the missing data allele.
  4. The population labels were incorrectly reported when computing population-specific FIS statistics. The reported order corresponded to that of the last permutation. The population labels were only correct when the significance of the global FIS statistic was not tested. Thanks to Jeff Lozier for finding this serious bug.

Modifications

Additions:


How to cite Arlequin

Excoffier, L., G. Laval, and S. Schneider (2005) Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.


Arlequin discussion forum and FAQ

Problems can be reported on the Arlequin Discussion Forum, located on the Genetic Software Forum (GSF: http://www.rannala.org/gsf) and hosted by Bruce Rannala.

This Arlequin forum will also be used as a Frequently Asked Questions (FAQ) page.


Downloads

Download Arlequin ver 3.11 for Windows (posted on 19.02.2007)

Download Arlequin 3.1 User Manual


Screenshots

Configuration

 

Project wizard

 

Import-Export

 

Genetic structure Editor

Arlequin settings

 

Arlequin configuration

 

ELB settings

 

AMOVA settings

 

LD settings

 

Neutrality tests settings

Tabbed Output File in Firefox

 

 

Links

Arlequin 2.0 web site

Arlequin 3.1 user manual

Arlequin Discussion Forum

Arlequin FAQ topic

File conversion programs

    Name                  Input format          Output format
    Convert               Excel                 WhichRun, GeneClass, GDA, Microsat, Arlequin, Cervus, Fstat, Structure, Phylip
    Formatomatic          Raw (csv), GenePop    Raw (csv), GenePop, Arlequin, Immanc
    GenePop on the web    GenePop               Arlequin, Biosys, Fstat, Structure, Phylip

Other free population genetics software programs

    Program                        Short description
    Batwing                        Bayesian Analysis of Trees With Internal Node Generation
    BayesAss                       Bayesian estimation of recent migration rates using multilocus genotypes
    DnaSP                          A software package for the analysis of nucleotide polymorphism from aligned DNA sequence data
    Fstat                          Computer package which estimates and tests gene diversities and differentiation statistics from codominant genetic markers
    GDA                            Computes linkage and Hardy-Weinberg disequilibrium and some genetic distances, and provides method-of-moments estimators for hierarchical F-statistics
    GeneClass                      A program for assignment and exclusion using molecular markers
    GenePop, GenePop on the web    Software package for the computation of various population genetics parameters
    Genetix                        Software package for the computation of various population genetics parameters
    Hickory                        Bayesian estimation of F-statistics from dominant marker data
    Immanc                         Detection of immigrants using multilocus genotypes
    Jody Hey's programs
    Lamarc                         Suite of programs which estimate population-genetic parameters, based on McMC likelihood methods
    Mark Beaumont's programs
    Mega                           Integrated tool for automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses
    PAML                           A package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood
    Phase                          Program for haplotype reconstruction and recombination rate estimation from population data
    PopGene                        Analysis of genetic variation among and within populations using co-dominant and dominant markers
    Populations                    Computation of various genetic distances
    Structure                      Software package for using multilocus genotype data to investigate population structure
    WhichRun                       A computer program for population assignment of individuals based on multilocus genotype data

A more extensive list of population genetics programs can be found here, here, here, and here.

We have also written a review on available programs in population genetics, which contains the description of about 25 different programs, with links to many more. It can be found here.


Laurent Excoffier, CMPG, Institute of Ecology and Evolution, University of Bern

Last edited on 17.02.07 (18:21)