Arlequin 3.5 - What's new

What's new

Version 3.5.2.2 (02.08.2015)

This version adds one new feature and corrects a few bugs as compared to ver 3.5.2.1:

Computation of the site frequency spectrum (SFS) directly from DNA sequence data, but only for the SFS based on minor allele frequencies (MAF)
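
As an illustration of what a minor-allele-frequency SFS is, here is a minimal Python sketch. It is not Arlequin's code: the function name folded_sfs is hypothetical, and skipping sites with missing data or more than two alleles is a simplifying assumption of this example, not a documented Arlequin behaviour.

```python
from collections import Counter

def folded_sfs(sequences):
    """Return the minor-allele (folded) SFS of aligned, equal-length sequences.

    Entry i counts the sites whose minor allele is carried by i sequences;
    entry 0 therefore counts monomorphic sites.
    Assumption of this sketch: sites with missing data ('?') or with more
    than two alleles are simply skipped.
    """
    n = len(sequences)
    sfs = [0] * (n // 2 + 1)
    for column in zip(*sequences):          # iterate over sites
        counts = Counter(column)
        if "?" in counts or len(counts) > 2:
            continue
        minor = min(counts.values()) if len(counts) == 2 else 0
        sfs[minor] += 1
    return sfs

# Four SNP-coded sequences of four sites each (hypothetical data)
example = ["0010", "0110", "0100", "1100"]
print(folded_sfs(example))
```

With n sequences the folded spectrum has n // 2 + 1 entries, because the minor allele can never be carried by more than half of the sample.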

Bug corrections :

Unable to open batch (.arb) files
Incorrect computation of the SFS when allele 0 (the assumed ancestral allele) was not present at a site
Different SFS produced by arlequin and fastsimcoal2 (mainly for MAF-based SFS)
Bug if a file name is longer than 255 characters
No mention of SFS computations in arlequin log file

Version 3.5.2 (12.04.2015)

This version introduces a few changes as compared to ver 3.5.1.3:

Computation of the site frequency spectrum from DNA sequence data, which can be used as input for demographic parameter inference with our software fastsimcoal2 (http://cmpg.unibe.ch/software/fastsimcoal2/)
Enables the analysis of DNA sequence data coded as SNP (i.e. 0,1,2,3 instead of C,A,T,G), as in the output of our coalescent simulation software fastsimcoal2
Possibility to use the 64 bit version of arlecore to do computations (faster and on potentially larger data sets)
More efficient (faster) reading of long DNA sequences
Arlequin can now read lines of up to 1 million characters, and thus very long DNA sequences.

Note that some programs can now be used to translate VCF files into Arlequin project (*.arp) files, such as:

PGDSpider java program by Heidi Lischer.
VCF2Arlequin python script by Nicolas Feau (UBC, Canada) also available from this web page

Bug corrections :

Detection of outlier loci hung when observed F-statistics were < 0
Removed illegitimate characters in xml files
In the detection of outlier loci:

Computation of negative FST p-values was not correct, and CI limits were also not computed correctly.
It was not possible to compute p-values of loci in case of STANDARD data if LocusSeparator=NONE.

Incorrect computation of average heterozygosity when there was a single polymorphic site.
Correction of a memory leak that sometimes led to error messages reporting that "phenotypes have different number of loci".

Version 3.5.1.3 (17.09.2011)

Version 3.5.1.3 is a maintenance version correcting a few bugs and introducing small modifications compared to version 3.5.1.2.

Detection of outlier loci:

cannot be done anymore on FREQUENCY data type (made no sense)
cannot be done when fewer than 10 loci are available
cannot be done if some F-statistics are negative

Ewens-Watterson neutrality test:

changed output labeling

Corrected bibliographic references including an "&" causing problems in XML outputs
Inclusion of new R functions for XML graphical output that are compatible with R 2.11, 2.12, and 2.13

WinArl on Mac OS X (11.10.2010)

Kent Holsinger has kindly developed a MacOSX binary version of Winarl35 under WineBottler, which needs to be installed on your Mac.
More information and a downloadable version of WinArl35.dmg can be found here.

Version 3.5.1.2 (22.04.2010)

Version 3.5.1.2 only includes a bug correction compared to version 3.5.
The bug occurred when trying to estimate allele or haplotype frequencies via the EM algorithm on recessive data. The program stalled because of an infinite loop. All versions on all platforms were affected. This bug has now been corrected in ver 3.5.1.2, which is available on the download page.

Version 3.5

Compared to version 3.11, Arlequin 3.5 includes several bug corrections, new computations, and several significant improvements. The main improvement is its interfacing with the R statistical package, allowing one to produce high-quality graphs of many results found in the result files. We also introduce new console versions of Arlequin for both Windows and Linux.

Additions:

New procedure to detect loci under selection from hierarchical F-statistics, as implemented in Excoffier et al. (2009)
Computation of allele frequencies at all loci for all populations, which are output in locus-specific files.
Computation of the genetic distance (δμ)² for microsatellite data.
Possibility to output results as an XML file with a dedicated style sheet.
R-lequin:

Developments of R functions to parse the XML output file and produce publication quality graphics
Graphics can be directly embedded into the XML result file below result tables.

R functions can be modified by the user to customize graphics.

Console version of Arlequin, called arlecore, for both Windows and Linux, allowing the analysis of a large number of files with bash scripts. See readme file.
Modified console version of Arlequin, called arlsumstat, for Windows and Linux, to compute specific summary statistics on data found in an arlequin project. This version is specifically intended to be used on a large number of simulated data files, for instance in an Approximate Bayesian Computation (ABC) framework. See readme file to see how to use it.
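
The (δμ)² distance listed among the additions above can be sketched as follows. This is a minimal Python illustration of the usual definition (Goldstein et al. 1995), the squared difference between the mean allele sizes of two populations. The function name and the choice to average over loci are assumptions of this sketch and may not match Arlequin's exact convention.

```python
def delta_mu_squared(pop_a, pop_b):
    """(delta mu)^2 between two populations for microsatellite data.

    pop_a and pop_b are lists with one entry per locus; each entry is the
    list of allele sizes (in repeat units) observed in that population.
    Assumption of this sketch: the per-locus values are averaged over loci.
    """
    if len(pop_a) != len(pop_b) or not pop_a:
        raise ValueError("both populations need the same, non-zero number of loci")
    total = 0.0
    for alleles_a, alleles_b in zip(pop_a, pop_b):
        mean_a = sum(alleles_a) / len(alleles_a)
        mean_b = sum(alleles_b) / len(alleles_b)
        total += (mean_a - mean_b) ** 2
    return total / len(pop_a)

# Two loci, four gene copies per population (hypothetical data)
pop_a = [[10, 12, 12, 14], [8, 8, 9, 9]]
pop_b = [[13, 13, 15, 15], [8, 9, 9, 10]]
print(delta_mu_squared(pop_a, pop_b))
```

Here the mean allele sizes differ by 2 repeats at the first locus and by 0.5 repeats at the second, so the per-locus values are 4 and 0.25.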

Modifications:

All computations can now be performed at the group level, by automatically pooling all population samples from a given group defined in the [STRUCTURE] section into a single artificial population.
Maximum number of characters in input line is now 250,000, which limits the maximum sizes of, say, DNA sequences that can be read.
Removed the computation of population specific FSTs
Changed the order of presentation of the results. The output now begins with the intra-population computations and then reports the inter-population computations
Individuals with partially missing data at a given locus are now excluded in the locus by locus amova analysis when taking individual level into account (i.e. when computing FIS)

Bug corrections:

In the summary statistics, the reported mean number of alleles was zero when there was a single monomorphic locus.
LocusSeparator = None was not recognized (NONE was needed)
Chakraborty's neutrality test: there was an overflow when the number of alleles was larger than 265.
In the molecular diversity summary table, the number of sites with transversion was incorrectly reported as the number of sites with transitions
The total number of polymorphic sites reported in summary stat table was not really the total number of polymorphic sites that would be computed on the pooled populations. It was rather the total number of sites that were found polymorphic within populations.
Errors when computing average summary statistics within-samples, if some loci were monomorphic in some populations.
Wrong computations of standard deviations of some summary statistics (Garza-Williamson, modified Garza-Williamson, total range) and Theta(H) for microsatellite data.
Option to use associated settings did not work anymore
Error when computing statistics within groups and within samples when DNA sequences contained white spaces and LocusSeparator was set to Whitespace.
No message was issued when a population contained only missing data at a given locus and one was attempting to perform a locus-by-locus analysis. The locus was just not listed in the locus-by-locus AMOVA. Now, a warning message is issued.
Bad handling of diploid individuals having partially missing data (on one chromosome only) when one attempts to compute locus by locus AMOVA with individual level (FIS and FIT).
Setting file (arl_run.ars) was always saved in the arlequin directory instead of the directory chosen in the dialog box.
It was impossible to compute the expected mismatch distribution under the demographic and the range expansion models at the same time
Mantel test was not performed when a custom Ymatrix was provided.
When there was a single polymorphic microsatellite locus, the reported average Garza-Williamson statistic was the number of loci...

Version 3.11

Arlequin 3.11 is mainly an update of version 3.1, and there is no new manual.

Bug corrections:

Significance level of FSC and Var(b). The p-value associated with the variance component due to differences between populations within groups was erroneously computed when the number of samples in the genetic structure to test was identical to the total number of samples defined in the Samples section but the order of the samples in the Genetic Structure section was different from that in the Samples section. This bug has been around since the first release of Arlequin 2.0... Thanks to Romina Piccinali for finding it.
The expected homozygosity reported in the Ewens-Watterson test in the samples summary section was that of the last simulated sample. Correct value was reported only if no permutations were done.
Total number of alleles reported in the statistics summary section also included the missing data allele.
The population labels were incorrectly reported when computing population-specific FIS statistics. The reported order corresponded to that of the last permutation. The population labels were only correct when the significance of the global FIS statistic was not tested. Thanks to Jeff Lozier for finding this bad bug.

Modifications

Mean expected heterozygosity and mean allele number are reported over polymorphic sites in the Sample section, while they are reported over all loci in the statistics summaries at the end of the result file.
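
The difference between the two averages can be seen in a small Python sketch. It assumes the usual unbiased gene diversity estimator, H = n/(n-1) * (1 - sum of squared allele frequencies). The function names are hypothetical and the code only illustrates the averaging, not Arlequin's implementation.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity (gene diversity) at one locus."""
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def mean_heterozygosity(loci, polymorphic_only):
    """Average over polymorphic loci only, or over all loci."""
    if polymorphic_only:
        loci = [alleles for alleles in loci if len(set(alleles)) > 1]
    if not loci:
        return 0.0
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)

# One polymorphic and one monomorphic locus (hypothetical data)
loci = [["A", "A", "B", "B"], ["1", "1", "1", "1"]]
print(mean_heterozygosity(loci, polymorphic_only=True))   # as in the Sample section
print(mean_heterozygosity(loci, polymorphic_only=False))  # as in the final summaries
```

With one monomorphic locus out of two, the average over all loci is half the average over polymorphic loci, which is why the two sections of the result file can report different values for the same sample.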

Additions:

Sample allele frequencies can now be output in locus-specific files, if this option is selected in the Molecular Diversity tab. Locus-specific files are output in the Arlequin project result directory.

Version 3.1

Compared to version 3.01, Arlequin 3.1 includes cosmetic and speed improvements, several bug corrections and additional features:

Bug corrections:

Locus-by-locus AMOVA failed on DNA sequences when corrections for multiple hits were selected.
File conversion towards the Phylip format could not be done.
It was impossible to change the default significance level of 0.05 for highlighting significant genetic distances in output file.
Missing data identifier other than "?" was not accepted.
If a project file (or the path to it) contained the letters "arb", then it was erroneously considered as a Batch file.
Confidence intervals around FST were incorrectly reported by the bootstrap procedure in case of a single group and no Individual Level taken into account.
Estimation of haplotype frequencies from distance matrix was not performed when "Conventional FST" option was selected.
Locus-by-locus AMOVA reported incorrect results for Genotypic data when individuals had missing data for only one of their gene copies at a given locus.
Reported number of indels differed according to the weight given to indels in the Option panel. This bug did not affect AMOVA computations.
For sequence data, a mixture of N's and missing data led to problems in identifying distinct DNA sequences from distance matrix, leading to slightly incorrect FST computations.
Exact test of population differentiation could not be performed when gametic phase was unknown. Now, this option has been restored, like in ver. 2.
Arlequin hung when a given population was entered several times in the definition of a group for the computation of genetic structure. Now, the error is simply flagged and the program does not hang.
For frequency data, it was impossible to use a predefined distance matrix.
Beta approximation of the significance of Tajima's D gave wrong results. This approximation has been suppressed and now we only report the significance level obtained from coalescent
Bad computation of inbreeding coefficients under the locus-by-locus AMOVA approach for genotypic data when phenotype frequencies were larger than one. The bug caused an overestimation of the local (FIS) and total (FIT) inbreeding level. For samples where phenotype frequencies were all set to 1, the inbreeding coefficients were correctly estimated.
Expected heterozygosity reported under HWE exact test section was inaccurately computed. This inaccuracy however did not affect the results of the HWE exact test, which does not use information on observed and expected heterozygosity.

Improvements

Locus-by-locus AMOVA can now be performed independently from conventional AMOVA. This can lead to faster computations for large sample sizes and large number of population samples.
Faster routines to handle long DNA sequences or large number of microsatellites.
Faster reading of input file
Faster computation of demographic parameters from mismatch distribution. Improved convergence of least-square fitting algorithm.

Additions:

Computations of population specific inbreeding coefficients and computations of their significance level.
Computation of the number of alleles as well as observed and expected heterozygosity per locus
Computation of the Garza-Williamson statistic for MICROSAT data.
In batch mode, the summary file (*.sum) now reports the name of the analyzed file as well as the name of the analyzed population sample.
When saving current settings, users are now asked to choose a file name. Default is "project file name".ars.
New sections are provided at the end of the result file, in order to report summary statistics computed over all populations:

Basic properties of the samples (size, no. of loci, etc...)
Heterozygosity per locus
Number of alleles + total no. of alleles over all pops
Allelic range + total allelic range over all pops (for microsatellite data)
Garza-Williamson index (for microsatellite data)
Number of segregating sites, + total over all pops
Molecular diversity indices (theta values)
Neutrality tests summary statistics and p-values
Demographic parameters estimated from the mismatch distribution and p-values.

New shortcuts are provided in the left pane of the html result file for F-statistics bootstrap confidence intervals, population specific FIS, and summary of intra-population statistics.
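
For readers unfamiliar with the Garza-Williamson statistic mentioned in the additions above, here is a minimal Python sketch. It assumes the form G-W = k / (R + 1), where k is the number of distinct alleles at a locus and R is the allelic range in repeat units. The function name is hypothetical, and the code is an illustration, not Arlequin's implementation.

```python
def garza_williamson(allele_sizes):
    """Garza-Williamson index for one microsatellite locus.

    allele_sizes: allele sizes (in repeat units) of all gene copies sampled.
    Assumed form: k / (R + 1), with k the number of distinct alleles and
    R = max - min the allelic range; the +1 avoids a division by zero
    for a monomorphic sample.
    """
    k = len(set(allele_sizes))
    allelic_range = max(allele_sizes) - min(allele_sizes)
    return k / (allelic_range + 1)

# Four distinct alleles spread over a range of six repeats (hypothetical data)
print(garza_williamson([10, 11, 13, 13, 16]))
```

A value close to 1 means that almost every repeat size within the range is observed in the sample, while low values indicate gaps in the allele size distribution.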

Version 3.01

Compared to version 3.0, Arlequin 3.01 includes some bug corrections and some additional features:

Bug corrections:

Minimum Spanning Tree Checkbox was not available for Genotypic data with known Gametic Phase.
Choice of how FST is computed was not available when computing pairwise distances. Now, it is synchronized with the choice of distance in the AMOVA panel.
"Search for shared haplotypes" did not work for Genotypic Data with known Gametic Phase. This has been corrected and Arlequin now outputs a list of haplotypes before the table of frequencies.
[[Mantel]] section was not recognized if located after a [[Structure]] section.
Improved conversion between GenePop and Arlequin formats.
"Diploid Data" option is now present when converting from Genepop to Arlequin.
Output of s.d. of the number of alleles (k) was sometimes zero in output of Fu's FS test. This is now corrected and annoying warning messages about "No molecular diversity within a sample while performing Fu's test" have been suppressed in the output file.

Additions:

New editor of genetic structure allowing one to modify the current Genetic Structure directly in the graphical interface.
Computation of population-specific FST indices, when a single group is defined in the Genetic Structure. This may be useful to recognize populations contributing particularly to the global FST measure. This is also available in the locus-by-locus AMOVA section.

Version 3.0

Compared to version 2, Arlequin version 3 now integrates the core computational routines and the interface in a single program written in C++. Therefore Arlequin does not rely on Java anymore. This has two consequences: the new graphical interface is nicer and faster, but it is less portable than before. At the moment we release a Windows version (2000, XP, and above), and we shall probably release a Linux version later. Support for the Mac has been discontinued.

Other main changes include:

Correction of many small bugs
Incorporation of two new methods to estimate gametic phase and haplotype frequencies

EM zipper algorithm: An extension of the EM algorithm allowing one to handle a larger number of polymorphic sites than the plain EM algorithm.
ELB algorithm: a pseudo-Bayesian approach to specifically estimate gametic phase in recombining sequences.

Incorporation of a least-square approach to estimate the parameters of an instantaneous spatial expansion from DNA sequence diversity within samples, and computations of bootstrap confidence intervals using coalescent simulations.
Estimation of confidence intervals for F-statistics, using a bootstrap approach when genetic data on more than 8 loci are available.
Update of the java-script routines in the output html files, making them fully compatible with Firefox 1.X.
A completely rewritten and more robust input file parsing procedure, giving more precise information on the location of potential syntax and format mistakes.
Use of the ELB algorithm described above to generate samples of phased multi-locus genotypes, which allows one to analyse unphased multi-locus genotype data as if the phase was known. The phased data sets are output in Arlequin projects that can be analysed in a batch mode to obtain the distribution of statistics taking phase uncertainty into account.
No need to define a web browser for consulting the results. Arlequin will automatically present the results in your default web browser (we recommend the use of Firefox, freely available at http://www.mozilla.org/products/firefox/central.html).

Last edit: 02.08.2015 by Laurent Excoffier
