What's new
Version 3.5.2.2 (02.08.2015)
This version adds one new feature and corrects a few bugs as compared to ver 3.5.2.1:
- Computation of the site frequency spectrum (SFS) directly from DNA
sequence data, but only for the SFS based on minor allele frequencies (MAF)
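To illustrate what a MAF-based SFS is, the sketch below finds the minor allele at each site of a set of aligned sequences and tallies how many sites show each minor-allele count. It is not Arlequin's implementation: the function name `folded_sfs`, the 0/1 allele coding of the example data, and the choice to skip sites with more than two alleles are assumptions made for the example.

```python
from collections import Counter

def folded_sfs(sequences):
    """Return the minor-allele-frequency (folded) SFS of aligned sequences.

    sequences: list of equal-length strings, one per sampled gene copy.
    Entry i of the result is the number of sites whose minor allele
    is carried by exactly i sequences (i = 0 means monomorphic).
    """
    n = len(sequences)
    sfs = [0] * (n // 2 + 1)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 2:
            continue  # assumption: skip sites with more than two alleles
        minor = min(counts.values()) if len(counts) == 2 else 0
        sfs[minor] += 1
    return sfs

# Hypothetical sample of four gene copies typed at four sites.
sample = ["0010", "0110", "0100", "0101"]
print(folded_sfs(sample))  # prints [1, 2, 1]
```

Because the minor allele is identified from the sample itself, no assumption about which allele is ancestral is needed for this kind of spectrum.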
Bug corrections:
- Unable to open batch (.arb) files
- Bad computation of the SFS when allele 0 (assumed to be ancestral) was not present at a site
- Different SFS produced by arlequin and fastsimcoal2 (mainly for MAF-based SFS)
- Bug if a file name is longer than 255 characters
- No mention of SFS computations in arlequin log file
Version 3.5.2 (12.04.2015)
This version introduces a few changes as compared to ver 3.5.1.3:
- Computation of the site frequency spectrum from DNA sequence data, which
can be used as input for demographic parameter inference with our software
fastsimcoal2 (http://cmpg.unibe.ch/software/fastsimcoal2/)
- Enables the analysis of DNA sequence data coded as SNP (i.e. 0,1,2,3
instead of C,A,T,G), as in the output of our coalescent simulation
software fastsimcoal2
- Possibility to use the 64 bit version of arlecore to do computations
(faster and on potentially larger data sets)
- More efficient (faster) reading of long DNA sequences
- Arlequin can now read lines of up to 1 million characters, and thus very long DNA sequences.
- Note that some programs can now be used to translate VCF files into arlequin project (*.arp) files like:
Bug corrections:
- Detection of outlier zero hung when observed F-statistics were < 0
- Removed illegal characters in XML files
- In the detection of outlier loci:
  - p-values of negative FST values were not computed correctly, and
  neither were the CI limits.
  - It was not possible to compute p-values of loci in case of STANDARD
  data if LocusSeparator=NONE.
- Incorrect computation of average heterozygosity when there was a single
polymorphic site.
- Correction of a memory leak that sometimes led to error messages
reporting that "phenotypes have different number of loci".
Version 3.5.1.3 (17.09.2011)
Version 3.5.1.3 is a maintenance version correcting a few bugs and
introducing small modifications compared to version 3.5.1.2.
- Detection of outlier loci:
  - cannot be done anymore on FREQUENCY data type (made no sense)
  - cannot be done when fewer than 10 loci are available
  - cannot be done if some F-statistics are negative
- Ewens-Watterson neutrality test:
  - Corrected bibliographic references including an "&" causing
  problems in XML outputs
- Inclusion of new R functions for XML graphical output that are
compatible with R 2.11, 2.12, and 2.13
WinArl on Mac OS X (11.10.2010)
Kent Holsinger has kindly developed a Mac OS X binary version of WinArl35
under WineBottler, which needs to be installed on your Mac.
More information and a downloadable version of WinArl35.dmg can be found
here.
Version 3.5.1.2 (22.04.2010)
Version 3.5.1.2 just includes a bug correction compared to version 3.5.
The bug occurred when trying to estimate allele or haplotype frequencies
via the EM algorithm on recessive data. The program was stalling due to
the presence of an infinite loop. All versions on all platforms were
affected. This bug has now been corrected in ver 3.5.1.2, which is
available on the download page.
Version 3.5
Compared to version 3.11, Arlequin 3.5 includes several bug corrections,
addition of new computations, and several significant improvements. The
main improvement is its interfacing with the R statistical package,
allowing one to produce high quality graphs of many results found in the
result files. We also introduce new console versions of Arlequin for both
Windows and Linux.
Additions:
- New procedure to detect loci under selection from hierarchical
F-statistics, as implemented in Excoffier et al. (2009)
- Computation of allele frequencies at all loci for all
populations, which are output in locus-specific files.
- Computation of the genetic distance (δμ)² for microsatellite data.
- Possibility to output results as an XML file with a dedicated
style sheet.
- R-lequin:
  - Development of R functions to parse the XML output file and
  produce publication quality graphics
  - Graphics can be directly embedded into the XML result file
  below result tables.
  - R functions can be modified by the user to customize graphics.
- Console version of Arlequin, called arlecore, for both Windows
and Linux, allowing the analysis of a large number of files with bash
scripts. See readme file.
- Modified console version of Arlequin, called arlsumstat, for
Windows and Linux, to compute specific summary statistics on data found
in an Arlequin project. This version is specifically intended to be used
on a large number of simulated data files, for instance in an Approximate
Bayesian Computation (ABC) framework. See the readme file for how to use
it.
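For readers unfamiliar with the (δμ)² distance listed among the additions above, the following sketch computes it as it is usually defined for microsatellites: the squared difference between the mean allele sizes of two populations, averaged over loci. This is a minimal illustration, not Arlequin's code; the function name `delta_mu_squared` and the input layout (one list of allele sizes, in repeat units, per locus) are assumptions.

```python
def delta_mu_squared(pop_a, pop_b):
    """Average over loci of the squared difference in mean allele size.

    pop_a, pop_b: one list per locus, each holding the allele sizes
    (in number of repeats) observed in that population.
    """
    if len(pop_a) != len(pop_b) or not pop_a:
        raise ValueError("both populations need the same, non-zero number of loci")
    total = 0.0
    for alleles_a, alleles_b in zip(pop_a, pop_b):
        mean_a = sum(alleles_a) / len(alleles_a)
        mean_b = sum(alleles_b) / len(alleles_b)
        total += (mean_a - mean_b) ** 2
    return total / len(pop_a)

# Hypothetical data: two loci, four gene copies per population.
pop_a = [[10, 12, 12, 14], [20, 20, 22, 22]]
pop_b = [[14, 14, 16, 16], [21, 21, 21, 21]]
print(delta_mu_squared(pop_a, pop_b))  # prints 4.5
```

In the example the first locus contributes (12 - 15)² = 9 and the second contributes 0, giving an average of 4.5.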
Modifications:
- All computations can now be performed at the group level, by
automatically pooling all population samples from a given group defined
in the [STRUCTURE] section into a single artificial population.
- Maximum number of characters in input line is now 250,000, which
limits the maximum sizes of, say, DNA sequences that can be read.
- Removed the computation of population specific FSTs
- Changed the order of the presentation of the results. The output now
begins with the intra-population computations, followed by the
inter-population computations
- Individuals with partially missing data at a given locus are now
excluded in the locus-by-locus AMOVA analysis when taking the individual
level into account (i.e. when computing FIS)
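The group-level pooling described in the first item of the list above can be pictured with a small sketch: every population sample assigned to a group is merged into one artificial population before any statistic is computed. The data structures below (a dict of group names to sample names, and haplotype counts stored in `Counter` objects) are hypothetical and only stand in for what Arlequin reads from the [STRUCTURE] section.

```python
from collections import Counter

def pool_by_group(structure, samples):
    """Merge the haplotype counts of all samples belonging to each group.

    structure: dict mapping a group name to a list of sample names.
    samples: dict mapping a sample name to a Counter of haplotype counts.
    """
    pooled = {}
    for group, sample_names in structure.items():
        merged = Counter()
        for name in sample_names:
            merged.update(samples[name])  # adds counts haplotype by haplotype
        pooled[group] = merged
    return pooled

# Hypothetical structure with two groups.
structure = {"Group1": ["PopA", "PopB"], "Group2": ["PopC"]}
samples = {
    "PopA": Counter({"h1": 3, "h2": 1}),
    "PopB": Counter({"h1": 2, "h3": 4}),
    "PopC": Counter({"h2": 5}),
}
pooled = pool_by_group(structure, samples)
print(pooled["Group1"]["h1"])  # prints 5
```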
Bug corrections:
- In the summary statistics, the reported mean number of alleles
was zero when there was a single monomorphic locus.
- LocusSeparator = None was not recognized (NONE was needed)
- Chakraborty's neutrality test: there was an overflow when the
number of alleles was larger than 265.
- In the molecular diversity summary table, the number of sites
with transversion was incorrectly reported as the number of sites with
transitions
- The total number of polymorphic sites reported in summary stat
table was not really the total number of polymorphic sites that would
be computed on the pooled populations. It was rather the total number
of sites that were found polymorphic within populations.
- Errors when computing average summary statistics within-samples,
if some loci were monomorphic in some populations.
- Wrong computations of standard deviations of some summary
statistics (Garza-Williamson, modified Garza-Williamson, total range)
and Theta(H) for microsatellite data.
- Option to use associated settings did not work anymore
- Error when computing statistics within groups and within samples
when DNA sequences contained white spaces and LocusSeparator was set to
Whitespace.
- No message was issued when a population contained only missing
data at a given locus and one was attempting to perform a
locus-by-locus analysis. The locus was just not listed in the
locus-by-locus AMOVA. Now, a warning message is issued.
- Bad handling of diploid individuals having partially missing data
(on one chromosome only) when one attempts to compute locus by locus
AMOVA with individual level (FIS and FIT).
- Setting file (arl_run.ars) was always saved in the arlequin
directory instead of the directory chosen in the dialog box.
- It was impossible to compute the expected mismatch distribution
under the demographic and the range expansion models at the same time
- Mantel test was not performed when a custom Ymatrix was provided.
- When there was a single polymorphic microsat locus, the reported
average Garza-Williamson statistic was the number of loci...
Version 3.11
Arlequin 3.11 is mainly an update of version 3.1,
and there is no new manual.
Bug corrections:
- Significance level of FSC and Var(b). The
p-value associated to the variance component due to differences between
populations within groups was erroneously computed when the number of
samples in the genetic structure to test was identical to the total
number of samples defined in the Samples section but the order of the
samples in the Genetic Structure section was different from that in the
Samples section. This bug has been around since the first release of
Arlequin 2.0... Thanks to Romina Piccinali for finding it.
- The expected homozygosity reported in the Ewens-Watterson test in
the samples summary section was that of the last simulated
sample. The correct value was reported only if no permutations were done.
- Total number of alleles reported in the statistics summary
section also included the missing data allele.
- The population labels were incorrectly reported when computing
population-specific FIS statistics. The reported order corresponded to
that of the last permutation. The population labels were only correct
when the significance of the global FIS statistic
was not tested. Thanks to Jeff Lozier for finding this bad bug.
Modifications:
- Mean expected heterozygosity
and mean allele number are reported over polymorphic sites in
the Sample section, while they are reported over all loci in the
statistics summaries at the end of the result file.
Additions:
- Sample allele frequencies can now be
output in locus-specific files, if this option is selected in the
Molecular Diversity tab. Locus-specific files are output in the
Arlequin project result directory.
Version 3.1
Compared to version 3.01, Arlequin 3.1 includes cosmetic and speed
improvements, several bug corrections and additional features:
Bug corrections:
- Locus-by-locus AMOVA failed on DNA
sequences when corrections for multiple hits were selected.
- File conversion towards the Phylip format could
not be done.
- It was impossible to change the default
significance level of 0.05 for highlighting significant
genetic distances in output file.
- Missing data identifier other than "?" was not
accepted.
- If the name of a project file (or the path to it)
contained the letters "arb", then it was erroneously considered as a
Batch file.
- Confidence intervals around FST were incorrectly reported by the
bootstrap procedure in the case of a single group with no Individual
Level taken into account.
- Estimation of haplotype frequencies from
distance matrix was not performed when "Conventional FST" option was
selected.
- Locus-by-locus AMOVA reported incorrect results
for Genotypic data when individuals had missing data for only one of
their gene copies at a given locus.
- Reported number of indels differed according to
the weight given to indels in the Option panel. This bug did not affect
AMOVA computations.
- For sequence data, a mixture of N's and missing data led
to problems in identifying distinct DNA sequences from distance matrix,
leading to slightly incorrect FST computations.
- Exact test of population differentiation could not be
performed when gametic phase was unknown. Now, this option has been
restored, like in ver. 2.
- Arlequin hung when a given population was entered
several times in the definition of a group for the computation of genetic
structure. Now, the error is simply flagged but the program does not
hang.
- For frequency data, it was impossible to use a
predefined distance matrix.
- Beta approximation of the significance of Tajima's D
gave wrong results. This approximation has been suppressed and now we
only report the significance level obtained from coalescent simulations.
- Bad computation of inbreeding coefficients under the
locus-by-locus AMOVA approach for genotypic data when phenotype
frequencies were larger than one. The bug caused an overestimation of
the local (FIS) and total (FIT) inbreeding level. For samples where
phenotype frequencies were all set to 1, the inbreeding coefficients
were correctly estimated.
- Expected heterozygosity reported under HWE exact test
section was inaccurately computed. This inaccuracy however did not
affect the results of the HWE exact test, which does not use
information on observed and expected heterozygosity.
Improvements:
- Locus-by-locus AMOVA can now be
performed independently from conventional AMOVA. This can lead to
faster computations for large sample sizes and large number of
population samples.
- Faster routines to handle long DNA
sequences or large number of microsatellites.
- Faster reading of input file
- Faster computation of demographic
parameters from mismatch distribution. Improved convergence of
least-square fitting algorithm.
Additions:
- Computations of population specific
inbreeding coefficients and computations of their significance level.
- Computation of the number of alleles
as well as observed and expected heterozygosity per locus
- Computation of the Garza-Williamson
statistic for MICROSAT data.
- In batch mode, the summary file
(*.sum) now reports the name of the analyzed file as well as the name of
the analyzed population sample.
- When saving current settings, users are
now asked to choose a file name. Default is "project file name".ars.
- New sections are provided at the end
of the result file, in order to report summary statistics computed over
all populations:
  - Basic properties of the samples (size, no. of loci, etc.)
  - Heterozygosity per locus
  - Number of alleles + total no. of alleles over all pops
  - Allelic range + total allelic range over all pops (for
  microsatellite data)
  - Garza-Williamson index (for microsatellite data)
  - Number of segregating sites, + total over all pops
  - Molecular diversity indices (theta values)
  - Neutrality tests summary statistics and p-values
  - Demographic parameters estimated from the mismatch distribution
  and p-values.
- New shortcuts are provided in the left
pane of the html result file for F-statistics bootstrap confidence
intervals, population specific FIS, and summary of intra-population
statistics.
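As a rough illustration of the per-locus statistics added in this version, the sketch below computes the number of alleles, the allelic range, the expected heterozygosity, and a Garza-Williamson-type index for one microsatellite locus. It is not taken from Arlequin: the function name `locus_summary` is hypothetical, expected heterozygosity is computed as the usual unbiased gene diversity n/(n-1) (1 - Σp²), and the index is assumed to take the form k/(R+1), with k the number of alleles and R the allelic range in repeat units.

```python
from collections import Counter

def locus_summary(alleles):
    """Summarize one microsatellite locus in one sample.

    alleles: allele sizes (in number of repeats), one per gene copy.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    k = len(counts)                          # number of distinct alleles
    allelic_range = max(alleles) - min(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    exp_het = n / (n - 1) * (1.0 - sum_p2)   # unbiased gene diversity
    gw = k / (allelic_range + 1)             # assumed form: k / (R + 1)
    return {"k": k, "range": allelic_range, "exp_het": exp_het, "gw": gw}

# Hypothetical sample of four gene copies.
print(locus_summary([10, 10, 12, 15]))
```

With the example data there are k = 3 alleles spanning a range of 5 repeats, so the index is 3/6 = 0.5; a value well below 1 indicates gaps in the allele size distribution.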
Version 3.01
Compared to version 3.0, Arlequin 3.01 includes some bug
corrections and some additional features:
Bug corrections:
- Minimum Spanning
Tree Checkbox was not available for Genotypic data with known Gametic
Phase.
- Choice of how FST is
computed was not available when computing pairwise distances. Now, it
is synchronized with the choice of distance in the AMOVA panel.
- "Search for shared
haplotypes" did not work for Genotypic Data with known Gametic Phase.
This has been corrected and Arlequin now outputs a list of haplotypes
before the table of frequencies.
- [[Mantel]] section
was not recognized if located after a [[Structure]] section.
- Improved conversion
between GenePop and Arlequin formats.
- "Diploid Data"
option is now present when converting from Genepop to Arlequin.
- Output of s.d. of
the number of alleles (k) was sometimes zero in output of Fu's FS test.
This is now corrected and annoying warning messages about "No
molecular diversity within a sample while performing Fu's test" have
been suppressed in output file.
Additions:
- New editor of genetic structure
allowing one to modify the current Genetic Structure directly in the
graphical interface.
- Computation of population-specific FST
indices, when a single group is defined in the Genetic Structure. This
may be useful to recognize populations contributing particularly to the
global FST measure. This is also available in the locus-by-locus AMOVA
section.
Version 3.0
Compared to version 2, Arlequin version 3 now integrates the core
computational routines and the interface in a single program written in
C++. Therefore Arlequin does not rely on Java anymore. This has two
consequences: the new graphical interface is nicer and faster, but it
is less portable than before. At the moment we release a Windows
version (2000, XP, and above) and we shall probably release a Linux
version later. Support for the Mac has been discontinued.
Other main changes include:
- Correction of many
small bugs
- Incorporation of two
new methods to estimate gametic phase and haplotype frequencies
  - EM zipper algorithm: An extension of the EM algorithm allowing
  one to handle a larger number of polymorphic sites than the plain EM
  algorithm.
  - ELB algorithm: a pseudo-Bayesian approach to specifically
  estimate gametic phase in recombining sequences.
- Incorporation of a
least-square approach to estimate the parameters of an instantaneous
spatial expansion from DNA sequence diversity within samples, and
computations of bootstrap confidence intervals using coalescent
simulations.
- Estimation of
confidence intervals for F-statistics, using a bootstrap approach when
genetic data on more than 8 loci are available.
- Update of the
JavaScript routines in the output html files, making them fully
compatible with Firefox 1.X.
- A completely
rewritten and more robust input file parsing procedure, giving more
precise information on the location of potential syntax and format
mistakes.
- Use of the ELB algorithm described above to
generate samples of phased multi-locus genotypes, which allows one to
analyse unphased multi-locus genotype data as if the phase was known.
The phased data sets are output in Arlequin projects that can be
analysed in a batch mode to obtain the distribution of statistics
taking phase uncertainty into account.
- No need to define a
web browser for consulting the results. Arlequin will automatically
present the results in your default web browser (we recommend the use
of Firefox, freely available on
http://www.mozilla.org/products/firefox/central.html).
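The bootstrap confidence intervals for F-statistics mentioned in the list above require data on several loci, presumably because loci are the units being resampled. The sketch below shows a generic percentile bootstrap over loci under that assumption. It is not Arlequin's procedure: the function name `bootstrap_fst_ci`, the per-locus variance components used as input, and the percentile method are all assumptions made for illustration.

```python
import random

def bootstrap_fst_ci(components, n_boot=2000, alpha=0.05, seed=None):
    """Percentile bootstrap CI for FST obtained by resampling loci.

    components: list of (sigma_a, sigma_total) pairs, one per locus
    (hypothetical per-locus variance components, sigma_total > 0).
    """
    rng = random.Random(seed)
    n_loci = len(components)
    estimates = []
    for _ in range(n_boot):
        # Draw as many loci as observed, with replacement.
        resampled = [components[rng.randrange(n_loci)] for _ in range(n_loci)]
        num = sum(a for a, _ in resampled)
        den = sum(t for _, t in resampled)
        estimates.append(num / den)  # ratio of summed components
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical variance components for nine loci.
loci = [(0.10, 1.0), (0.05, 0.9), (0.20, 1.1), (0.08, 1.0), (0.12, 0.95),
        (0.15, 1.05), (0.03, 1.0), (0.09, 0.9), (0.11, 1.0)]
lower, upper = bootstrap_fst_ci(loci, n_boot=500, seed=1)
print(lower, upper)
```

Each replicate is a ratio of summed components, so it always lies between the smallest and largest per-locus ratio. With very few loci the resampled values cover only a narrow set of combinations, which is one reason a minimum number of loci is a sensible requirement.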
Last edit: 02.08.2015 by Laurent Excoffier