Arlequin: About Arlequin

Intra-population level

Computation of different diversity indices
- Number of polymorphic loci or number of segregating sites
- Number of different haplotypes
- Number of alleles per locus
- Number of nucleotide sites with:
  - substitutions
  - transitions
  - transversions
  - insertions-deletions (indels)
Nucleotide frequencies in a sample of DNA sequences
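The site categories and nucleotide frequencies listed above can be illustrated with a short Python sketch. It assumes aligned, equal-length sequences written in A/C/G/T with '-' for gaps, and it ignores any other symbol. A site showing both a transition and a transversion is counted in both categories, and a site with a gap is counted only as an indel. The function names are hypothetical, and Arlequin's own handling of missing data may differ.

```python
from collections import Counter
from itertools import combinations

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
BASES = set("ACGT")

def site_summary(seqs):
    """Classify the columns of an alignment (hypothetical helper)."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    summary = Counter()
    for column in zip(*seqs):
        # Keep nucleotides and gaps; any other symbol is treated as missing.
        states = {c for c in column if c in BASES or c == "-"}
        if len(states) < 2:
            continue  # monomorphic site
        summary["polymorphic"] += 1
        if "-" in states:
            summary["indel"] += 1  # gap present: counted as an indel site only
            continue
        summary["substitution"] += 1
        pairs = {frozenset(p) for p in combinations(states, 2)}
        if pairs & TRANSITIONS:
            summary["transition"] += 1
        if pairs - TRANSITIONS:
            summary["transversion"] += 1
    summary["haplotypes"] = len(set(seqs))  # number of distinct sequences
    return summary

def nucleotide_frequencies(seqs):
    """Relative frequency of A, C, G and T over all sequences, ignoring gaps."""
    counts = Counter(c for s in seqs for c in s if c in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}
```

For example, the four sequences `ACGTA`, `ACGTT`, `GCGTA` and `AC-TA` give three polymorphic sites: one transition, one transversion and one indel.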
Estimation of distances between molecular haplotypes
- Number of pairwise differences
- Proportion of pairwise differences
- Jukes and Cantor correction for multiple hits per site, with or without gamma-correction for heterogeneous mutation rates (Jukes and Cantor, 1969)
- Kimura 2-parameter correction for transition-transversion bias, with or without gamma-correction for heterogeneous mutation rates (Kimura, 1980)
- Tajima-Nei correction for unequal substitution rates among nucleotides, with or without gamma-correction for heterogeneous mutation rates (Tajima and Nei, 1984)
- Tamura's correction for transition-transversion bias and unequal substitution rates among nucleotides, with or without gamma-correction for heterogeneous mutation rates (Tamura, 1992)
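- Tamura-Nei correction for unequal nucleotide frequencies and for different rates of purine transitions, pyrimidine transitions and transversions, with or without gamma-correction for heterogeneous mutation rates (Tamura and Nei, 1993)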
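Three of the distances listed above (proportion of differences, Jukes-Cantor and Kimura 2-parameter, each with an optional gamma shape parameter `alpha`) can be sketched with their standard closed-form formulas. This is a minimal illustration assuming aligned sequences in which gaps and other symbols are skipped pairwise. The function names are hypothetical, and the Tajima-Nei and Tamura corrections are left out for brevity. The logarithms fail when the observed divergence is saturated (for example, p ≥ 3/4 for Jukes-Cantor).

```python
import math

def pairwise_counts(x, y):
    """Return (compared sites, transitions, transversions), skipping gaps/missing data."""
    purines = set("AG")
    n = ts = tv = 0
    for a, b in zip(x, y):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a != b:
            if (a in purines) == (b in purines):
                ts += 1  # A<->G or C<->T
            else:
                tv += 1
    return n, ts, tv

def p_distance(x, y):
    """Proportion of differing sites."""
    n, ts, tv = pairwise_counts(x, y)
    return (ts + tv) / n

def jukes_cantor(x, y, alpha=None):
    """Jukes-Cantor distance; alpha is the optional gamma shape parameter."""
    p = p_distance(x, y)
    if alpha is None:
        return -0.75 * math.log(1 - 4 * p / 3)
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)

def kimura_2p(x, y, alpha=None):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    n, ts, tv = pairwise_counts(x, y)
    P, Q = ts / n, tv / n
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    if alpha is None:
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    return 0.5 * alpha * (w1 ** (-1 / alpha) + 0.5 * w2 ** (-1 / alpha) - 1.5)
```

As `alpha` grows, the gamma versions converge to the uncorrected formulas, which gives a quick way to check the implementation.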
Estimation of maximum-likelihood allele frequencies, with or without a recessive allele
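Maximum-likelihood allele frequencies in the presence of a recessive allele can be illustrated with a small EM sketch. It assumes one locus with codominant visible alleles plus a single recessive (null) allele, and it assumes Hardy-Weinberg proportions. Phenotypes are hypothetical tuples of visible alleles: `('A', 'B')` for a heterozygote, `('A',)` for AA or A/null, and `()` for null/null. This is not Arlequin's code. With only one visible allele, the iteration converges to the textbook estimate q = sqrt(R/N), where R is the number of recessive phenotypes among N individuals.

```python
def em_recessive(phenotypes, null="0", tol=1e-12, max_iter=10_000):
    """EM estimate of allele frequencies with one recessive (null) allele (sketch)."""
    alleles = sorted({a for ph in phenotypes for a in ph}) + [null]
    n_genes = 2 * sum(phenotypes.values())
    freq = {a: 1 / len(alleles) for a in alleles}  # uniform starting values
    for _ in range(max_iter):
        counts = dict.fromkeys(alleles, 0.0)
        for ph, n in phenotypes.items():
            if len(ph) == 2:  # codominant heterozygote, e.g. ('A', 'B')
                counts[ph[0]] += n
                counts[ph[1]] += n
            elif len(ph) == 1:  # one visible allele: AA or A/null
                a = ph[0]
                # E-step: P(AA | phenotype A) = p_A^2 / (p_A^2 + 2 p_A p_null)
                f_hom = freq[a] / (freq[a] + 2 * freq[null])
                counts[a] += n * (1 + f_hom)
                counts[null] += n * (1 - f_hom)
            else:  # no visible allele: null/null homozygote
                counts[null] += 2 * n
        new = {a: c / n_genes for a, c in counts.items()}  # M-step
        if max(abs(new[a] - freq[a]) for a in alleles) < tol:
            return new
        freq = new
    return freq
```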
Estimation of maximum-likelihood multi-locus haplotype frequencies
- Using a gene counting method when the gametic phase is known
- Using an EM (expectation-maximization) algorithm when the gametic phase is unknown, or in the presence of recessive alleles.
  - Standard deviations are obtained by a bootstrap procedure (Efron 1982)
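The EM approach for unknown gametic phase, together with a bootstrap over individuals, can be sketched for the simplest case of two loci. Only individuals heterozygous at both loci have two possible phases. Each EM iteration splits those individuals between their phases in proportion to the current haplotype frequencies (E-step) and then recounts haplotypes (M-step). This is a minimal sketch assuming diploid genotypes written as `((a1, a2), (b1, b2))`. The function names, random starting values and convergence tolerance are illustrative assumptions, not Arlequin's settings.

```python
import random
import statistics
from collections import Counter

def phase_options(genotype):
    """Possible haplotype pairs for a two-locus genotype ((a1, a2), (b1, b2))."""
    (a1, a2), (b1, b2) = genotype
    options = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(options)  # a single option unless both loci are heterozygous

def pair_prob(pair, freq):
    """Hardy-Weinberg probability of an unordered haplotype pair."""
    h1, h2 = pair
    return freq[h1] ** 2 if h1 == h2 else 2 * freq[h1] * freq[h2]

def em_haplotypes(genotypes, seed=0, tol=1e-10, max_iter=1000):
    rng = random.Random(seed)
    options = [phase_options(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    # Random starting frequencies; a single EM run can depend on where it starts.
    start = [rng.random() + 0.1 for _ in haps]
    total_start = sum(start)
    freq = {h: s / total_start for h, s in zip(haps, start)}
    for _ in range(max_iter):
        counts = Counter()
        for opts in options:  # E-step: split each individual over its phases
            weights = [pair_prob(pair, freq) for pair in opts]
            total = sum(weights)
            for pair, w in zip(opts, weights):
                for h in pair:
                    counts[h] += w / total
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}  # M-step
        done = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if done:
            break
    return freq

def bootstrap_sd(genotypes, n_boot=100, seed=1):
    """Standard deviation of each haplotype frequency over bootstrap resamples of individuals."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        sample = [rng.choice(genotypes) for _ in genotypes]
        reps.append(em_haplotypes(sample, seed=rng.randrange(2**32)))
    haps = sorted({h for r in reps for h in r})
    return {h: statistics.stdev([r.get(h, 0.0) for r in reps]) for h in haps}
```

With three AB/AB individuals, three ab/ab individuals and one double heterozygote, the EM assigns the ambiguous individual to the AB/ab phase and estimates both AB and ab at 0.5.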
Estimation of the mutation parameter θ = 4Nu, and its confidence interval, from:
- the observed number of alleles (haplotypes) k and the sample size n.
- the observed number of segregating sites S and the sample size n.
- the sample homozygosity (H)
- the mean number of pairwise differences (π) between all pairs of haplotypes in the sample
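Point estimates from the four statistics above can be sketched as follows. θ_k solves E[k] = Σ_{i=0}^{n-1} θ/(θ+i) = k, using the expected number of alleles under the infinite-alleles model. θ_S is Watterson's S/a1 with a1 = Σ_{i=1}^{n-1} 1/i. θ_H inverts E[H] = 1/(1+θ). θ_π is the mean number of pairwise differences. This is a minimal sketch with hypothetical function names. It computes no confidence intervals and is not claimed to reproduce Arlequin's exact estimators.

```python
import math
from itertools import combinations

def theta_k(k, n, iters=200):
    """Solve sum_{i=0}^{n-1} theta/(theta+i) = k for theta by bisection on a log scale."""
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n for a finite estimate")
    if k == 1:
        return 0.0
    def expected_k(theta):
        return sum(theta / (theta + i) for i in range(n))
    lo, hi = 1e-10, 1e10
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if expected_k(mid) < k:  # expected_k increases with theta
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def theta_S(S, n):
    """Watterson's estimator from the number of segregating sites."""
    a1 = sum(1 / i for i in range(1, n))
    return S / a1

def theta_H(H):
    """Infinite-alleles estimator from sample homozygosity, inverting E[H] = 1/(1+theta)."""
    return 1 / H - 1

def theta_pi(seqs):
    """Mean number of pairwise differences between aligned sequences."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)
```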
Generation of expected allele (haplotype) frequencies under the infinite-alleles model, conditional on sample size n and observed number of alleles (haplotypes) k, using a simulation procedure adapted from Stewart (1977).
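Expected allele frequencies conditional on n and k can be illustrated with a simple rejection sampler. This is a different technique from Stewart's (1977) procedure and is shown only to make the idea concrete. Samples are drawn under the infinite-alleles model using the sequential "new allele with probability θ/(θ+i)" scheme, and only those with exactly k alleles are kept. Because the conditional distribution given k does not depend on θ, the `theta` argument only affects how often a sample is accepted. Function names and defaults are hypothetical.

```python
import random

def ewens_sample(n, theta, rng):
    """Allele counts for n genes under the infinite-alleles model (sequential sampling)."""
    counts = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            counts.append(1)  # mutation: a new allele (always happens for the first gene)
        else:
            r = rng.randrange(i)  # copy one of the i previous genes uniformly
            for j, c in enumerate(counts):
                if r < c:
                    counts[j] += 1
                    break
                r -= c
    return counts

def expected_frequencies(n, k, theta, n_reps=1000, seed=1, max_tries=10**7):
    """Mean allele frequencies, sorted from most to least common, given n and k."""
    rng = random.Random(seed)
    sums = [0.0] * k
    accepted = tries = 0
    while accepted < n_reps:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance too rare; choose theta closer to the theta_k estimate")
        counts = ewens_sample(n, theta, rng)
        if len(counts) != k:
            continue  # rejection step: keep only samples with exactly k alleles
        for j, c in enumerate(sorted(counts, reverse=True)):
            sums[j] += c / n
        accepted += 1
    return [s / n_reps for s in sums]
```

Setting `theta` near the θ_k estimate for the observed n and k keeps the acceptance rate reasonable.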
Estimation of sample molecular diversity (mean number of pairwise site differences, π).
Estimation of sample nucleotide diversity (mean heterozygosity per nucleotide site).
Estimation of sample heterozygosity and sample homozygosity (unbiased estimates).
Computation of the distribution of the number of pairwise differences between all pairs of chromosomes in the sample.
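The diversity measures and the distribution of pairwise differences (mismatch distribution) listed above can be sketched as follows. This sketch assumes aligned sequences of equal length and counts every mismatch, including gaps, as a difference, which is a simplification. Gene diversity uses the unbiased form n/(n-1)(1 - Σ p_i²), and unbiased homozygosity is one minus that value. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]

def mean_pairwise_differences(seqs):
    """Molecular diversity pi: mean number of pairwise differences."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs)

def nucleotide_diversity(seqs):
    """pi divided by the number of sites: mean heterozygosity per nucleotide site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])

def gene_diversity(alleles):
    """Unbiased heterozygosity n/(n-1) * (1 - sum p_i^2) for a list of allele labels."""
    n = len(alleles)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in Counter(alleles).values()))

def mismatch_distribution(seqs):
    """Number of sequence pairs showing 0, 1, 2, ... differences."""
    counts = Counter(pairwise_differences(seqs))
    return [counts.get(i, 0) for i in range(max(counts) + 1)]
```

For `AAAA`, `AAAT` and `AATT`, the pairwise differences are 1, 2 and 1, so π = 4/3 and the mismatch distribution is [0, 2, 1].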
Exact test of Hardy-Weinberg equilibrium, using a Markov-chain approach modified from Guo and Thompson (1992).
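The quantity behind the exact Hardy-Weinberg test can be illustrated with a Monte Carlo permutation version. This swaps in a different sampling technique from the Markov chain named above. Given the allele counts, the probability of a genotype table is proportional to 2^H / Π n_ij!, where H is the number of heterozygotes. Shuffling all 2N allele copies and pairing them off draws tables from exactly that conditional distribution. The P-value is the fraction of shuffled samples that are no more probable than the observed one. Function names and defaults are hypothetical.

```python
import math
import random
from collections import Counter

def log_table_prob(genotypes):
    """log(2^H / prod n_ij!): genotype-table probability up to a constant fixed by allele counts."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    het = sum(c for (a, b), c in counts.items() if a != b)
    return het * math.log(2) - sum(math.lgamma(c + 1) for c in counts.values())

def hwe_exact_mc(genotypes, n_perm=10_000, seed=1):
    """Monte Carlo estimate of the exact HWE P-value."""
    rng = random.Random(seed)
    alleles = [a for g in genotypes for a in g]
    observed = log_table_prob(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)  # random union of allele copies under HWE
        permuted = list(zip(alleles[0::2], alleles[1::2]))
        if log_table_prob(permuted) <= observed + 1e-9:
            hits += 1
    return hits / n_perm
```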
Exact test of linkage disequilibrium between any pair of loci when the gametic phase is known, using a Markov chain approach.
Likelihood ratio test of linkage disequilibrium when gametic phase is unknown (Chi-square approximation)
Likelihood ratio test of linkage disequilibrium when gametic phase is unknown (non-parametric test based on permutation of alleles among haplotypes)
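The linkage disequilibrium tests above can be illustrated for the simpler case where gametic phase is known, so each haplotype is an observed allele pair. The likelihood-ratio statistic G = 2 Σ n_ij ln(n_ij n / (n_i· n_·j)) compares the haplotype table with the product of its allele margins. Under no association, G is approximately chi-square with (k-1)(l-1) degrees of freedom. The permutation version shuffles the second-locus alleles among haplotypes. This sketch does not cover the unknown-phase case, where the haplotype likelihood would first have to be maximized (for example by EM). Function names are hypothetical.

```python
import math
import random
from collections import Counter

def g_statistic(haplotypes):
    """Likelihood-ratio G for association between the two alleles of each haplotype."""
    n = len(haplotypes)
    joint = Counter(haplotypes)
    first = Counter(h[0] for h in haplotypes)
    second = Counter(h[1] for h in haplotypes)
    return 2 * sum(c * math.log(c * n / (first[x] * second[y])) for (x, y), c in joint.items())

def ld_permutation_test(haplotypes, n_perm=5000, seed=1):
    """Return (G, permutation P-value), shuffling second-locus alleles among haplotypes."""
    rng = random.Random(seed)
    first = [h[0] for h in haplotypes]
    second = [h[1] for h in haplotypes]
    observed = g_statistic(haplotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if g_statistic(list(zip(first, second))) >= observed - 1e-9:
            hits += 1
    return observed, hits / n_perm
```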
Ewens exact test of selective neutrality, using a procedure adapted from Slatkin (1994), applicable to any number of alleles per locus and any sample size.
Ewens-Watterson F-test (Watterson 1978) of selective neutrality based on sample autozygosity (F)
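Both neutrality tests above use the probability of an allele-count configuration given n and k under the Ewens sampling formula. That probability is proportional to 1/(Π n_i · Π_j a_j!), where a_j is the number of alleles seen exactly j times. For small samples, every configuration can be enumerated. In this sketch, the exact P-value is the total probability of configurations no more probable than the observed one. The F-test P-value is given as the lower tail of the homozygosity F = Σ (n_i/n)², and the upper tail is analogous. Slatkin's procedure samples configurations instead of enumerating them, so this full enumeration is only practical for small n. Function names are hypothetical.

```python
import math
from collections import Counter

def partitions(n, k, largest=None):
    """Yield configurations of n genes into exactly k alleles (non-increasing counts)."""
    if largest is None:
        largest = n
    if k == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n - k + 1, largest), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest

def ewens_weight(config):
    """Ewens probability given n and k, up to a constant: 1 / (prod n_i * prod_j a_j!)."""
    multiplicities = Counter(config).values()
    return 1.0 / (math.prod(config) * math.prod(math.factorial(a) for a in multiplicities))

def homozygosity(config):
    n = sum(config)
    return sum((c / n) ** 2 for c in config)

def ewens_tests(counts):
    n, k = sum(counts), len(counts)
    configs = list(partitions(n, k))
    weights = [ewens_weight(c) for c in configs]
    total = sum(weights)  # normalizing constant over all configurations with k alleles
    observed = tuple(sorted(counts, reverse=True))
    p_obs = ewens_weight(observed) / total
    f_obs = homozygosity(observed)
    p_exact = sum(w for w in weights if w / total <= p_obs * (1 + 1e-9)) / total
    p_f_lower = sum(w for c, w in zip(configs, weights)
                    if homozygosity(c) <= f_obs + 1e-12) / total
    return {"P_config": p_obs, "F": f_obs, "p_exact": p_exact, "p_F_lower": p_f_lower}
```

For n = 4 and k = 2, the two possible configurations are (3, 1) and (2, 2), with probabilities 8/11 and 3/11.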
Tajima's selective neutrality test (Tajima 1989a) based on the comparison between the sample mean number of pairwise differences (π) and the number of segregating sites (S).
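The comparison in Tajima's test is usually summarized by the statistic D = (π - S/a1) / sqrt(e1·S + e2·S(S-1)), using the standard constants from Tajima (1989a). The sketch below computes D only. Assessing its significance (for example by coalescent simulation) is not shown, and D is undefined when S = 0. The function name is hypothetical.

```python
import math

def tajimas_d(pi, S, n):
    """Tajima's D from mean pairwise differences pi, segregating sites S and sample size n."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

D is zero when π equals Watterson's S/a1, positive when pairwise differences exceed what S predicts, and negative otherwise.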

Inter-population level

Search for shared alleles or haplotypes between populations
Population genetic structure is estimated from haplotypic data using an analysis of molecular variance (AMOVA) framework (Excoffier et al. 1992) with a maximum of four hierarchical levels:
- alleles (or haplotypes) within individuals
- individuals within demes
- demes within populations
- populations
It allows the estimation of unbiased fixation indices (Weir and Cockerham 1984, Weir 1990) for any combination of these four sources of variation. The following data types can be accommodated:
- RFLPs
- DNA sequences
- Microsatellite data
- Standard data (allele frequencies)
Population genetic structure is estimated from genotypic data for the same molecular data types and hierarchical levels, using the approach described in Michalakis and Excoffier (1996).
The significance of the fixation indices is tested using non-parametric permutation approaches. Different permutation schemes are used for different fixation indices, depending on the hierarchical structure being tested.
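The AMOVA framework and its permutation test can be illustrated for the simplest design: haplotypes within populations, with a single grouping level. The number of pairwise differences between haplotypes serves as the squared distance δ². The sums of squared deviations are SSD_total = (1/N) Σ_{i<j} δ²_ij over all pairs and SSD_within = Σ_p (1/n_p) Σ_{i<j in p} δ²_ij within each population. The variance components come from the mean squares and the sample-size coefficient n0, with Φ_ST = σ²_a / (σ²_a + σ²_w). Significance is assessed by permuting haplotypes among populations. This is a one-level sketch with hypothetical function names, not the multi-level scheme described above.

```python
import random
from itertools import combinations

def difference_matrix(seqs):
    """Pairwise differences between aligned sequences, used as squared distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = sum(a != b for a, b in zip(seqs[i], seqs[j]))
    return d

def phi_st(d, groups):
    """One-level AMOVA Phi_ST from a squared-distance matrix and population labels."""
    N = len(groups)
    sizes = {}
    for g in groups:
        sizes[g] = sizes.get(g, 0) + 1
    P = len(sizes)
    pairs = list(combinations(range(N), 2))
    ssd_total = sum(d[i][j] for i, j in pairs) / N
    ssd_within = sum(d[i][j] / sizes[groups[i]] for i, j in pairs if groups[i] == groups[j])
    ssd_among = ssd_total - ssd_within
    n0 = (N - sum(s * s for s in sizes.values()) / N) / (P - 1)
    sigma2_w = ssd_within / (N - P)  # within-population variance component
    sigma2_a = (ssd_among / (P - 1) - sigma2_w) / n0  # among-population component
    return sigma2_a / (sigma2_a + sigma2_w)

def amova_test(d, groups, n_perm=1000, seed=1):
    """Return (Phi_ST, P-value) from permuting haplotypes among populations."""
    rng = random.Random(seed)
    observed = phi_st(d, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if phi_st(d, labels) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm
```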
Pairwise FST values, coancestry coefficients and Nm estimates can be computed for all pairs of populations, and their significance is also tested by a non-parametric permutation approach. Pairwise FST values can then be translated into divergence times between populations.
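The usual transformations of a pairwise FST can be written in a few lines. Reynolds' coancestry coefficient is D = -ln(1 - FST), and Slatkin's linearized FST is FST/(1 - FST). Under a pure-drift model, both are roughly proportional to divergence time scaled by population size. Under an island model at equilibrium, FST = 1/(1 + 2Nm) for haploid data and FST = 1/(1 + 4Nm) for diploid data, which gives the Nm expression below. The function names and the `ploidy` parameter are illustrative assumptions, and the exact time scaling Arlequin reports is not reproduced here.

```python
import math

def reynolds_distance(fst):
    """Coancestry coefficient D = -ln(1 - FST)."""
    return -math.log(1 - fst)

def slatkin_linearized(fst):
    """Slatkin's linearized FST: FST / (1 - FST)."""
    return fst / (1 - fst)

def nm_estimate(fst, ploidy=2):
    """Island-model Nm from FST = 1 / (1 + 2 * ploidy * Nm)."""
    return (1 - fst) / (2 * ploidy * fst)
```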
Exact test of population differentiation based on the comparison of haplotype or genotype frequencies
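The exact test of differentiation compares the observed population-by-haplotype table with all tables that share its margins. Given those margins, a table's probability is proportional to 1/Π n_ij!. This Monte Carlo sketch draws such tables by shuffling haplotype labels among individuals while keeping population sizes fixed. The P-value is the fraction of shuffled tables that are no more probable than the observed one. The sketch is not claimed to match the sampling algorithm Arlequin uses, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def log_table_weight(pops, haps):
    """-sum log n_ij!: log-probability of the table up to a constant fixed by its margins."""
    return -sum(math.lgamma(c + 1) for c in Counter(zip(pops, haps)).values())

def differentiation_test(pops, haps, n_perm=5000, seed=1):
    """Monte Carlo P-value for no differentiation among populations."""
    rng = random.Random(seed)
    observed = log_table_weight(pops, haps)
    shuffled = list(haps)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps both population sizes and haplotype counts fixed
        if log_table_weight(pops, shuffled) <= observed + 1e-9:
            hits += 1
    return hits / n_perm
```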