Matthieu Foll 2011

Detecting natural selection from population-based genetic data

New version 2.1 available. It is much faster thanks to support for multicore machines, and it adds q-value calculation.


This program, BayeScan, aims to identify candidate loci under natural selection from genetic data, using differences in allele frequencies between populations. BayeScan is based on the multinomial-Dirichlet model. One of the simplest scenarios covered by this model is an island model in which subpopulation allele frequencies are correlated through a common migrant gene pool, from which they differ to varying degrees. The difference in allele frequency between this common gene pool and each subpopulation is measured by a subpopulation-specific FST coefficient. This formulation can therefore accommodate realistic ecological scenarios in which the effective size and the immigration rate differ among subpopulations.
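As a rough illustration of the island model described above, the sketch below draws subpopulation allele frequencies around a common gene pool. It assumes the usual parameterisation of the multinomial-Dirichlet model, in which the frequencies follow a Dirichlet distribution with parameters theta * p and theta = 1/FST - 1; this relationship is not spelled out on this page. The function name, the gene pool frequencies and the FST values are hypothetical and are not part of BayeScan.

```python
import random

def sample_subpop_freqs(pool_freqs, fst, rng):
    """Draw one set of subpopulation allele frequencies.

    Assumption: frequencies ~ Dirichlet(theta * p) with theta = 1/FST - 1,
    so a larger FST gives draws that sit further from the gene pool.
    """
    theta = 1.0 / fst - 1.0
    # A Dirichlet draw is a vector of independent gamma draws, normalised.
    draws = [rng.gammavariate(theta * p, 1.0) for p in pool_freqs]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)
pool = [0.6, 0.3, 0.1]  # hypothetical migrant gene pool, three alleles

weakly_differentiated = sample_subpop_freqs(pool, 0.01, rng)
strongly_differentiated = sample_subpop_freqs(pool, 0.5, rng)
```

Because each subpopulation has its own FST, populations with different effective sizes or immigration rates can be simulated simply by calling the function with different values.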

Being Bayesian, BayeScan incorporates the uncertainty in allele frequencies due to small sample sizes. In practice, very small sample sizes can be used, at the risk of low power but with no particular risk of bias. Allele frequencies are estimated using different statistical models depending on the type of genetic marker used. In BayeScan, three different types of data can be used: (i) codominant data (such as SNPs or microsatellites), (ii) dominant binary data (such as AFLPs) and (iii) AFLP amplification intensity, which is considered neither dominant nor codominant.
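To see why a small sample costs precision rather than introducing bias, the following sketch computes the posterior of a single biallelic allele frequency under a simple Beta-binomial model. This is only a textbook illustration with a uniform Beta(1, 1) prior chosen for convenience; it is not the model BayeScan actually fits, and the function name and allele counts are made up.

```python
import math

def beta_posterior(k, n, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of an allele frequency.

    k copies of the allele observed among n sampled gene copies,
    with a Beta(prior_a, prior_b) prior (uniform by default, an assumption).
    """
    a = prior_a + k
    b = prior_b + (n - k)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Same observed frequency (0.3), very different sample sizes.
small_mean, small_sd = beta_posterior(3, 10)
large_mean, large_sd = beta_posterior(300, 1000)
```

Both posteriors are centred near 0.3, but the small sample gives a much wider posterior. That extra width is the uncertainty a Bayesian method carries forward, which lowers power without biasing the estimate.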

Selection is introduced by using a logistic regression to decompose FST coefficients into a population-specific component (beta), shared by all loci, and a locus-specific component (alpha), shared by all populations. Departure from neutrality at a given locus is assumed when the locus-specific component is necessary to explain the observed pattern of diversity (alpha significantly different from 0). A positive value of alpha suggests diversifying selection, whereas a negative value suggests balancing or purifying selection. This leads to two alternative models for each locus: one that includes the alpha component to model selection and one that does not. For each locus, a reversible-jump MCMC explores both models (with alpha present or absent, respectively) and estimates their relative posterior probabilities.
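The decomposition above can be sketched in a few lines. The code assumes the logistic regression takes the form log(FST / (1 - FST)) = alpha + beta, and that the posterior probability of the selection model is estimated as the fraction of MCMC iterations in which alpha is included, which is the usual way to summarise a reversible-jump chain. The function names, the beta value and the indicator trace are hypothetical; they are not BayeScan output.

```python
import math

def fst_from_components(alpha, beta):
    """Invert the assumed logit link: log(FST / (1 - FST)) = alpha + beta."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta)))

def posterior_prob_selection(alpha_included):
    """Fraction of retained MCMC iterations whose model contains alpha."""
    return sum(alpha_included) / len(alpha_included)

beta_pop = -2.0  # hypothetical population-specific effect

fst_neutral = fst_from_components(0.0, beta_pop)        # alpha absent
fst_diversifying = fst_from_components(1.5, beta_pop)   # alpha > 0
fst_balancing = fst_from_components(-1.5, beta_pop)     # alpha < 0

# Made-up trace: True when the chain visited the model with alpha.
trace = [True, True, False, True, True, True, False, True, True, True]
prob_selection = posterior_prob_selection(trace)
```

With the same beta, a positive alpha raises FST at that locus above the neutral expectation and a negative alpha lowers it, which matches the interpretation of diversifying versus balancing or purifying selection.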


BayeScan and its improvements have been described successively in: