SIB

BayeScan

Unibe
 Matthieu Foll 2011

Detecting natural selection from population-based genetic data



Version 2.1 (21.01.2012)

This is an important release of BayeScan, containing new features, bug corrections, and updated R scripts.

New features:

  • The command line version of BayeScan has been optimized for multicore processors. In practice, it runs much faster than previous versions on most machines, which now typically have at least 4 cores. This was done using the very convenient OpenMP API for parallel programming. Unfortunately, the Windows graphical version does not yet support this feature. We encourage people with large data sets to use the command line version, which is also available under Windows and does support it!
  • BayeScan now directly calculates q-values for each locus, and we encourage users to use (and report) this measure when making decisions. The q-value is the False Discovery Rate (FDR) analogue of the p-value: the q-value of a given locus is the minimum FDR at which this locus becomes significant. In practice, if you choose, for example, a q-value threshold of 10%, at most 10% of the corresponding outlier markers (those with a q-value below 10%) are expected to be false positives.
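To make the link between q-values and per-locus posterior probabilities concrete, here is a minimal Python sketch. It is not taken from BayeScan's source code. It assumes the standard Bayesian definition: each locus has a posterior error probability of 1 minus its posterior probability of selection, and the estimated FDR of a set of outliers is the mean error probability of that set. The q-value of a locus is then the mean error probability over all loci at least as strongly supported. The function name `q_values` is hypothetical.

```python
from typing import List


def q_values(posterior_probs: List[float]) -> List[float]:
    """Bayesian q-values from posterior probabilities of selection.

    Hypothetical helper, not BayeScan's code. Assumes the estimated FDR of
    a set of outliers is the mean of their posterior error probabilities,
    i.e. 1 - P(selection | data).
    """
    n = len(posterior_probs)
    pep = [1.0 - p for p in posterior_probs]        # posterior error probabilities
    order = sorted(range(n), key=lambda i: pep[i])  # most convincing loci first
    q = [0.0] * n
    running_sum = 0.0
    k = 0
    while k < n:
        # Loci with identical error probabilities enter the outlier set together.
        j = k
        while j < n and pep[order[j]] == pep[order[k]]:
            running_sum += pep[order[j]]
            j += 1
        fdr = running_sum / j  # mean error probability of the j best loci
        for t in range(k, j):
            q[order[t]] = fdr
        k = j
    return q


if __name__ == "__main__":
    probs = [0.99, 0.95, 0.60, 0.10]
    qs = q_values(probs)
    # Loci with q < 0.10 are declared outliers at a 10% FDR.
    outliers = [i for i, qv in enumerate(qs) if qv < 0.10]
    print([round(qv, 4) for qv in qs], outliers)
```

For these four loci the sketch gives q-values of about 0.01, 0.03, 0.153 and 0.34, so only the first two loci pass a 10% threshold. Because the running mean of error probabilities sorted in increasing order never decreases, the q-values are monotone in the locus ranking.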

Improvements:

  • The MCMC trace output file (*.sel) no longer contains the alpha (selection) parameters by default, because these can produce very large files when many markers are analysed. They can still be saved using the "-all_trace" command line option.
  • The R plot script has been updated to use the q-values calculated by BayeScan directly. It is now much faster and plots the q-value on the x-axis instead of the PO. It also adds an option to highlight a particular set of markers.

Bug corrections:

  • BayeScan hung when a codominant marker was declared in the input file as having a single allele (monomorphic marker).

Version 2.01 (24.01.2011)

This version only includes corrections for two bugs identified by early users of version 2.0:

Bug corrections:

  • A bug in the likelihood calculation for AFLP amplification intensity data, which caused errors such as "Floating point overflow", "Log: DOMAIN error" and "nan" values in the output.
  • The new prior odds option was not correctly taken into account in the calculations.

Version 2.0 (2011)

This is a major release of BayeScan, containing several new features, bug corrections, and new R scripts for better analysis and plotting.

New features:

  • The prior odds for the neutral model can now be set by the user. In the context of multiple testing, prior odds are used to incorporate our skepticism about the chance that each locus is under selection. The prior odds were fixed to 1:1 in previous versions, which may not be appropriate when using a large number of markers or relatively uninformative data. The default is now 10:1, which assumes that the neutral model is 10 times more likely than the model with selection for each marker. Consequently, Bayes Factors (BF) are now replaced by Posterior Odds (PO) in the output (PO = BF if the prior odds are set to 1:1).
  • The R function for plotting results has been improved and now allows the user to control the False Discovery Rate (FDR). The FDR is defined as the expected proportion of false positives among outlier markers. Controlling the FDR gives much greater power than controlling the family-wise error rate, for example with a Bonferroni correction. The user can now choose a target FDR, and the R function automatically finds the posterior odds threshold that achieves this FDR or lower.
  • The prior distribution for FIS coefficients when using dominant markers can now be changed. Users can either use a uniform distribution and set the lower and upper bounds, or a beta distribution and set the mean and standard deviation. Note that the parameters of the beta distribution are computed automatically from the chosen mean and standard deviation, but some combinations cannot be represented by a beta distribution (its variance must be smaller than mean × (1 − mean)). The default is a uniform distribution between zero and one, as was the case in version 1.0.
  • The AFLP amplification intensity model for estimating F-statistics is now implemented in BayeScan, as proposed in:
Foll M, Fischer MC, Heckel G and L Excoffier (2010) Estimating population structure from AFLP amplification intensity. Molecular Ecology 19: 4638-4647
Fischer MC, Foll M, Excoffier L and G Heckel (in review) Enhanced AFLP genome scans detect local adaptation in high-altitude populations of a small rodent (Microtus arvalis)
  • New R functions designed specifically for AFLP amplification intensity data are included, for data handling, cleaning and conversion.
  • It is now possible to provide a list of loci to discard from the calculation. This is convenient, for example, for removing monomorphic loci without having to delete them manually from the input file.
  • An option to calculate only F-statistics, without trying to identify loci under selection. This option was used in Foll et al. (2010).
  • It is now possible to use a matrix of SNP genotypes. This option was used in Foll et al. (2010) to compare the power of AFLPs and SNPs to estimate the inbreeding coefficient FIS. If you are not directly interested in FIS, you should instead analyse SNPs as regular codominant data, which is much faster to compute.
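Two of the quantities in the list above can be made concrete with a short Python sketch: how a Bayes factor and the user-set prior odds combine into posterior odds (and a posterior probability of selection), and how beta prior parameters for FIS follow from a chosen mean and standard deviation by matching moments. This is not BayeScan's code, and the function names are hypothetical. We assume, consistent with the text, that prior odds are expressed in favour of the neutral model ("10:1" means neutral is 10 times more likely a priori) and posterior odds in favour of the selection model.

```python
import math


def posterior_odds(bayes_factor: float, prior_odds_neutral: float = 10.0) -> float:
    """Posterior odds for the selection model (hypothetical helper).

    bayes_factor: P(data | selection) / P(data | neutral)
    prior_odds_neutral: P(neutral) / P(selection), e.g. 10.0 for 10:1
    """
    if bayes_factor < 0 or prior_odds_neutral <= 0:
        raise ValueError("Bayes factor must be >= 0 and prior odds > 0")
    # PO = BF * P(selection) / P(neutral); this equals BF when prior odds are 1:1.
    return bayes_factor / prior_odds_neutral


def posterior_probability(po: float) -> float:
    """Convert posterior odds into a posterior probability of selection."""
    return po / (1.0 + po)


def beta_from_mean_sd(mean: float, sd: float) -> tuple:
    """Beta(a, b) parameters matching a given mean and SD (method of moments).

    A beta distribution with mean m can only have a variance below
    m * (1 - m); any other combination raises ValueError.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    var = sd * sd
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("no beta distribution has this mean and SD")
    k = mean * (1.0 - mean) / var - 1.0  # k = a + b
    return mean * k, (1.0 - mean) * k


if __name__ == "__main__":
    po = posterior_odds(100.0, 10.0)  # 10.0
    print(po, math.log10(po), posterior_probability(po))
    print(beta_from_mean_sd(0.5, 0.1))  # approximately (12.0, 12.0)
```

With the default 10:1 prior odds, a Bayes factor of 100 gives posterior odds of 10 (log10(PO) = 1) and a posterior probability of about 0.91. Under 1:1 prior odds the posterior odds would simply equal the Bayes factor. Asking for a mean of 0.5 with a standard deviation of 0.6 raises an error, because no beta distribution has that variance.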

Improvements:

  • For dominant and AFLP amplification intensity data, a matrix of posterior mean allele frequencies can be produced instead of the very large files containing the full trace of the MCMC algorithm.
  • The manual has been extensively rewritten.
  • The Microsoft Windows plot program is no longer supported, and new output files are not compatible with it. It is replaced by an R function that produces much better, publication-ready graphics, which can be exported in PDF format.
  • The Windows Graphical User Interface (GUI) has been modified to incorporate the new features and to be more coherent with the command line version. MCMC algorithm options have been simplified and the proposal distributions can no longer be modified as they are automatically tuned by pilot runs.
  • The GENEPOP converter for codominant data is no longer supported. Instead, we advise users to use PGDSpider, a freely available converter developed by Heidi Lischer.
  • The C++ code has been cleaned up for compatibility with the latest gcc versions.
  • Speed optimizations.

Bug corrections:

  • The prior distribution for FST coefficients was too strongly skewed towards zero, leading to convergence problems with data containing little information (typically a pair of weakly differentiated populations with a small number of markers).
  • On a few machines the GUI version crashed just after the pilot runs and burn-in.
  • The GUI version crashed when the decimal separator was not set to "." (this is the case for some non-English Windows versions). Any decimal separator can now be used.
  • The estimated time left was not correctly calculated in the GUI when using very large datasets.
  • The program no longer displays an error message when the help file is not found.

Version 1.1 (2010)

This version was not officially released, but I shared it with a few people. It differed from the previous version in two ways:
  • The option to change the prior distribution for FIS coefficients when using dominant markers (Uniform or Beta, see version 2.0).
  • A bug correction concerning the prior distribution for FST coefficients, which was too strongly skewed towards zero, leading to convergence problems with data containing little information (typically a pair of weakly differentiated populations with a small number of markers).

Version 1.0 (2008)

This was the first public release of BayeScan, published in:
Foll M and OE Gaggiotti (2008) A genome scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180: 977-993