This is an important release of BayeScan, containing new features, bug
corrections, and updated R scripts.
New features:
- The command line version of BayeScan has been optimized to
work on multicore processors. In practice this means that it will be
much faster than previous versions, as most machines now have at least 4
cores. This has been done using the very convenient OpenMP API for
parallel programming. Unfortunately, for the moment the Windows
graphical version does not support this feature.
We encourage people with large data sets to use the command line
version, which is also available under Windows and does support this
feature.
- BayeScan now directly calculates q-values for each locus, and we
encourage users to use (and report) this measure to make decisions.
The q-value is the False Discovery Rate (FDR) analogue of the p-value.
The q-value of a given locus is the minimum FDR at which this locus may
become significant. In practice, if you choose, for example, a q-value
threshold of 10%, it means that 10% of the corresponding outlier
markers (those having a q-value lower than 10%) are expected to be false positives.
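The q-value definition above can be illustrated with a minimal Python sketch. It assumes that each locus has a posterior probability of being under selection, and that the q-value of a locus is the average posterior probability of being neutral over all loci with at least as much evidence for selection. This is a common Bayesian FDR construction and is not necessarily the exact computation BayeScan performs. The function name q_values and the example numbers are hypothetical.

```python
def q_values(posterior_probs):
    """Return one q-value per locus, in the input order.

    posterior_probs[i] is the posterior probability that locus i is under
    selection, so 1 - posterior_probs[i] is its probability of being a
    false positive if it is declared an outlier (assumed definition).
    """
    # Rank loci from the strongest to the weakest evidence for selection.
    order = sorted(range(len(posterior_probs)),
                   key=lambda i: posterior_probs[i], reverse=True)
    q = [0.0] * len(posterior_probs)
    running_error = 0.0
    for rank, i in enumerate(order, start=1):
        running_error += 1.0 - posterior_probs[i]
        # Expected FDR if this locus and all better-ranked loci are outliers.
        q[i] = running_error / rank
    return q


# Hypothetical posterior probabilities for four loci.
probs = [0.99, 0.50, 0.95, 0.10]
qvals = q_values(probs)

# Loci kept as outliers at a q-value threshold of 10%.
outliers = [i for i, q in enumerate(qvals) if q <= 0.10]
```

With these numbers, the first and third loci are retained at the 10% threshold, and the expected proportion of false positives among them is their largest q-value.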
Improvements:
- The MCMC trace output file (*.sel) no longer contains the
alpha (selection) parameters by default, as this can lead to very large
files when using a large number of markers. It is still possible to
include them by using the "-all_trace" option in the command line.
- The R plot script has been updated to directly use the
q-values calculated by BayeScan. It is now much faster and plots the
q-value on the x-axis instead of the PO. It also adds the option to
highlight a particular set of markers.
Bug corrections:
- BayeScan remained stuck when a codominant marker was declared in the input file to have a single allele (monomorphic marker).
|
This version only includes two corrections for bugs
identified by the early users of 2.0:
Bug corrections:
- A bug in the likelihood calculation of AFLP amplification
intensity data. This created errors like: "Floating point overflow",
"Log: DOMAIN error" and "nan" values in the output.
- The new prior odds option was not correctly taken into
account in the calculations.
|
This is a major release of BayeScan, containing several new features,
bug corrections, and new R scripts for better analysis and plotting.
New features:
- The prior odds for the neutral model can now be set by
the user. In the context of multiple testing, prior odds
are used to incorporate our skepticism about the chance that each locus
is under selection. The prior odds were fixed to 1:1 in
previous versions, which may not be appropriate when using a large
number of markers or relatively uninformative data. The default is
now 10:1, which assumes that the neutral model is 10 times more likely
than the model with selection for each marker. Consequently, Bayes
Factors (BF) are now replaced by Posterior Odds (PO) in the output
(PO=BF if the prior odds are fixed to 1:1).
- The R function for plotting results has been improved and now
allows controlling the False Discovery Rate (FDR). FDR is defined as
the expected proportion of false positives among outlier markers.
Controlling the FDR gives much greater power than controlling the
familywise error rate, for example with a Bonferroni correction.
The user can now choose a target FDR, and the R function automatically
finds the posterior odds threshold that achieves this FDR or lower.
- The prior distribution for FIS
coefficients when using dominant markers can now be changed. Users can
either use a uniform distribution and set the lower and upper
bounds, or a beta distribution and set the mean and standard deviation.
Note that the parameters of the beta distribution are automatically
computed from the chosen mean and standard deviation, but some
combinations of mean and standard deviation cannot be obtained with a
beta distribution. The default behavior is to use a uniform
distribution between zero and one, as was the case in version 1.0.
- An AFLP amplification intensity model to estimate F-statistics
is now implemented in BayeScan, as proposed in:
Fischer MC, Foll M, Excoffier L and G Heckel (in review) Enhanced AFLP
genome scans detect local adaptation in high-altitude populations of a
small rodent (Microtus arvalis)
- Some new R functions specifically designed to deal with AFLP
amplification intensity data are included. They allow data handling,
cleaning and conversion.
- A list of loci to discard from the calculation can now be
provided. This can be convenient, for example, to remove monomorphic
loci without having to manually delete them from the input file.
- Option to only calculate F-statistics, without trying to
identify loci under selection. This option has been used in Foll et al.
(2010).
- Possibility to use a matrix of SNP genotypes. This option
has been used in Foll et al. (2010) to compare the power of AFLPs and
SNPs to estimate the inbreeding coefficient FIS. If you are not
directly interested in FIS, you should rather use SNPs as regular
codominant data, which leads to much faster computation.
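The relation between Bayes factors, prior odds and posterior odds described in the first feature above, and the search for a posterior odds threshold at a target FDR, can be sketched in Python as follows. This is only an illustration under stated assumptions: the prior odds are expressed in favour of the neutral model, the posterior probability of selection is taken as PO / (1 + PO), and the expected FDR of an outlier set is the average posterior probability of neutrality among its members. The function names and Bayes factor values are hypothetical and are not those of the R function distributed with BayeScan.

```python
def posterior_odds(bayes_factor, prior_odds_neutral=10.0):
    # The prior odds favour the neutral model, so the prior odds for the
    # selection model are their reciprocal.
    return bayes_factor / prior_odds_neutral


def posterior_prob(po):
    # Posterior probability of the selection model from its posterior odds.
    return po / (1.0 + po)


def po_threshold(pos, target_fdr):
    """Smallest PO cutoff whose outlier set has expected FDR <= target_fdr.

    Returns None when even the best-supported locus exceeds the target.
    """
    error = 0.0
    threshold = None
    for n, po in enumerate(sorted(pos, reverse=True), start=1):
        error += 1.0 - posterior_prob(po)
        if error / n > target_fdr:
            # The running FDR can only grow as weaker loci are added.
            break
        threshold = po
    return threshold


# Hypothetical Bayes factors for four loci.
bayes_factors = [1000.0, 190.0, 10.0, 1.0]

# With prior odds of 1:1 the posterior odds equal the Bayes factors.
po_equal = [posterior_odds(bf, prior_odds_neutral=1.0) for bf in bayes_factors]

# With the default 10:1 prior odds they are ten times smaller.
po_default = [posterior_odds(bf) for bf in bayes_factors]

cutoff = po_threshold(po_default, target_fdr=0.05)
```

In this example only the two loci with the largest posterior odds are retained at a target FDR of 5%, and the cutoff is the posterior odds of the weaker of the two.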
Improvements:
- For dominant and AFLP amplification intensity data, a matrix of
posterior mean allele frequencies can be produced instead of the very
large files containing the full trace of the MCMC algorithm.
- The manual has been extensively rewritten.
- The Microsoft Windows plot program is no longer supported and new
output files are not compatible with it. It is replaced by an R
function which provides much better quality graphics, ready for
publication, that can be exported in pdf format.
- The Windows Graphical User Interface (GUI) has been modified to
incorporate the new features and to be more coherent with the command
line version. MCMC algorithm options have been simplified and
the proposal distributions can no longer be modified, as they are
automatically tuned by pilot runs.
- The GENEPOP converter for codominant data is no longer supported.
Instead, we advise people to use PGDSpider, a freely available
conversion program developed by Heidi Lischer.
- The C++ code has been cleaned up for compatibility with the latest gcc versions.
- Speed optimizations.
Bug corrections:
- The prior distribution for FST coefficients was too strongly
skewed toward zero, leading to convergence problems with data
containing a small amount of information (typically pairs of weakly
differentiated populations with a small number of markers).
- On a few machines the GUI version crashed just after the
pilot runs and burn-in.
- The GUI version crashed when the decimal separator was not set to
"." (this is the case for some non-English Windows versions). Any
decimal separator can now be used.
- The estimated time left was not correctly calculated in the
GUI when using very large datasets.
- The program no longer displays an error message when the
help file is not found.
|
This version was not officially released, but I shared it with a few
people. There were two differences from the previous version:
- The option to change the prior distribution for FIS
coefficients when using dominant markers (Uniform or Beta, see version
2.0).
- A bug correction concerning the prior distribution for FST
coefficients. The prior distribution was too strongly skewed toward
zero, leading to convergence problems with data containing a small
amount of information (typically pairs of weakly differentiated
populations with a small number of markers).
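For the Beta prior on FIS mentioned in the first item above, the shape parameters of a beta distribution can be derived from a mean and a standard deviation by the standard method of moments. The Python sketch below shows this derivation and why some combinations are impossible: the variance must be smaller than mean * (1 - mean). It is an illustration only. The function name beta_parameters is hypothetical, and BayeScan may perform this conversion differently.

```python
def beta_parameters(mean, sd):
    """Return the (alpha, beta) shape parameters matching a mean and sd."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    variance = sd * sd
    # Upper bound on the variance of any beta distribution with this mean.
    max_variance = mean * (1.0 - mean)
    if not 0.0 < variance < max_variance:
        raise ValueError("no beta distribution has this mean and sd")
    common = max_variance / variance - 1.0
    return mean * common, (1.0 - mean) * common


# A mean of 0.5 and a standard deviation of 0.25 give alpha = beta = 1.5.
alpha, beta = beta_parameters(0.5, 0.25)

# A standard deviation of 0.6 is too large for a mean of 0.5.
try:
    beta_parameters(0.5, 0.6)
    impossible = False
except ValueError:
    impossible = True
```

The uniform distribution between zero and one is the special case alpha = beta = 1, which corresponds to a mean of 0.5 and a variance of 1/12.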
|
This was the first public release of BayeScan, published in:
Foll, M and OE Gaggiotti (2008) A genome scan method to identify
selected loci appropriate for both dominant and codominant markers: A
Bayesian perspective. Genetics 180: 977-993
|