abcEst3.exe
===========

abcEst3 is a program that estimates parameters by a locally weighted regression approach, as described in Beaumont et al. (2002).

abcEst needs 3 files:

1) A file containing the results of the simulations, with one line per simulation. Each line should list the parameters used for the simulation and the summary statistics computed on the simulated data.
2) A file with one line per estimation to be performed. Each line should list the summary statistics computed on the observed data set for which one wants to estimate parameters.
3) A settings file, listing the names of the two files above, some parameters for the estimation procedure, and parameters defining the output files.

In this package we provide:
===========================

1) abcEst3.exe: The MS-Windows executable console program.
2) Example files to perform a basic estimation procedure:
   - simFile100K.txt: A file containing the product of 100,000 simulations.
   - observedStats.txt: A file containing 10 summary statistics computed on an observed STR data set.
   - settings.txt: The settings file mentioned above.

What is not provided:
=====================

We do not provide the other programs necessary for making ABC estimations, i.e. those involved in steps 1 and 2 in Figure 2 of Excoffier et al. (2005), which are:

- A program to draw parameters from prior distributions and write input files.
- A program for simulating genetic data given selected parameters.
- A program computing summary statistics from simulated and observed genetic data, and writing parameters and summary statistics in the simulation file.

abcEst is totally independent of the above programs, and it is the responsibility of the user to generate the necessary input files for abcEst.
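
Since the simulation file has to be produced by the user's own programs, here is a minimal Python sketch of the layout described above (one line per simulation, holding the parameters and the summary statistics). The tab separator, the parameters-before-statistics column order, the header labels and the function names are all assumptions made for illustration; please check them against the provided simFile100K.txt before relying on them.

```python
import os
import random
import tempfile


def format_simulation_line(params, stats, sep="\t"):
    """Build one line: parameter values first, then summary statistics (assumed order)."""
    return sep.join(str(float(v)) for v in list(params) + list(stats))


def write_simulation_file(path, simulations, header=None, sep="\t"):
    """Write one line per simulation and return the number of simulations written.

    `simulations` is any iterable of (params, stats) pairs.
    `header` is an optional list of column labels (the header line is optional).
    """
    n_written = 0
    with open(path, "w") as handle:
        if header is not None:
            handle.write(sep.join(header) + "\n")
        for params, stats in simulations:
            handle.write(format_simulation_line(params, stats, sep) + "\n")
            n_written += 1
    return n_written


if __name__ == "__main__":
    # Toy stand-in for the user's own prior sampler and simulator (hypothetical).
    rng = random.Random(1)
    toy = []
    for _ in range(5):
        theta = rng.uniform(0.0, 10.0)  # one parameter drawn from a flat prior
        stats = [theta + rng.gauss(0.0, 1.0) for _ in range(2)]  # two fake statistics
        toy.append(([theta], stats))
    out_path = os.path.join(tempfile.mkdtemp(), "toy_simFile.txt")
    n = write_simulation_file(out_path, toy, header=["theta", "S1", "S2"])
    print(n, "simulations written to", out_path)
```

If a header line is written as in this sketch, the "Presence of header line in simulation file" entry of the settings file should be set accordingly.
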
How to launch the program
=========================

In a console window, at the command prompt, just write "abcEst3 <settings file name>", where the settings file is the file containing all necessary information for a successful run of abcEst. With the example files above you would just need to type:

    > abcEst3 settings.txt

Settings file
=============

In this file one specifies different settings for the computation of the posterior densities:

- Name of the file containing the simulated parameters and summary statistics (a header line is optional in this file, see below)
- Name of the file containing the observed summary statistics (it should not contain any header line)
- Number of parameters to estimate
- Number of summary statistics
- Number of simulations to retain for parameter estimation
- Distance choice: 1 = Euclidean distance as defined in Beaumont et al. 2002 (DO NOT PUT ANYTHING ELSE)
- Generic name for output files (e.g. my_output), which will be used as a prefix for all output files
- Transformation of parameters before regression (0: none; 1: log; 2: log(tan^-1) - see Hamilton and Excoffier, 2005, PNAS)
- Compute and output posterior densities of parameters: 1 = yes, 0 = no output
- Presence of header line in simulation file: 1 = yes, 0 = no
- Maximum number of lines to read in simulation file
- Output Euclidean distances: 0 = no, 1 = yes
- Compute multiple correlation coefficients: 0 = no, 1 = yes (KEEP ZERO)
- Do Manly77 transform of statistics prior to regression: 0 = no, 1 = yes (KEEP ZERO)
- Type of kernel to use for density estimation, and bandwidth of this kernel: Epanechnikov (0) or Gaussian (1) kernel for density. The 2nd parameter is the bandwidth.

Output of abcEst
================

abcEst outputs many files, which all begin with a prefix you can define in the settings file. In the example given, the prefix is set to "my_output".
I list below the files of interest for most users:

Parameter estimates and quantiles of the posterior distribution
---------------------------------------------------------------

- my_output_resSummary.txt
  In this file, I provide different point estimators for all estimated parameters:
    - alpha_eq6: Estimator obtained from the regression using equation 6 in Beaumont et al. (2002).
    - alpha_eq7: Estimator obtained from the regression using equation 7 in Beaumont et al. (2002).
    - Mean: Mean of the posterior distribution
    - Mode: Mode of the posterior distribution
    - Median: Median of the posterior distribution
    - qt_x: xth quantile of the posterior distribution, which can be used to compute credible intervals for the parameters

- my_output_resParam_i.txt, where i is the parameter index.
  These files contain the estimates of each parameter separately, as well as the quantiles of their posterior distribution.

Posterior densities
-------------------

- my_output_raw_posterior_PhiStars_i_Paramj.txt, where i is the observation index, and j is the parameter index.
  This file contains the raw posterior values of the parameters. Densities can be computed from these values with a separate program, e.g. R.

- my_output_postDens_Unweighted_Obs_i_Paramj.txt, where i is the observation index, and j is the parameter index.
  Output of the unweighted posterior and prior distributions of the parameters, useful for making graphs. Note that the posterior densities are computed using a specific kernel and bandwidth defined in the settings file. The posterior distribution is computed for 1000 equally spaced points, and the prior distribution is computed for 200 equally spaced points.

- my_output_HPDU.txt: Contains point estimates computed on the unweighted posterior distribution (and which thus depend on the selected bandwidth and kernel), as well as Highest Posterior Density credible intervals.

- my_output_postDens_Weighted_Obs_i_Paramj.txt, where i is the observation index, and j is the parameter index.
  Output of the weighted posterior and prior distributions of the parameters, useful for making graphs. Note that the posterior densities are computed using a specific kernel and bandwidth defined in the settings file. The posterior distribution is computed for 1000 equally spaced points, and the prior distribution is computed for 200 equally spaced points. Compared with the unweighted distribution, the Phi-star values are here weighted again according to their associated Euclidean distance, as recommended in Beaumont et al. (2002), equation 9.

- my_output_HPDW.txt: Contains point estimates computed on the weighted posterior distribution (and which thus depend on the selected bandwidth and kernel), as well as Highest Posterior Density credible intervals.

- Other posterior densities are provided as a convenience, but users should probably concentrate on those listed above.

References:
===========

Beaumont MA, Zhang W and Balding DJ. 2002. Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025-2035.

Excoffier L, Estoup A and Cornuet J-M. 2005. Bayesian Analysis of an Admixture Model With Mutations and Arbitrarily Linked Markers. Genetics 169: 1727-1738.

Hamilton G, Currat M, Ray N et al. 2005. Bayesian estimation of recent migration rates after a spatial expansion. Genetics 170: 409-417.
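
As a complement to the "Posterior densities" section, the following Python sketch shows one way a density could be computed from raw posterior values with an Epanechnikov or a Gaussian kernel, a given bandwidth, and a grid of equally spaced points. It is only an illustration under stated assumptions: the function names, the default grid limits (minimum and maximum of the values) and the optional weights argument are hypothetical, and nothing here is claimed to reproduce the exact computation done inside abcEst. Values are passed as a plain list, because the layout of the raw posterior file is not described here.

```python
import math


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(values, bandwidth, kernel=epanechnikov, n_points=1000,
                   weights=None, grid_min=None, grid_max=None):
    """Return (grid, density) evaluated on n_points equally spaced points.

    `weights` is optional; leaving it as None gives every value the same weight.
    `grid_min` / `grid_max` default to the smallest and largest value (an assumption).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if weights is None:
        weights = [1.0] * len(values)
    total = float(sum(weights))
    lo = min(values) if grid_min is None else grid_min
    hi = max(values) if grid_max is None else grid_max
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    density = [
        sum(w * kernel((x - v) / bandwidth) for v, w in zip(values, weights))
        / (total * bandwidth)
        for x in grid
    ]
    return grid, density
```

For instance, `kernel_density(values, 0.5, kernel=gaussian)` would evaluate a Gaussian-kernel density with bandwidth 0.5 on 1000 points, and passing a list of weights would give a weighted version of the same curve.
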