Accounting for historical demographic features, such as the strength and timing of gene flow and divergence times between closely related lineages, is vital for many inferences in evolutionary biology. Approximate Bayesian computation (ABC) is one method commonly used to estimate demographic parameters. However, the DNA sequences used as input for this method, often microsatellites or RADseq loci, usually represent a small fraction of the genome. Whole genome sequencing (WGS) data, on the other hand, have been used less often with ABC, and questions remain about the potential benefit of, and how to best implement, this type of data; we used pseudo-observed data sets to explore such questions. Specifically, we addressed the potential improvements in parameter estimation accuracy that could be associated with WGS data in multiple contexts; namely, we quantified the effects of (a) more data, (b) haplotype-based summary statistics, and (c) locus length. Compared with a hypothetical RADseq data set with 2.5 Mbp of data, using a 1 Gbp data set consisting of 100 Kbp sequences led to substantial gains in the accuracy of parameter estimates, which was mostly due to haplotype statistics and increased data. We also quantified the effects of including (a) locus-specific recombination rates, and (b) background selection information in ABC analyses. Importantly, assuming uniform recombination or ignoring background selection had a negative effect on accuracy in many cases. Software and results from this method validation study should be useful for future demographic history analyses.
Keywords: approximate Bayesian computation; background selection; demographic history; msABC; whole genome sequencing.
© 2019 John Wiley & Sons Ltd.