Modeling recent positive selection using identity-by-descent segments

Seth D Temple; Ryan K Waples; Sharon R Browning

doi:10.1016/j.ajhg.2024.08.023

Am J Hum Genet. 2024 Nov 7;111(11):2510-2529. doi: 10.1016/j.ajhg.2024.08.023. Epub 2024 Oct 2.

Authors

Seth D Temple¹, Ryan K Waples², Sharon R Browning³

Affiliations

¹ Department of Statistics, University of Washington, Seattle, WA, USA.
² Department of Biostatistics, University of Washington, Seattle, WA, USA.
³ Department of Biostatistics, University of Washington, Seattle, WA, USA.

PMID: 39362217
PMCID: PMC11568764 (available on 2025-05-07)
DOI: 10.1016/j.ajhg.2024.08.023

Abstract

Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s≥0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
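
The scan criterion mentioned in the abstract (IBD rates more than four standard deviations above the genome-wide median) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flag_excess_ibd`, the input format (one IBD rate per locus), and the use of a plain population standard deviation over all loci are assumptions made for illustration only.

```python
import statistics


def flag_excess_ibd(ibd_rates, num_sd=4.0):
    """Return indices of loci whose IBD rate exceeds median + num_sd * SD.

    Hypothetical helper: `ibd_rates` is assumed to be a list with one
    IBD rate per locus along the genome.
    """
    # Genome-wide median serves as the baseline IBD rate.
    median = statistics.median(ibd_rates)
    # Assumption: spread is the population SD over all loci, with no
    # trimming of outlier regions before it is computed.
    sd = statistics.pstdev(ibd_rates)
    threshold = median + num_sd * sd
    return [i for i, rate in enumerate(ibd_rates) if rate > threshold]


if __name__ == "__main__":
    # Toy data: 99 background loci and one locus with a strong IBD excess.
    rates = [1.0] * 99 + [50.0]
    print(flag_excess_ibd(rates))
```

In practice the standard deviation would likely need to be estimated in a way that is not inflated by the sweep regions themselves; the sketch ignores that to keep the thresholding logic visible.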

Keywords: confidence intervals; identity-by-descent; selection coefficient; selective sweeps.

MeSH terms

Alleles
Computer Simulation
Gene Frequency*
Genetics, Population
Haplotypes* / genetics
Humans
Models, Genetic*
Selection, Genetic*
White People / genetics