In recent years, significant efforts have been made to improve methods for genomic studies of admixed populations using local ancestry inference (LAI). Accurate LAI is crucial for ensuring that downstream analyses reflect the genetic ancestry of research participants. Here, we test analytic strategies for LAI to provide guidelines for optimal accuracy, focusing on admixed populations reflective of Latin America's primary continental ancestries: African (AFR), Amerindigenous (AMR), and European (EUR). Simulating linkage-disequilibrium-informed admixed haplotypes under a variety of 2- and 3-way admixture models, we implemented a standard LAI pipeline and tested the impact of reference panel composition, DNA data type, demography, and software parameters to quantify ancestry-specific LAI accuracy. Across all models, AMR tracts show notably lower LAI accuracy than EUR and AFR tracts, with mean true positive rates ranging from 88% to 94% for AMR, 96% to 99% for EUR, and 98% to 99% for AFR. When LAI miscalls occurred, they most frequently assigned EUR ancestry to sites that were truly AMR. Concerning reference panel curation, we find that a reference panel well matched to the target population, even one with a smaller sample size, was accurate and the most computationally efficient. Imputation did not harm LAI performance in our tests; rather, we observed that higher variant density improved accuracy. Although these results respond most directly to admixed Latin American cohort compositions, the trends are broadly useful for informing best practices for LAI across admixed populations. Our findings reinforce the need to include more underrepresented populations in sequencing efforts to improve reference panels.
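As an illustration of the ancestry-specific accuracy metric reported above, the following is a minimal sketch of how a per-ancestry true positive rate and a tally of miscalls could be computed from simulated truth labels and LAI calls. The function names and the toy labels are hypothetical and are not taken from the study's pipeline or data; the sketch assumes one ancestry label per site per haplotype.

```python
from collections import Counter

# Ancestry labels as used in the abstract.
ANCESTRIES = ("AFR", "AMR", "EUR")


def ancestry_specific_tpr(truth, inferred):
    """Per-ancestry true positive rate: of the sites whose true ancestry
    is A, the fraction that were also called A."""
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must have the same length")
    total = Counter(truth)
    correct = Counter(t for t, c in zip(truth, inferred) if t == c)
    # Skip ancestries absent from the truth labels to avoid dividing by zero.
    return {a: correct[a] / total[a] for a in ANCESTRIES if total[a]}


def miscall_counts(truth, inferred):
    """Count miscalls as (true ancestry, called ancestry) pairs."""
    return Counter((t, c) for t, c in zip(truth, inferred) if t != c)


# Toy labels for illustration only (not data from the study).
truth = ["AFR"] * 4 + ["AMR"] * 4 + ["EUR"] * 2
inferred = ["AFR"] * 4 + ["AMR", "AMR", "EUR", "EUR"] + ["EUR"] * 2

tpr = ancestry_specific_tpr(truth, inferred)
miscalls = miscall_counts(truth, inferred)
print(tpr)
print(miscalls.most_common(1))
```

In this toy input, the only miscalls are true AMR sites called as EUR, which mirrors the direction of error described above without reproducing any reported value.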
Keywords: ancestry; bioinformatics; genetic admixture; local ancestry inference; population genetics; reference panels.
Copyright © 2024 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.