Characterizing features affecting local ancestry inference performance in admixed populations

Jessica Honorato-Mauer; Nirav N Shah; Adam X Maihofer; Clement C Zai; Sintia Belangero; Caroline M Nievergelt; Psychiatric Genomics Consortium for PTSD Ancestry Working Group; Marcos Santoro; Elizabeth G Atkinson

doi:10.1016/j.ajhg.2024.12.005


Am J Hum Genet. 2024 Dec 26:S0002-9297(24)00447-6. doi: 10.1016/j.ajhg.2024.12.005. Online ahead of print.

Authors

Jessica Honorato-Mauer¹, Nirav N Shah¹, Adam X Maihofer², Clement C Zai³, Sintia Belangero⁴, Caroline M Nievergelt²; Psychiatric Genomics Consortium for PTSD Ancestry Working Group; Marcos Santoro⁵, Elizabeth G Atkinson⁶

Affiliations

¹ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
² Department of Psychiatry, School of Medicine, University of California at San Diego, La Jolla, CA 92093, USA.
³ Department of Psychiatry, Institute of Medical Science, Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada.
⁴ Department of Morphology and Genetics, Universidade Federal de São Paulo, São Paulo 04023-062, Brazil.
⁵ Department of Biochemistry, Molecular Biology Division, Universidade Federal de São Paulo, São Paulo 04023-062, Brazil.
⁶ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; The Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA.

PMID: 39753130

Abstract

In recent years, significant efforts have been made to improve methods for genomic studies of admixed populations using local ancestry inference (LAI). Accurate LAI is crucial to ensure that downstream analyses faithfully reflect the genetic ancestry of research participants. Here, we test analytic strategies for LAI to provide guidelines for optimal accuracy, focusing on admixed populations reflective of Latin America's primary continental ancestries: African (AFR), Amerindigenous (AMR), and European (EUR). Simulating linkage-disequilibrium-informed admixed haplotypes under a variety of 2- and 3-way admixture models, we implemented a standard LAI pipeline and tested the impact of reference panel composition, DNA data type, demography, and software parameters on ancestry-specific LAI accuracy. Across all models, AMR tracts had notably lower LAI accuracy than EUR and AFR tracts, with mean true positive rates ranging from 88% to 94% for AMR, 96% to 99% for EUR, and 98% to 99% for AFR. When LAI miscalls occurred, the most frequent error was calling EUR ancestry at sites that were truly AMR. Concerning reference panel curation, we found that a reference panel well matched to the target population, even with a smaller sample size, was accurate and the most computationally efficient. Imputation did not harm LAI performance in our tests; rather, we observed that higher variant density improved accuracy. Although these results directly address admixed Latin American cohort compositions, the trends are broadly useful for informing best practices for LAI across admixed populations. Our findings reinforce the need to include more underrepresented populations in sequencing efforts to improve reference panels.
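The two accuracy summaries mentioned above, an ancestry-specific true positive rate and the direction of miscalls (e.g. true AMR sites called EUR), can be illustrated with a minimal sketch. It assumes that simulated (true) and inferred local ancestry have already been flattened into one label per haplotype per site; the function names `ancestry_tpr` and `miscall_counts` and the toy labels are hypothetical and are not the authors' pipeline or data.

```python
from collections import Counter

ANCESTRIES = ("AFR", "AMR", "EUR")


def ancestry_tpr(true_calls, inferred_calls, ancestries=ANCESTRIES):
    """Per-ancestry true positive rate: of all sites whose true ancestry is A,
    the fraction that were also inferred as A. Ancestries absent from the
    truth set are skipped to avoid division by zero."""
    if len(true_calls) != len(inferred_calls):
        raise ValueError("true and inferred call vectors must have equal length")
    totals = Counter(true_calls)
    hits = Counter(t for t, i in zip(true_calls, inferred_calls) if t == i)
    return {a: hits[a] / totals[a] for a in ancestries if totals[a] > 0}


def miscall_counts(true_calls, inferred_calls):
    """Tally (true, inferred) ancestry pairs at sites where inference was wrong."""
    return Counter((t, i) for t, i in zip(true_calls, inferred_calls) if t != i)


# Toy labels for illustration only (not results from the study):
# 2 of 10 true AMR sites are miscalled as EUR.
true_calls = ["AMR"] * 10 + ["EUR"] * 10 + ["AFR"] * 10
inferred_calls = ["AMR"] * 8 + ["EUR"] * 2 + ["EUR"] * 10 + ["AFR"] * 10

print(ancestry_tpr(true_calls, inferred_calls))
# {'AFR': 1.0, 'AMR': 0.8, 'EUR': 1.0}
print(miscall_counts(true_calls, inferred_calls))
# Counter({('AMR', 'EUR'): 2})
```

Averaging `ancestry_tpr` across simulated individuals or replicates would give per-ancestry mean true positive rates of the kind reported above, and `miscall_counts` exposes which ancestry absorbs the errors.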

Keywords: ancestry; bioinformatics; genetic admixture; local ancestry inference; population genetics; reference panels.