Large-scale selection of highly informative microhaplotypes for ancestry inference and population specific informativeness

Maria Luisa de Barros Rodrigues; Marcelo Porto Rodrigues; Heather L Norton; Celso Teixeira Mendes-Junior; Aguinaldo Luiz Simões; Daniel John Lawson

doi:10.1016/j.fsigen.2024.103153

Forensic Sci Int Genet. 2025 Jan:74:103153. doi: 10.1016/j.fsigen.2024.103153. Epub 2024 Oct 5.

Authors

Maria Luisa de Barros Rodrigues¹, Marcelo Porto Rodrigues², Heather L Norton³, Celso Teixeira Mendes-Junior⁴, Aguinaldo Luiz Simões⁵, Daniel John Lawson⁶

Affiliations

¹ Programa de Pós-Graduação em Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900, Brazil.
² Retired.
³ Department of Anthropology, University of Cincinnati, Cincinnati, OH 45221, United States.
⁴ Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil.
⁵ Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900, Brazil.
⁶ Institute of Statistical Sciences, School of Mathematics, Woodland Road, University of Bristol, Bristol BS8 1UG, UK; MRC Integrative Epidemiology Unit, School of Medicine, Oakfield Grove, University of Bristol, Bristol BS8 2BN, UK.

PMID: 39378714

Abstract

Microhaplotypes (MHs) are sets of physically close genetic markers that are inherited together, and they are gaining prominence because of their efficiency in forensic, clinical, and population studies. They excel in kinship analysis, DNA mixture detection, and ancestry inference, offering greater precision than individual SNPs and STRs. In this study, a pipeline was developed to efficiently select highly informative MHs from large-scale genomic datasets. Over 120,000 MHs were identified from almost a million markers, allowing this non-independent information to be used efficiently for inference. The MHs were compared with SNPs in terms of their informativeness and the ancestry-inference performance of marker subsets, and all results consistently favored MHs. A method for ranking markers by population-specific informativeness was also introduced; it improved the accuracy of Native American ancestry estimation, overcoming the challenge posed by the underrepresentation of Native American populations in datasets. In conclusion, this study presents a comprehensive strategy for selecting highly informative MHs for accurate ancestry inference. The proposed approach and the subsets selected by population-specific informativeness offer valuable tools for improving ancestry inference accuracy, particularly for admixed populations, as demonstrated for a Brazilian dataset.
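The abstract does not name the informativeness statistic used in the pipeline. As a minimal sketch, assuming Rosenberg et al.'s (2003) informativeness for assignment (I_n), the snippet below scores one marker from a table of per-population allele frequencies; a microhaplotype simply contributes more alleles (its distinct haplotypes) than a biallelic SNP. The population-specific score is a hypothetical decomposition and not necessarily the authors' method: I_n equals the average, over populations, of the Kullback-Leibler divergence of each population's frequencies from the pooled frequencies, so one population's term can be used to rank markers for that population. All function names and the toy frequencies are illustrative assumptions, not data from the study.

```python
import math
from typing import Dict, List, Sequence

def informativeness_for_assignment(freqs: Sequence[Sequence[float]]) -> float:
    """Rosenberg-style I_n for one marker (assumed metric, not confirmed by the source).

    freqs[i][j] is the frequency of allele (or microhaplotype) j in population i;
    each row is expected to sum to 1.
    """
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        pooled = sum(row[j] for row in freqs) / k
        if pooled > 0:
            total -= pooled * math.log(pooled)   # entropy of pooled frequencies
        for row in freqs:
            if row[j] > 0:
                total += row[j] * math.log(row[j]) / k  # minus mean within-pop entropy
    return total

def population_specific_terms(freqs: Sequence[Sequence[float]]) -> List[float]:
    """Per-population KL divergence from the pooled frequencies (hypothetical score).

    The mean of these K terms equals I_n, so each term is that population's
    share of the marker's informativeness.
    """
    k = len(freqs)
    pooled = [sum(row[j] for row in freqs) / k for j in range(len(freqs[0]))]
    # p > 0 implies pooled > 0, so the log ratio is always defined.
    return [sum(p * math.log(p / m) for p, m in zip(row, pooled) if p > 0)
            for row in freqs]

def rank_markers_for_population(markers: Dict[str, Sequence[Sequence[float]]],
                                target: int) -> List[str]:
    """Order marker names by their contribution for population index `target`."""
    scored = {name: population_specific_terms(f)[target] for name, f in markers.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Toy frequencies for three populations (made up for illustration only).
snp = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
mh = [[0.7, 0.2, 0.05, 0.05], [0.1, 0.4, 0.4, 0.1], [0.05, 0.05, 0.2, 0.7]]

print(informativeness_for_assignment(snp), informativeness_for_assignment(mh))
ranking = rank_markers_for_population({"toy_snp": snp, "toy_mh": mh}, target=2)
print(ranking)
```

Ranking by a single population's term rather than by the pooled I_n favors markers that separate that population from the others, which is one plausible way to prioritize markers for an underrepresented group such as Native Americans.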

Keywords: Ancestry; Brazilian population; Informativeness; Microarray; Microhaplotypes; Native Americans.

MeSH terms

Brazil
DNA Fingerprinting
Genetic Markers
Genetics, Population*
Haplotypes*
Humans
Polymorphism, Single Nucleotide*
Racial Groups / genetics

Substances

Genetic Markers