Background: Historically, geneticists have relied on genotyping arrays and imputation to study human genetic variation. However, an underrepresentation of diverse populations has resulted in arrays that poorly capture global genetic variation, and a lack of reference panels. This has contributed to deepening global health disparities. Whole genome sequencing (WGS) better captures genetic variation but remains prohibitively expensive. Thus, we explored WGS at "mid-pass" 1-7x coverage.
Results: Here, we developed and benchmarked methods for mid-pass sequencing. When applied to a population without an existing genomic reference panel, 4x mid-pass performed consistently well across ethnicities, with high recall (98%) and precision (97.5%).
Conclusion: Compared to array data imputed into 1000 Genomes, mid-pass performed better across all metrics and identified novel population-specific variants with potential disease relevance. We hope our work will reduce financial barriers for geneticists from underrepresented populations to characterize their genomes prior to biomedical genetic applications.
© 2021. The Author(s).