Differences in population genetic structure are a key consideration in medical research involving multi-population samples. A set of ancestry-informative single nucleotide polymorphisms (AI-SNPs) can be used to analyze the genetic components of a population, infer the ancestral origin of individuals, and pre-filter samples, thereby reducing the impact of population genetic structure differences on medical research. However, most published studies have focused on differences between continental populations or between regions of a continent. In this paper, AI-SNPs were screened by calculating the FST value for each pair of five East Asian populations in the 1000 Genomes Project phase 3 (GRCh37.p13): Japanese in Tokyo (JPT), Han Chinese in Beijing (CHB), Southern Han Chinese (CHS), Chinese Dai in Xishuangbanna (CDX), and Kinh in Ho Chi Minh City (KHV), in order to analyze differences among subcontinental populations. The results show that the five East Asian populations fall into three clusters: JPT; CHB and CHS; and CDX and KHV. The AI-SNP panel can be used to analyze individual genetic composition and to select representative individuals. Individuals in whom the population-representative genetic component exceeds 80% represent their population well. This paper demonstrates the practical value of using a panel of AI-SNPs screened by FST value to verify ancestral composition and select representative samples, thereby reducing the influence of genetic structure differences among subcontinental populations on population-related medical research.
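In medical research involving multi-population samples, differences in population genetic structure are a factor that cannot be ignored. Ancestry-informative single nucleotide polymorphism (AI-SNP) markers can effectively reduce the impact of such differences on medical research by making it possible to analyze the genetic components of populations, infer the genetic background of individuals, and pre-screen population samples. Because most published studies have examined differences in population genetic structure between continents or between subcontinental regions, this study used data from five East Asian populations in the 1000 Genomes Project (GRCh37.p13), namely Japanese in Tokyo (JPT), Han Chinese in Beijing (CHB), Southern Han Chinese (CHS), Chinese Dai in Xishuangbanna (CDX), and Kinh in Ho Chi Minh City (KHV), to screen AI-SNPs by FST value and analyze differences in genetic structure among populations within a subcontinental region. The results show that the East Asian populations studied fall into three clusters: JPT; CHB and CHS; and CDX and KHV. AI-SNPs successfully resolved the genetic background of individuals, and individuals whose population-representative genetic component exceeded 80% represented their population well. This study shows that screening a set of AI-SNPs by FST value to verify the genetic background of samples and to select population-representative samples has practical value for reducing the impact of genetic structure differences among populations within a subcontinental region on population-related medical research.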
Keywords: East Asian populations; ancestry-informative marker; genetic structure differences; single nucleotide polymorphism (SNP).
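The screening and selection steps summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: FST is computed per biallelic SNP with Wright's heterozygosity-based definition and equal population weights (the study may have used a different estimator), allele frequencies and per-individual ancestry proportions are already available as plain dictionaries, and the function names, the `top_n` parameter, and the SNP and individual identifiers are all hypothetical.

```python
from itertools import combinations


def wright_fst(p1: float, p2: float) -> float:
    """F_ST for one biallelic SNP from its allele frequency in two populations.

    Assumption: Wright's definition (H_T - H_S) / H_T with equal weights.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def screen_ai_snps(freqs: dict, top_n: int) -> dict:
    """Rank SNPs by F_ST for every population pair and keep the top_n.

    freqs maps snp_id -> {population: allele_frequency}.
    Returns {(pop_a, pop_b): [snp_id, ...]} with populations in sorted order.
    """
    pops = sorted(next(iter(freqs.values())))
    panel = {}
    for a, b in combinations(pops, 2):
        ranked = sorted(
            freqs,
            key=lambda snp: wright_fst(freqs[snp][a], freqs[snp][b]),
            reverse=True,
        )
        panel[(a, b)] = ranked[:top_n]
    return panel


def representative_individuals(components: dict, population: str,
                               threshold: float = 0.8) -> list:
    """Keep individuals whose share of `population` ancestry exceeds threshold.

    components maps individual_id -> {ancestry_component: proportion}.
    """
    return [
        ind for ind, comp in components.items()
        if comp.get(population, 0.0) > threshold
    ]


# Toy data for illustration only; these are not values from the study.
toy_freqs = {
    "snp_a": {"JPT": 0.9, "CHB": 0.1},
    "snp_b": {"JPT": 0.5, "CHB": 0.5},
}
toy_components = {
    "ind1": {"JPT": 0.92, "CHB": 0.08},
    "ind2": {"JPT": 0.60, "CHB": 0.40},
}
toy_panel = screen_ai_snps(toy_freqs, top_n=1)
toy_representatives = representative_individuals(toy_components, "JPT")
```

With the toy data, `snp_a` is retained for the CHB/JPT pair because its allele frequencies differ most, and only `ind1` passes the 80% threshold. In practice the ancestry proportions would come from a separate clustering or admixture analysis, which this sketch does not cover.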