New clustering methods for population comparison on paternal lineages

Mol Genet Genomics. 2015 Apr;290(2):767-84. doi: 10.1007/s00438-014-0949-7. Epub 2014 Nov 12.

Abstract

This study presents two new clustering and visualisation techniques developed to find the most typical clusters in the 18-dimensional Y-chromosomal haplogroup frequency distributions of 90 Western Eurasian populations. The first technique, called the "self-organizing cloud" (SOC), is a vector-based self-learning method derived from the Self-Organising Map and non-metric Multidimensional Scaling algorithms. The second technique is a new probabilistic method called the "maximal relation probability" (MRP) algorithm, based on a probability function whose local maxima lie at the condensation centres of the input data. This function is calculated directly from the distance matrix of the data and can be interpreted as the probability that a given element of the database has a real genetic relation with at least one of the remaining elements. We tested the two new methods by comparing their results both with each other and with those of the k-medoids algorithm. Using these new algorithms, we determined 10 clusters of populations based on the similarity of their haplogroup composition. The resulting 10 genetic clusters form a picture that is readily interpretable in genetic, geographical and historical terms, mirroring the early spread of populations from the Fertile Crescent to the Caucasus, Central Asia, Arabia and Southeast Europe. The results show that parallel clustering of populations with the SOC and MRP methods can be an efficient tool for studying the demographic history of populations sharing common genetic footprints.
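
To make the distance-matrix idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the MRP probability function, so the Gaussian kernel, the `scale` parameter and the independence assumption behind the "at least one" product are all hypothetical choices made for illustration. The k-medoids routine is a plain alternating-update variant of the baseline named in the abstract, and the six-population, three-haplogroup table is invented toy data.

```python
import numpy as np

def pairwise_distances(freqs):
    """Euclidean distance matrix between the rows (populations) of a frequency table."""
    diff = freqs[:, None, :] - freqs[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def relation_probability(dist, scale=1.0):
    """Probability that each element is related to at least one other element.

    ASSUMPTION: the pairwise probability is p_ij = exp(-(d_ij / scale)**2) and
    pairs are treated as independent, so P_i = 1 - prod_j (1 - p_ij).
    The function actually used by the MRP algorithm is not stated in the abstract.
    """
    p = np.exp(-(dist / scale) ** 2)
    np.fill_diagonal(p, 0.0)  # an element is not compared with itself
    return 1.0 - np.prod(1.0 - p, axis=1)

def k_medoids(dist, k, n_iter=100, seed=0):
    """Basic alternating k-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every element to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data (invented): 6 populations x 3 haplogroup frequencies, two obvious groups.
freqs = np.array([
    [0.80, 0.10, 0.10],
    [0.78, 0.12, 0.10],
    [0.82, 0.09, 0.09],
    [0.10, 0.10, 0.80],
    [0.12, 0.10, 0.78],
    [0.09, 0.09, 0.82],
])
dist = pairwise_distances(freqs)
prob = relation_probability(dist, scale=0.3)
medoids, labels = k_medoids(dist, k=2)
print("relation probability:", np.round(prob, 3))
print("medoids:", medoids, "labels:", labels)
```

Under these assumptions, populations in dense regions of the frequency space score close to 1 and isolated populations score close to 0, which is the qualitative behaviour the abstract attributes to the MRP function. The real study works with 18 haplogroup frequencies for 90 populations rather than this toy table.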

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Chromosomes, Human, Y / genetics*
  • Cluster Analysis
  • Europe
  • Genetics, Population
  • Haplotypes
  • Human Migration
  • Humans
  • Male
  • Models, Genetic*