Highly scalable genotype phasing by entropy minimization

Bogdan Pasaniuc; Ion Mandoiu

doi:10.1109/IEMBS.2006.259355

Conf Proc IEEE Eng Med Biol Soc. 2006:2006:3482-6. doi: 10.1109/IEMBS.2006.259355.

Authors

Bogdan Pasaniuc¹, Ion Mandoiu

Affiliation

¹ Dept. of Comput. Sci. & Eng., Connecticut Univ., Storrs, CT 06269-2155, USA.

PMID: 17946566
DOI: 10.1109/IEMBS.2006.259355

Abstract

A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is relatively easy to obtain the conflated SNP information of both chromosome copies, the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years because haplotype information increases the statistical power of disease association tests. However, existing algorithms have impractical running times for phasing large genotype datasets such as those generated by the International HapMap Project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing genotype data coming from either unrelated individuals or families consisting of a child and one or both parents. Experimental results show that our algorithm achieves a phasing accuracy close to that of the best existing methods while being several orders of magnitude faster.
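To make the entropy-minimization objective concrete, the following is a minimal sketch, not the authors' algorithm. The abstract does not describe the scalable search procedure, so this illustration simply enumerates every phasing of a tiny dataset and keeps the one whose haplotype distribution has the lowest Shannon entropy. The genotype encoding ('0' and '1' for homozygous sites, '2' for heterozygous sites) and all function names are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import log2


def compatible_pairs(genotype):
    """List every unordered haplotype pair that explains a genotype.

    Assumed encoding: '0' and '1' are homozygous sites, '2' is heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, bit in zip(het_sites, bits):
            # At a heterozygous site the two haplotypes carry opposite alleles.
            h1[site] = bit
            h2[site] = "1" if bit == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def entropy(haplotypes):
    """Shannon entropy (in bits) of the empirical haplotype distribution."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Exhaustive search for the phasing with the lowest haplotype entropy.

    Exponential in the number of heterozygous sites; for illustration only.
    """
    best, best_score = None, float("inf")
    for phasing in product(*(compatible_pairs(g) for g in genotypes)):
        score = entropy([h for pair in phasing for h in pair])
        if score < best_score:
            best, best_score = phasing, score
    return best, best_score


# Two homozygous individuals and one fully heterozygous individual.
phasing, score = min_entropy_phasing(["0011", "1100", "2222"])
```

In this toy dataset the heterozygous genotype is resolved into the two haplotypes already seen in the homozygous individuals, because reusing them keeps the haplotype distribution as concentrated as possible. The exhaustive search is only feasible for very small inputs, which is exactly the scalability problem the paper sets out to address.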

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Adult
Algorithms*
Biomedical Engineering
Child
Databases, Genetic
Female
Genotype*
Haplotypes
Humans
Male
Polymorphism, Single Nucleotide*