Haplotype inference from short sequence reads using a population genealogical history model

Jin Zhang; Yufeng Wu

doi:10.1142/9789814335058_0030


Pac Symp Biocomput. 2011:288-99. doi: 10.1142/9789814335058_0030.

Authors

Jin Zhang¹, Yufeng Wu

Affiliation

¹ Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA.

PMID: 21121056

Abstract

High-throughput sequencing is currently a major transformative technology in biology. In this paper, we study a population genomics problem motivated by the short-read data now available from high-throughput sequencing. In this problem, we are given short reads collected from individuals in a population, and the objective is to infer haplotypes from these reads. We first formulate the computational problem of haplotype inference with short reads. Based on a simple probabilistic model of short reads, we present a new approach for inferring haplotypes directly from the given reads (i.e., without first calling genotypes). Our method finds the most likely haplotypes whose local genealogical history can be approximately modeled as a perfect phylogeny. We show that, when there is no recombination, the optimal haplotypes under this objective can be found using integer linear programming for many modest-sized datasets. We then develop a related heuristic method that scales to larger data and also allows recombination. Simulation results show that the performance of our method is competitive with alternative approaches.
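The abstract names two ingredients without spelling them out: a probabilistic model of short reads and a perfect-phylogeny constraint on the inferred haplotypes. The sketch below illustrates both under assumptions of our own. It is not the authors' formulation and not their ILP or heuristic. The function names (`four_gamete_compatible`, `read_log_likelihood`, `score`), the 0/1 allele encoding, the uniform choice between an individual's two haplotypes, and the independent per-base error rate `error_rate` are all hypothetical. The compatibility check uses the standard four-gamete test: binary sequences under the infinite-sites assumption admit an (unrooted) perfect phylogeny if and only if no pair of sites shows all four gametes 00, 01, 10 and 11. Here the test is applied to the whole region at once, which corresponds to the no-recombination case; a "local" model allowing recombination would apply it within windows instead.

```python
import math
from itertools import combinations


def four_gamete_compatible(haplotypes):
    """Return True if no pair of sites shows all four gametes 00, 01, 10, 11.

    For binary characters, passing this test is equivalent to the
    haplotypes admitting an (unrooted) perfect phylogeny.
    """
    if not haplotypes:
        return True
    num_sites = len(haplotypes[0])
    for i, j in combinations(range(num_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


def read_log_likelihood(read, hap_pair, error_rate):
    """Log-probability of one read under a hypothetical simple read model.

    Assumptions (illustrative only): `read` maps site index -> observed allele
    (0/1); the read comes from one of the individual's two haplotypes with
    probability 1/2 each; each base is miscalled independently with
    probability `error_rate`, where 0 < error_rate < 1.
    """
    total = 0.0
    for hap in hap_pair:
        p = 1.0
        for site, allele in read.items():
            p *= (1.0 - error_rate) if hap[site] == allele else error_rate
        total += 0.5 * p
    return math.log(total)


def score(individual_reads, hap_pairs, error_rate=0.01):
    """Total read log-likelihood of a candidate phasing.

    `individual_reads[k]` is the list of reads for individual k and
    `hap_pairs[k]` is that individual's candidate pair of haplotypes.
    Returns -inf if the pooled haplotypes fail the four-gamete test,
    i.e. are not perfect-phylogeny compatible.
    """
    pooled = [h for pair in hap_pairs for h in pair]
    if not four_gamete_compatible(pooled):
        return float("-inf")
    return sum(
        read_log_likelihood(r, pair, error_rate)
        for reads, pair in zip(individual_reads, hap_pairs)
        for r in reads
    )
```

Scoring a candidate by brute force like this only shows what the objective rewards: haplotypes that explain the reads well while remaining phylogeny-compatible. Searching the space of candidates efficiently is the part the abstract attributes to integer linear programming and, for larger data, to a heuristic.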

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Computational Biology
Genealogy and Heraldry*
Genetics, Population / statistics & numerical data*
Haplotypes*
High-Throughput Nucleotide Sequencing / statistics & numerical data
Humans
Models, Genetic
Polymorphism, Single Nucleotide
Recombination, Genetic
Software