Haplotype phasing by multi-assembly of shared haplotypes: phase-dependent interactions between rare variants

Pac Symp Biocomput. 2011:88-99. doi: 10.1142/9789814335058_0010.

Abstract

In this paper we propose algorithmic strategies, Lander-Waterman-like statistical estimates, and genome-wide software for haplotype phasing by multi-assembly of shared haplotypes. Specifically, we consider four types of results which together provide a comprehensive workflow for GWAS data sets: (1) statistics of multi-assembly of shared haplotypes, (2) graph-theoretic algorithms for haplotype assembly based on conflict graphs of sequencing reads, (3) inference of pedigree structure through haplotype sharing via tract-finding algorithms, and (4) multi-assembly of shared haplotypes of cases, controls, and trios. The input to the workflows that we consider is any combination of: (A) genotype data, (B) next-generation sequencing (NGS) data, and (C) pedigree information. (1) We present Lander-Waterman-like statistics for NGS projects for the multi-assembly of shared haplotypes. Results are presented in Sec. 2. (2) In Sec. 3, we present algorithmic strategies for haplotype assembly using NGS, NGS + genotype data, and NGS + pedigree information. (3) This work builds on algorithms presented in Halldórsson et al. and is part of the same library of tools co-developed for GWAS workflows. (4) Section 3.3.1 contains algorithmic strategies for multi-assembly of GWAS data. We present algorithms for assembling large data sets and for determining and using shared haplotypes to assemble and phase the data more reliably. Workflows 1-4 provide a set of rigorous algorithms which have the potential to identify phase-dependent interactions between rare variants in linkage equilibrium that are associated with cases. They build on our extensive work on haplotype phasing, haplotype assembly, and whole genome assembly comparison.
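The conflict-graph idea in item (2) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single diploid individual, error-free reads, and a toy representation in which each read is a dictionary from heterozygous site index to observed allele. The names `conflict_graph`, `two_color`, and `reads` are hypothetical. Two reads conflict when they cover a common site and disagree there; under these assumptions conflicting reads must come from different haplotypes, so the graph is bipartite and a 2-coloring assigns reads to haplotypes.

```python
from collections import deque
from itertools import combinations


def conflict_graph(reads):
    """Return adjacency sets; reads i and j are joined if they
    disagree on at least one site that both of them cover."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        shared = reads[i].keys() & reads[j].keys()
        if any(reads[i][s] != reads[j][s] for s in shared):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def two_color(adj):
    """Breadth-first 2-coloring. Returns {read index: 0 or 1},
    or None if the graph contains an odd cycle (not bipartite)."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0  # arbitrary choice for each connected component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle: reads are inconsistent
    return color


# Toy data (assumed): haplotypes ACGT and GTAC over sites 0..3.
reads = [
    {0: "A", 1: "C"},  # from ACGT
    {1: "T", 2: "A"},  # from GTAC
    {2: "G", 3: "T"},  # from ACGT
    {0: "G", 1: "T"},  # from GTAC
    {2: "A", 3: "C"},  # from GTAC
]

phase = two_color(conflict_graph(reads))
print(phase)
```

Reads in different connected components of the conflict graph are colored independently, so their relative phase is not determined by this sketch. A `None` result signals that the reads cannot be split into two consistent haplotypes, which with real data would typically call for an error-correcting formulation rather than plain 2-coloring.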

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology
  • Genetic Variation*
  • Genome-Wide Association Study / statistics & numerical data
  • Haplotypes*
  • Humans
  • Polymorphism, Single Nucleotide
  • Software