Complete genome phasing of family quartet by combination of genetic, physical and population-based phasing analysis

PLoS One. 2013 May 31;8(5):e64571. doi: 10.1371/journal.pone.0064571. Print 2013.

Abstract

Phased genome maps are important for understanding genetic and epigenetic regulation and disease mechanisms, particularly parental imprinting defects. Phasing is also critical for assessing the functional consequences of genetic variants and for precisely defining haplotype blocks, which is useful for understanding gene flow and genotype-phenotype association at the population level. Transmission phasing by analysis of a family quartet allows the phasing of only 95% of all variants, because uniformly heterozygous positions cannot be phased. Here, we report a phasing method based on a combination of transmission analysis, physical phasing by paired-end sequencing of libraries of staggered sizes, and population-based analysis. Sequencing a healthy Caucasian quartet at 120x coverage and combining physical and transmission phasing yielded phased genotypes for about 99.8% of the SNPs, indels and structural variants present in the quartet, a phasing rate significantly higher than can be achieved using any single phasing method. A false positive SNP error rate below 10*E-7 per genome and per base was obtained using a combination of filters. We provide a complete list of SNPs, indels and structural variants, an analysis of haplotype block sizes, and an analysis of the false positive and false negative variant calling error rates. Improved genome phasing and family sequencing will increase the power of genome-wide sequencing as a clinical diagnostic tool and have myriad basic science applications.
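
The limitation of transmission phasing noted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `phase_child` and `phase_quartet_site` are hypothetical, genotypes are assumed to be unordered pairs of allele strings at a single biallelic site, and only the children are phased. A child's genotype is resolved only when exactly one assignment of alleles to the paternal and maternal haplotypes is consistent with the parents' genotypes.

```python
from itertools import permutations


def phase_child(father, mother, child):
    """Return (paternal_allele, maternal_allele) for the child.

    Each argument is an unordered genotype such as ("A", "G").
    Returns None when the site is ambiguous or inconsistent with
    Mendelian inheritance (illustrative assumption, not the paper's code).
    """
    # Try both orderings of the child's alleles and keep those in which
    # the first allele could come from the father and the second from
    # the mother.
    candidates = {
        (p, m)
        for p, m in permutations(child, 2)
        if p in father and m in mother
    }
    # Exactly one consistent ordering means the phase is determined.
    return candidates.pop() if len(candidates) == 1 else None


def phase_quartet_site(father, mother, child1, child2):
    """Phase both children of a quartet at one site."""
    return [phase_child(father, mother, c) for c in (child1, child2)]


# Informative site: the father is homozygous, so each child's paternal
# allele is known and the phase follows.
informative = phase_quartet_site(("A", "A"), ("A", "G"), ("A", "G"), ("A", "A"))

# Uniformly heterozygous site: every family member is A/G, so both
# orderings are consistent and transmission alone cannot phase it.
uniform_het = phase_quartet_site(("A", "G"), ("A", "G"), ("A", "G"), ("A", "G"))

print(informative)  # [('A', 'G'), ('A', 'A')]
print(uniform_het)  # [None, None]
```

Sites that return `None` in this sketch are the kind that physical (read-based) or population-based phasing would have to resolve.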

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromosome Mapping / methods*
  • Chromosome Mapping / statistics & numerical data
  • Family
  • Genome, Human*
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / statistics & numerical data
  • Haplotypes
  • High-Throughput Nucleotide Sequencing
  • Humans
  • INDEL Mutation
  • Inheritance Patterns*
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA / statistics & numerical data*

Grants and funding

All of the authors were supported in part by grants C024405 and C024172 from NYSTEM, the funding agency of the New York State Empire Stem Cell Board. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.