Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet

Genome Biol. 2023 Dec 4;24(1):277. doi: 10.1186/s13059-023-03116-3.

Abstract

Background: Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short and long sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology).

Results: The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map and for each haplotype. We also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity-including those located at long repeat regions, complex structural variants, and de novo mutations-are systematically examined in this study.

Conclusions: In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offers expanded genomic coverage and insight into complex variant categories.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking*
  • East Asian People* / genetics
  • Genomics
  • Haplotypes
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA
  • Twin Studies as Topic
  • Twins, Monozygotic* / genetics