A statistical approach to detection of copy number variations in PCR-enriched targeted sequencing data

BMC Bioinformatics. 2016 Oct 22;17(1):429. doi: 10.1186/s12859-016-1272-6.

Abstract

Background: Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massively parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for the detection of short genetic variations. However, identification of larger variations such as structural variants and copy number variations (CNVs) remains a challenge for targeted MPS. Some approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type and size, with a certain expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germline CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool.

Results: We have developed a machine learning algorithm for the detection of large duplications and deletions in targeted sequencing data generated with a PCR-based enrichment step. We performed verification studies and established the algorithm's sensitivity and specificity. We compared the developed tool with other available methods applicable to the described data and found that it performed better.

Conclusion: Using a large cohort of samples, we showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data.

Keywords: Germline CNV; MPS; Machine learning; Multiplex PCR; Targeted amplification.

MeSH terms

  • Algorithms*
  • DNA Copy Number Variations / genetics*
  • Data Interpretation, Statistical*
  • Genetic Variation / genetics*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Polymerase Chain Reaction / methods*
  • Sequence Analysis, DNA / methods