CNV-RF Is a Random Forest-Based Copy Number Variation Detection Method Using Next-Generation Sequencing

Getiria Onsongo; Linda B Baughn; Matthew Bower; Christine Henzler; Matthew Schomaker; Kevin A T Silverstein; Bharat Thyagarajan

doi:10.1016/j.jmoldx.2016.07.001

J Mol Diagn. 2016 Nov;18(6):872-881. doi: 10.1016/j.jmoldx.2016.07.001. Epub 2016 Sep 3.

Authors

Getiria Onsongo¹, Linda B Baughn², Matthew Bower³, Christine Henzler², Matthew Schomaker⁴, Kevin A T Silverstein¹, Bharat Thyagarajan⁵

Affiliations

¹ Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota.
² Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota.
³ Division of Genetics and Metabolism, University of Minnesota, Minneapolis, Minnesota.
⁴ Molecular Diagnostics Laboratory, M Health, Minneapolis, Minnesota.
⁵ Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota.

PMID: 27597741

Abstract

Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest to clinical laboratories. Analytical variability in next-generation sequencing (NGS), mappability-related artifacts in coverage data, and the lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data for identifying CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, identified two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number, and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity, together with confirmation by a low-cost quantitative PCR method, provides a framework for comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory.
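
The detection counts reported above can be turned into rates with simple arithmetic. The following is a minimal sketch, not part of the published method: the function names are hypothetical, and treating the 14 normal-copy-number cases as 14 x 60 independent gene-level tests is an assumption made here for illustration only.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real events that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of true negatives that were correctly left uncalled."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract: 12 of 13 known deletions were identified.
known_deletions = 13
detected_deletions = 12
deletion_sensitivity = sensitivity(
    detected_deletions, known_deletions - detected_deletions
)

# Assumption (not stated in the abstract): each of the 60 genes in each of
# the 14 normal-copy-number cases counts as one independent negative test.
normal_cases = 14
genes_per_case = 60
gene_level_tests = normal_cases * genes_per_case
panel_specificity = specificity(true_neg=gene_level_tests, false_pos=0)

print(f"Deletion sensitivity: {deletion_sensitivity:.3f}")
print(f"Gene-level tests in normal cases: {gene_level_tests}")
print(f"Specificity on normal cases: {panel_specificity:.3f}")
```

Under these assumptions the known-deletion set gives a sensitivity of 12/13, and the normal-copy-number set gives no false positives among its gene-level tests. These figures describe the targeted validation sets only and are separate from the 50% specificity reported across 4813 genes.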

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology / methods
DNA Copy Number Variations*
Female
Gene Deletion
Gene Duplication
Genetic Markers
Genetic Testing* / methods
High-Throughput Nucleotide Sequencing* / methods
Humans
Male
Real-Time Polymerase Chain Reaction
Reproducibility of Results
Sensitivity and Specificity

Substances

Genetic Markers