Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning

Renjie Tan; Yufeng Shen

doi:10.1093/nar/gkac788

Nucleic Acids Res. 2022 Nov 28;50(21):e123. doi: 10.1093/nar/gkac788.

Authors

Renjie Tan¹, Yufeng Shen¹ ² ³

Affiliations

¹ Department of Systems Biology, Columbia University, New York, NY 10032, USA.
² Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
³ JP Sulzberger Columbia Genome Center, Columbia University, New York, NY 10032, USA.

Abstract

Exome sequencing is widely used in genetic studies of human diseases and clinical genetic diagnosis. Accurate detection of copy number variants (CNVs) is important to fully utilize exome sequencing data. However, exome data are noisy. None of the existing methods alone can achieve both high precision and high recall. A common practice is to perform heuristic filtration followed by manual inspection of read depth of putative CNVs. This approach does not scale in large studies. To address this issue, we developed a transfer learning method, CNV-espresso, for in silico confirmation of rare CNVs from exome sequencing data. CNV-espresso encodes candidate CNVs from exome data as images and uses pretrained convolutional neural network models to classify copy number states. We trained CNV-espresso using an offspring-parents trio exome sequencing dataset, with inherited CNVs as positives and CNVs with Mendelian errors as negatives. We evaluated the performance using additional samples that have both exome and whole-genome sequencing (WGS) data. Using the CNVs detected from WGS data as a proxy for ground truth, CNV-espresso significantly improves precision while keeping recall almost intact, especially for CNVs that span a small number of exons. CNV-espresso can effectively replace manual inspection of CNVs in large-scale exome sequencing studies.
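
The evaluation described in the abstract, scoring exome CNV calls against WGS calls used as a proxy for ground truth, can be illustrated with a minimal Python sketch. This is not the CNV-espresso code. The `CNV` record, the function names, the example coordinates, and the 50% reciprocal-overlap matching rule are all assumptions made for illustration; the abstract does not state how calls were matched.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    """Hypothetical record for one CNV call (not the tool's own format)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the calls
    differ in chromosome or type, or do not overlap at all."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(overlap / len_a, overlap / len_b)


def precision_recall(
    exome_calls: List[CNV],
    wgs_calls: List[CNV],
    min_overlap: float = 0.5,  # assumed matching threshold
) -> Tuple[float, float]:
    """Precision: fraction of exome calls supported by a WGS call.
    Recall: fraction of WGS calls recovered by an exome call."""
    supported = sum(
        any(reciprocal_overlap(e, w) >= min_overlap for w in wgs_calls)
        for e in exome_calls
    )
    recovered = sum(
        any(reciprocal_overlap(e, w) >= min_overlap for e in exome_calls)
        for w in wgs_calls
    )
    precision = supported / len(exome_calls) if exome_calls else 0.0
    recall = recovered / len(wgs_calls) if wgs_calls else 0.0
    return precision, recall


# Made-up example calls, for illustration only.
wgs = [CNV("chr1", 1000, 2000, "DEL"), CNV("chr2", 5000, 9000, "DUP")]
exome = [CNV("chr1", 1100, 2100, "DEL"), CNV("chr3", 100, 500, "DEL")]
precision, recall = precision_recall(exome, wgs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example one of the two exome calls matches a WGS call and one of the two WGS calls is recovered, so both metrics are 0.5. A filter that rejected the unsupported chr3 call would raise precision to 1.0 while leaving recall unchanged, which is the kind of improvement the abstract reports.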

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
DNA Copy Number Variations*
Exome Sequencing
Exome* / genetics
High-Throughput Nucleotide Sequencing / methods
Humans
Machine Learning
