Enhancing breakpoint resolution with deep segmentation model: A general refinement method for read-depth based structural variant callers

PLoS Comput Biol. 2021 Oct 11;17(10):e1009186. doi: 10.1371/journal.pcbi.1009186. eCollection 2021 Oct.

Abstract

Read depths (RDs) are frequently used to identify structural variants (SVs) from sequencing data. Existing RD-based SV callers struggle to determine breakpoints at single-nucleotide resolution because RD data are noisy and are calculated over bins. In this paper, we propose using the deep segmentation model UNet to learn base-wise RD patterns surrounding the breakpoints of known SVs. We integrate the model predictions with an RD-based SV caller to refine breakpoints to single-nucleotide resolution. We show that UNet can be trained with a small amount of data and can be applied both in-sample and cross-sample. An enhancement pipeline named RDBKE significantly increases the number of SVs with more precise breakpoints on both simulated and real data. The source code of RDBKE is freely available at https://github.com/yaozhong/deepIntraSV.
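To make the refinement idea concrete, the following is a minimal sketch, not RDBKE's actual code. It assumes that a segmentation model has already produced, for a window around a caller's bin-resolution breakpoint, a per-base probability that each position lies inside the SV. The function names `mask_transitions` and `refine_breakpoint`, the 0.5 threshold, and the optional `max_shift` limit are all hypothetical choices made for illustration. The sketch binarizes the per-base output, finds the positions where the mask switches state, and snaps the coarse breakpoint to the nearest such transition.

```python
from typing import List, Optional


def mask_transitions(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Return indices i where the binarized mask differs between i-1 and i."""
    mask = [p >= threshold for p in probs]
    return [i for i in range(1, len(mask)) if mask[i] != mask[i - 1]]


def refine_breakpoint(coarse_bp: int, window_start: int, probs: List[float],
                      threshold: float = 0.5,
                      max_shift: Optional[int] = None) -> int:
    """Snap a bin-resolution breakpoint to a base-resolution one.

    Hypothetical helper: probs[i] is an assumed per-base probability
    (e.g. a segmentation model's output) that position window_start + i
    lies inside the SV.
    """
    # Genomic coordinates where the predicted segment label changes.
    candidates = [window_start + i for i in mask_transitions(probs, threshold)]
    if max_shift is not None:
        # Ignore transitions too far from the caller's estimate.
        candidates = [c for c in candidates if abs(c - coarse_bp) <= max_shift]
    if not candidates:
        # No supported transition: keep the caller's original breakpoint.
        return coarse_bp
    # Prefer the transition nearest the caller's estimate.
    return min(candidates, key=lambda c: abs(c - coarse_bp))


if __name__ == "__main__":
    # Hypothetical example: a 200-base window starting at position 900.
    # The SV truly starts at 1037, but a caller using 100-bp bins reports 1000.
    probs = [0.1] * 137 + [0.9] * 63
    print(refine_breakpoint(1000, 900, probs))  # → 1037
```

Falling back to the caller's breakpoint when no transition is found keeps the sketch conservative: the refinement can only move a breakpoint when the per-base prediction supports it.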

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Genome, Human / genetics
  • Genomic Structural Variation / genetics*
  • Genomics
  • Humans
  • Models, Genetic*
  • Whole Genome Sequencing / methods*

Grants and funding

Y.Z., S.I., S.M. and R.Y. are supported by the project of conquering cancer through neo-dimensional systems understanding (15H05907), Grant-in-Aid for Scientific Research on Innovative Areas from the Ministry of Education, Culture, Sports, Science and Technology, Japan. S.I. and R.Y. are also supported by the Project for Cancer Research and Therapeutic Evolution (P-CREATE) from the Japan Agency for Medical Research and Development (AMED) (18cm0106535h0001). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.