Sequencing artifacts derived from a library preparation method using enzymatic fragmentation

PLoS One. 2020 Jan 3;15(1):e0227427. doi: 10.1371/journal.pone.0227427. eCollection 2020.

Abstract

DNA fragmentation is a fundamental step during library preparation in hybridization capture-based, short-read sequencing. Ultrasonication has been used thus far to prepare DNA of an appropriate size, but this method is associated with a considerable loss of DNA sample. More recently, studies have employed library preparation methods that rely on enzymatic fragmentation with DNA endonucleases to minimize DNA loss, particularly in nano-quantity samples. Yet, despite their wide use, the effect of enzymatic fragmentation on the resultant sequences has not been carefully assessed. Here, we performed pairwise comparisons of somatic variants called in the same tumor DNA samples prepared using ultrasonic and enzymatic fragmentation methods. Our analysis revealed a substantially larger number of recurrent artifactual SNVs/indels in endonuclease-treated libraries than in those created through ultrasonication. These artifacts were marked by palindromic structure in the genomic context, positional bias in sequenced reads, and multi-nucleotide substitutions. Taking advantage of these distinctive features, we developed a filtering algorithm to distinguish genuine somatic mutations from artifactual noise with high specificity and sensitivity. Removing this noise recovered the composition of the mutational signatures in the tumor samples. Thus, we provide an informatics algorithm as a solution to the sequencing errors produced as a consequence of endonuclease-mediated fragmentation, highlighted for the first time in this study.
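
To make two of the artifact features named above concrete (palindromic genomic context and positional bias in reads), the following is a minimal illustrative sketch. It is not the filtering algorithm developed in the study, whose details are not given in this abstract. The function names, the Candidate structure, and every threshold (min_arm, max_loop, margin, min_end_fraction) are hypothetical choices made only for this example, and the multi-nucleotide substitution feature is not modeled.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_inverted_repeat(context: str, min_arm: int = 6, max_loop: int = 20) -> bool:
    """True if the context holds two reverse-complementary arms of length
    min_arm separated by at most max_loop bases (a palindrome-like,
    hairpin-capable structure). Thresholds are assumptions."""
    seq = context.upper()
    n = len(seq)
    for i in range(n - 2 * min_arm + 1):
        target = reverse_complement(seq[i:i + min_arm])
        start = i + min_arm                      # first base after the left arm
        stop = min(n - min_arm, start + max_loop)
        for j in range(start, stop + 1):
            if seq[j:j + min_arm] == target:
                return True
    return False


def read_end_fraction(positions: List[int], read_lengths: List[int],
                      margin: int = 10) -> float:
    """Fraction of variant-supporting reads in which the variant lies within
    `margin` bases of either read end (0-based positions within the read)."""
    if not positions:
        return 0.0
    near_end = sum(1 for p, length in zip(positions, read_lengths)
                   if p < margin or p >= length - margin)
    return near_end / len(positions)


@dataclass
class Candidate:
    """Hypothetical container for one candidate variant call."""
    context: str              # reference sequence flanking the variant
    positions: List[int]      # variant position within each supporting read
    read_lengths: List[int]   # length of each supporting read


def looks_artifactual(c: Candidate, min_end_fraction: float = 0.7) -> bool:
    """Flag a call that shows both features; the cutoff is an assumption."""
    return (has_inverted_repeat(c.context)
            and read_end_fraction(c.positions, c.read_lengths) >= min_end_fraction)


# Usage: arms ACGTTG and CAACGT are reverse complements with a 4-base loop,
# and three of the four supporting reads carry the variant near a read end.
candidate = Candidate("TTACGTTGAAAACAACGTGG", [2, 3, 148, 75], [150] * 4)
flagged = looks_artifactual(candidate)
```

In practice any such heuristic would need its thresholds calibrated against matched libraries, for example by comparing calls from ultrasonic and enzymatic preparations of the same sample as the study does.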

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artifacts*
  • DNA Fragmentation*
  • Gene Library*
  • Sequence Analysis, DNA*

Grants and funding

MEXT | Japan Society for the Promotion of Science (JSPS) - JP15K06861 [S.M.]; MEXT | Japan Society for the Promotion of Science (JSPS) - JP17K18337 [O.G.]; MEXT | Japan Society for the Promotion of Science (JSPS) - JP18K07338 [S.M.]. This work was supported by JSPS KAKENHI Grant Numbers JP17K18337, JP15K06861, and JP18K07338. Data4C’s co. ltd. was neither a funder nor funded by any governmental research funding agency. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Data4C’s co. ltd. provided support in the form of salaries for authors A.T. and T.H.; the Japanese Foundation for Cancer Research paid Data4C’s co. ltd. under a business consignment for assisting with the analytic parts of this work.