Circulating tumor DNA detection using next-generation sequencing (NGS) data of plasma DNA is promising for cancer identification and characterization. However, the tumor-derived signal in blood is often weak and difficult to distinguish from sequencing errors. We present DREAMS (Deep Read-level Modelling of Sequencing-errors) for estimating the error rates of individual read positions. Using DREAMS, we develop statistical methods for variant calling (DREAMS-vc) and cancer detection (DREAMS-cc). For evaluation, we generate deep targeted NGS data of matching tumor and plasma DNA from 85 colorectal cancer patients. The DREAMS approach outperforms state-of-the-art methods for variant calling and cancer detection.
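The abstract does not say how DREAMS-vc combines read-level error rates into a variant call. The sketch below is a generic illustration, not the authors' method. It assumes that every read covering a candidate position has already been assigned an error probability `e` (for example by a read-level error model) and that a single tumor fraction `f` applies at that position. It then compares the likelihood of the observed reads with and without tumor signal. For simplicity, it treats any sequencing error at the position as producing the alternative base. The function names `log_likelihood` and `likelihood_ratio` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_likelihood(f: float, is_alt: Sequence[bool], err: Sequence[float]) -> float:
    """Log-likelihood of the reads given tumor fraction f.

    Simplifying assumption: any sequencing error at this position yields the
    alternative base, so a read shows the alt base either because it comes from
    a mutated fragment and is read correctly, or from a normal fragment and is
    misread. Each error probability must lie strictly between 0 and 1.
    """
    total = 0.0
    for alt, e in zip(is_alt, err):
        p_alt = f * (1.0 - e) + (1.0 - f) * e
        total += math.log(p_alt if alt else 1.0 - p_alt)
    return total


def likelihood_ratio(is_alt: Sequence[bool], err: Sequence[float],
                     grid_size: int = 1000) -> Tuple[float, float]:
    """Grid-search the tumor fraction f in [0, 1) and return (best_f, LLR).

    LLR is log L(best_f) - log L(0). It is 0 when no f > 0 explains the reads
    better than errors alone.
    """
    ll0 = log_likelihood(0.0, is_alt, err)
    best_f, best_ll = 0.0, ll0
    for k in range(1, grid_size):
        f = k / grid_size
        ll = log_likelihood(f, is_alt, err)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f, best_ll - ll0


if __name__ == "__main__":
    # Hypothetical position: 3 alt reads out of 100, all with error rate 0.001.
    reads: List[bool] = [True] * 3 + [False] * 97
    errors: List[float] = [0.001] * 100
    f_hat, llr = likelihood_ratio(reads, errors)
    print(f"estimated tumor fraction: {f_hat:.3f}, LLR: {llr:.2f}")
```

Per-read error rates matter here because an alt read with a high predicted error probability adds much less evidence to the likelihood ratio than one with a low error probability. A single position-wide error rate cannot make that distinction.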
Keywords: Cancer research; Circulating tumor DNA; Colorectal cancer; Machine learning; Next-generation sequencing.
© 2023. The Author(s).