Cancer is a leading cause of death worldwide, and its management would benefit from diagnostic approaches that detect it in its early stages. However, despite much research and investment, early cancer diagnosis remains underdeveloped. Owing to its high sensitivity, surface-enhanced Raman spectroscopy (SERS)-based detection of biomarkers has attracted growing interest in this area. Oligonucleotides are an important class of genetic biomarkers because their alterations can be linked to the disease before symptom onset. We propose a machine-learning (ML)-enabled framework that analyzes complex direct SERS spectra of short, single-stranded DNA (ssDNA) and RNA targets to identify relevant mutations in genetic biomarkers, which are key disease indicators. First, using ad hoc-synthesized colloidal silver nanoparticles as SERS substrates, we analyze single-base mutations in ssDNA and RNA sequences with a direct SERS-sensing approach. Then, we propose an ML-based hypothesis test to identify these changes and differentiate the mutated sequences from the corresponding native ones. Rooted in "functional data analysis," this ML approach fully leverages the rich information and dependencies within SERS spectral data to improve modeling and detection capability. Tested on a large set of DNA and RNA SERS data, including data from miR-21 (a known cancer miRNA biomarker), our approach accurately differentiates SERS spectra obtained from different oligonucleotides and outperforms various data-driven methods across several performance metrics, including accuracy, sensitivity, specificity, and F1-score. Hence, this work is a step forward in combining SERS and ML into an effective method for disease diagnosis with real applicability in the clinic.
Keywords: direct SERS; disease diagnostics; functional data analysis; genetic biomarkers; machine learning.
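
For readers unfamiliar with the performance metrics named in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity, and F1-score are conventionally computed for a two-class problem. It is not the authors' evaluation code: the function name `binary_metrics`, the label coding (1 for a mutated sequence, 0 for the native one), and the example labels are assumptions made purely for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard two-class metrics from true and predicted labels.

    Assumed (hypothetical) coding: 1 = mutated sequence, 0 = native sequence.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN when a metric is undefined (empty denominator).
        return num / den if den > 0 else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the study.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, predicted))
```

Sensitivity measures how many mutated sequences are correctly flagged, while specificity measures how many native sequences are correctly left unflagged; F1 balances sensitivity against precision.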