Next-generation sequencing has enabled the collection of large biological data sets, allowing novel molecular-based classification methods to be developed for increased understanding of disease. miRNAs are small regulatory RNA molecules that can be quantified using next-generation sequencing and are excellent classificatory markers. Herein, a deep cancer classifier (DCC) was adapted to differentiate neoplastic from nonneoplastic samples using comprehensive miRNA expression profiles from 1031 human breast and skin tissue samples. The classifier was fine-tuned and evaluated using 750 neoplastic and 281 nonneoplastic breast and skin tissue samples. Performance of the DCC was compared with that of two machine-learning classifiers: support vector machine and random forest. In addition, feature extraction through the DCC was compared with cancer specificity, a feature selection algorithm developed in this study. The DCC achieved the highest area under the receiver operating characteristic curve and performed well in both sensitivity and specificity, unlike the machine-learning and feature selection models, which often performed well in one of these metrics but not the other. In particular, deep learning had noticeable advantages with highly heterogeneous data sets. In addition, our cancer specificity algorithm identified candidate biomarkers for differentiating neoplastic and nonneoplastic tissue samples (eg, miR-144 and miR-375 in breast cancer and miR-375 and miR-451 in skin cancer).
Copyright © 2022 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.