Background and objectives: Early classification of breast cancer subtypes improves survival rates because it facilitates patient prognosis. In the literature, this problem has mainly been addressed with various Machine Learning and Deep Learning techniques. However, these studies have three major shortcomings: a huge number of Trainable Weight Parameters (TWP), low performance, and the class imbalance problem.
Methods: This paper proposes a lightweight model named EpiBrCan-Lite for classifying breast cancer subtypes using DNA methylation data. The model comprises three blocks: Data Encoding, TransGRU, and Classification. In the Data Encoding block, the input features are encoded into equal-sized chunks and passed to the TransGRU block, which is a modified version of the traditional Transformer Encoder (TE). In the TransGRU block, the MLP module of the traditional TE is replaced by a GRU module consisting of two GRU layers, in order to reduce TWP and capture long-range dependencies in the input feature data. The output of the TransGRU block is then passed to the Classification block, which classifies breast cancer into its subtypes.
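As an illustration of the Data Encoding step only, the following minimal sketch splits one sample's feature vector into equal-sized chunks that a sequence model could consume as tokens. The function name `encode_into_chunks`, the zero-padding of the final chunk, the chunk size, and the sample values are all assumptions made for this example; the abstract does not specify how EpiBrCan-Lite handles these details.

```python
from typing import List


def encode_into_chunks(
    features: List[float], chunk_size: int, pad_value: float = 0.0
) -> List[List[float]]:
    """Split one sample's feature vector into equal-sized chunks.

    Assumption of this sketch: when the number of features is not a
    multiple of chunk_size, the last chunk is right-padded with pad_value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    remainder = len(features) % chunk_size
    # Number of padding values needed to complete the final chunk (0 if none).
    padding = [pad_value] * ((chunk_size - remainder) % chunk_size)
    padded = list(features) + padding
    return [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]


# Hypothetical methylation beta values for a single sample (illustrative only).
sample = [0.12, 0.87, 0.45, 0.33, 0.91, 0.05, 0.60]
chunks = encode_into_chunks(sample, chunk_size=3)
```

Each inner list then plays the role of one token in the sequence passed on to the next block.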
Results: The proposed model is validated using Accuracy, Precision, Recall, F1-score, FPR, and FNR metrics on the TCGA breast cancer dataset. This dataset suffers from class imbalance, which is mitigated using the Synthetic Minority Oversampling Technique (SMOTE). Experimental results demonstrate that the EpiBrCan-Lite model attained 95.85 % accuracy, 95.96 % recall, 95.85 % precision, 95.90 % F1-score, 1.03 % FPR, and 4.12 % FNR despite using only 1/1500 of the TWP of other state-of-the-art models.
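For readers unfamiliar with these metrics in a multi-class setting, the sketch below computes them from a confusion matrix using one-vs-rest counts per class. Macro averaging across classes is an assumption of this example, since the abstract does not state which averaging scheme was used, and the function name `macro_metrics` and the confusion matrix values are hypothetical.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged one-vs-rest metrics.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision, recall, f1, fpr, fnr = [], [], [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp  # true other, predicted k
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
        fpr.append(fp / (fp + tn) if fp + tn else 0.0)
        fnr.append(fn / (fn + tp) if fn + tp else 0.0)

    def mean(values: List[float]) -> float:
        return sum(values) / len(values)

    return {
        "accuracy": sum(confusion[k][k] for k in range(n)) / total,
        "precision": mean(precision),
        "recall": mean(recall),
        "f1": mean(f1),
        "fpr": mean(fpr),
        "fnr": mean(fnr),
    }


# Hypothetical 3-class confusion matrix (not taken from the paper).
example = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
metrics = macro_metrics(example)
```

Under this definition, per-class FNR equals one minus per-class recall, so the macro-averaged values of the two also sum to one.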
Conclusion: The EpiBrCan-Lite model classifies breast cancer subtypes efficiently and, being lightweight, is suitable for deployment on devices with low computational power.
Keywords: Breast cancer disease; DNA methylation data; Epigenomic; Gated recurrent unit; SMOTE; Transformer encoder.
Copyright © 2024. Published by Elsevier B.V.