EpiBrCan-Lite: A lightweight deep learning model for breast cancer subtype classification using epigenomic data

Punam Bedi; Surbhi Rani; Bhavna Gupta; Veenu Bhasin; Pushkar Gole

doi:10.1016/j.cmpb.2024.108553

Comput Methods Programs Biomed. 2024 Dec 4:260:108553. doi: 10.1016/j.cmpb.2024.108553. Online ahead of print.

Authors

Punam Bedi¹, Surbhi Rani², Bhavna Gupta³, Veenu Bhasin⁴, Pushkar Gole⁵

Affiliations

¹ Department of Computer Science, University of Delhi, Delhi, India.
² Department of Computer Science, University of Delhi, Delhi, India.
³ Keshav Mahavidyalaya, University of Delhi, New Delhi, India.
⁴ PGDAV College, University of Delhi, New Delhi, India.
⁵ Department of Computer Science, University of Delhi, Delhi, India.

PMID: 39667144

Abstract

Background and objectives: Early classification of breast cancer subtypes improves survival rates because it facilitates patient prognosis. In the literature, this problem has been addressed mainly with Machine Learning and Deep Learning techniques. However, these studies have three major shortcomings: a large number of Trainable Weight Parameters (TWP), low performance, and the class imbalance problem.

Methods: This paper proposes a lightweight model, EpiBrCan-Lite, for classifying breast cancer subtypes using DNA methylation data. The model comprises three blocks: Data Encoding, TransGRU, and Classification. In the Data Encoding block, the input features are encoded into equal-sized chunks, which are then passed to the TransGRU block, a modified version of the traditional Transformer Encoder (TE). In the TransGRU block, the MLP module of the traditional TE is replaced by a GRU module consisting of two GRU layers, which reduces the TWP and captures long-range dependencies in the input features. Finally, the output of the TransGRU block is passed to the Classification block, which assigns each sample to a breast cancer subtype.
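
The chunking step of the Data Encoding block can be illustrated with a minimal sketch. The abstract does not specify how the features are encoded, so the function name `encode_into_chunks`, the chunk size, and the zero-padding of the last chunk are all assumptions made here for illustration, not the authors' implementation.

```python
from typing import List


def encode_into_chunks(
    features: List[float], chunk_size: int, pad_value: float = 0.0
) -> List[List[float]]:
    """Split a flat feature vector into equal-sized chunks.

    Assumption: if the vector length is not a multiple of chunk_size,
    the last chunk is right-padded with pad_value so all chunks match.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    padded = list(features)
    remainder = len(padded) % chunk_size
    if remainder:
        padded.extend([pad_value] * (chunk_size - remainder))
    # Each chunk becomes one position in the sequence fed to the next block.
    return [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]


# Toy input: ten made-up methylation beta values, chunk size chosen arbitrarily.
beta_values = [0.1, 0.8, 0.5, 0.3, 0.9, 0.2, 0.7, 0.4, 0.6, 0.05]
chunks = encode_into_chunks(beta_values, chunk_size=4)
```

Treating each chunk as one sequence position is what would let an attention layer followed by GRU layers operate over the feature vector as a sequence.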

Results: The proposed model is evaluated on the TCGA breast cancer dataset using Accuracy, Precision, Recall, F1-score, False Positive Rate (FPR), and False Negative Rate (FNR). The dataset suffers from class imbalance, which is mitigated using the Synthetic Minority Oversampling Technique (SMOTE). Experimental results show that EpiBrCan-Lite attained 95.85 % accuracy, 95.96 % recall, 95.85 % precision, 95.90 % F1-score, 1.03 % FPR, and 4.12 % FNR while using only 1/1500 of the TWP of other state-of-the-art models.
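
For readers less familiar with FPR and FNR in a multi-class setting, the sketch below shows one common way to derive them from a confusion matrix using a one-vs-rest view of each class. The confusion matrix is a made-up toy example, not data from the paper, and macro averaging is an assumption, since the abstract does not state how the per-class values were aggregated.

```python
from typing import Dict, List


def per_class_rates(confusion: List[List[int]]) -> List[Dict[str, float]]:
    """Compute one-vs-rest rates for each class.

    confusion[i][j] is the number of samples with true class i
    that were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # true other, predicted k
        tn = total - tp - fn - fp
        rates.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
            "fnr": fn / (fn + tp) if fn + tp else 0.0,
        })
    return rates


def macro_average(rates: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric across classes (assumed aggregation)."""
    return sum(r[key] for r in rates) / len(rates)


# Hypothetical 3-class confusion matrix, for illustration only.
toy_confusion = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
rates = per_class_rates(toy_confusion)
macro_fpr = macro_average(rates, "fpr")
macro_fnr = macro_average(rates, "fnr")
```

For each class, recall and FNR sum to 1, so a low FNR is equivalent to a high recall for that class.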

Conclusion: The EpiBrCan-Lite model classifies breast cancer subtypes efficiently and, being lightweight, is suitable for deployment on devices with low computational power.

Keywords: Breast cancer disease; DNA methylation data; Epigenomic; Gated recurrent unit; SMOTE; Transformer encoder.