Enhancing Cancerous Gene Selection and Classification for High-Dimensional Microarray Data Using a Novel Hybrid Filter and Differential Evolutionary Feature Selection

Arshad Hashmi; Waleed Ali; Anas Abulfaraj; Faisal Binzagr; Entisar Alkayal

doi:10.3390/cancers16233913

Enhancing Cancerous Gene Selection and Classification for High-Dimensional Microarray Data Using a Novel Hybrid Filter and Differential Evolutionary Feature Selection

Cancers (Basel). 2024 Nov 22;16(23):3913. doi: 10.3390/cancers16233913.

Authors

Arshad Hashmi¹, Waleed Ali², Anas Abulfaraj¹, Faisal Binzagr³, Entisar Alkayal²

Affiliations

¹ Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia.
² Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia.
³ Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia.

Abstract

Background: In recent years, microarray datasets have been used to store information about human genes and methods used to express the genes in order to successfully diagnose cancer disease in the early stages. However, most of the microarray datasets typically contain thousands of redundant, irrelevant, and noisy genes, which raises a great challenge for effectively applying the machine learning algorithms to these high-dimensional microarray datasets.

Methods: To address this challenge, this paper introduces a proposed hybrid filter and differential evolution-based feature selection to choose only the most influential genes or features of high-dimensional microarray datasets to improve cancer diagnoses and classification. The proposed approach is a two-phase hybrid feature selection model constructed using selecting the top-ranked features by some popular filter feature selection methods and then further identifying the most optimal features conducted by differential evolution (DE) optimization. Accordingly, some popular machine learning algorithms are trained using the final training microarray datasets with only the best features in order to produce outstanding cancer classification results. Four high-dimensional cancerous microarray datasets were used in this study to evaluate the proposed method, which are Breast, Lung, Central Nervous System (CNS), and Brain cancer datasets.

Results: The experimental results demonstrate that the classification accuracy results achieved by the proposed hybrid filter-DE over filter methods increased to 100%, 100%, 93%, and 98% on Brain, CNS, Breast and Lung, respectively. Furthermore, applying the suggested DE-based feature selection contributed to removing around 50% of the features selected by using the filter methods for these four cancerous microarray datasets. The average improvement percentages of accuracy achieved by the proposed methods were up to 42.47%, 57.45%, 16.28% and 43.57% compared to the previous works that are 41.43%, 53.66%, 17.53%, 61.70% on Brain, CNS, Lung and Breast datasets, respectively.

Conclusions: Compared to the previous works, the proposed methods accomplished better improvement percentages on Brain and CNS datasets, comparable improvement percentages on Lung dataset, and less improvement percentages on Breast dataset.

Keywords: brain cancer; breast cancer; cancer classification; central nervous system cancer; differential evolutionary algorithm; filter feature selection; gene selection; high-dimensional microarray dataset; lung cancer.

Grants and funding

IFPIP:645-830-1443/This research work was funded by the Institutional Fund Projects under grant no. (IFPIP:645-830-1443). The authors gratefully acknowledge the technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah,