A comprehensive evaluation framework for benchmarking multi-objective feature selection in omics-based biomarker discovery

IEEE/ACM Trans Comput Biol Bioinform. 2024 Oct 14:PP. doi: 10.1109/TCBB.2024.3480150. Online ahead of print.

Abstract

Machine learning algorithms have been extensively used for the accurate classification of cancer subtypes driven by gene expression-based biomarkers. However, biomarker models combining multiple gene expression signatures are often not reproducible in external validation datasets, and their feature set size is often not optimized, jeopardizing their translatability into cost-effective clinical tools. We investigated how to solve the multi-objective problem of finding the best trade-offs between classification performance and feature set size by applying seven algorithms for machine learning-driven feature subset selection, and we analysed how they perform in a benchmark with eight large-scale cancer transcriptome datasets covering both training and external validation sets. The benchmark includes evaluation metrics assessing the performance of the individual biomarkers and of the solution sets, according to their accuracy, diversity, and the stability of the composing genes. Moreover, we propose a new evaluation metric for cross-validation studies that generalizes the hypervolume, which is commonly used to assess the performance of multi-objective optimization algorithms. Biomarkers achieving a balanced accuracy of 0.8 on the external validation dataset were obtained for breast, kidney, and ovarian cancer using 4, 2, and 7 features, respectively. Genetic algorithms often provided better performance than the other algorithms considered, and the recently proposed NSGA2-CH and NSGA2-CHS were the best-performing methods in most cases.
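
As a minimal illustration of the hypervolume indicator that the proposed metric generalizes, the sketch below computes the two-dimensional hypervolume of a Pareto front that trades off classification error (1 − balanced accuracy) against feature-set size, both to be minimized. This is not the paper's generalized cross-validation metric; the reference point and the example front are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact metric): 2-D hypervolume
# of a Pareto front trading off classification error (1 - balanced accuracy)
# against feature-set size, both minimized.

def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded above by the reference point `ref`.

    front : list of (error, n_features) tuples, each strictly better than `ref`
            in both objectives
    ref   : (worst_error, worst_n_features) corner, e.g. (1.0, total_genes)
    """
    # Keep only non-dominated points, scanning in ascending order of error.
    pareto = []
    best_size = float("inf")
    for error, size in sorted(front):
        if size < best_size:          # strictly better on the second objective
            pareto.append((error, size))
            best_size = size

    # Sum the rectangular slices between consecutive Pareto points and `ref`,
    # sweeping from the highest-error (smallest) solution toward the reference.
    hv = 0.0
    prev_error = ref[0]
    for error, size in sorted(pareto, reverse=True):
        hv += (prev_error - error) * (ref[1] - size)
        prev_error = error
    return hv


if __name__ == "__main__":
    # Hypothetical biomarker solutions: (1 - balanced accuracy, number of genes).
    front = [(0.20, 2), (0.15, 4), (0.12, 7)]
    print(hypervolume_2d(front, ref=(1.0, 20)))   # larger value = better front
```

A larger hypervolume means the solution set covers more of the objective space between the front and the reference point, i.e. it offers better accuracy/set-size trade-offs overall, which is why it is a standard quality indicator for multi-objective optimizers such as the NSGA-II variants benchmarked in the paper.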