Fractal feature selection model for enhancing high-dimensional biological problems

BMC Bioinformatics. 2024 Jan 9;25(1):12. doi: 10.1186/s12859-023-05619-z.

Abstract

The integration of biology, computer science, and statistics has given rise to the interdisciplinary field of bioinformatics, which aims to decode biological intricacies. Bioinformatics data contain extensive and diverse features, which makes classification in this domain highly challenging. An intelligent bioinformatics classification system must therefore select the most relevant features to enhance machine learning performance. This paper proposes a feature selection model based on the fractal concept to improve the performance of intelligent systems in classifying high-dimensional biological problems. The proposed fractal feature selection (FFS) model divides the features into blocks, measures the similarity between blocks using the root mean square error (RMSE), and ranks feature importance by RMSE, treating a low RMSE as an indicator of importance. FFS is tested and evaluated on ten high-dimensional bioinformatics datasets. The experimental results show that the model significantly improves machine learning accuracy: machine learning algorithms trained on the full feature set reached an average accuracy of 79%, whereas FFS delivered promising results with an accuracy of 94%.
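
The block-and-RMSE idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify the details, so several choices here are assumptions: the function name ffs_sketch, the parameters block_size and n_keep_blocks, the use of contiguous, equally sized feature blocks, scoring each block by its mean RMSE to all other blocks, and the premise that features are already on comparable scales.

```python
import numpy as np


def ffs_sketch(X, block_size, n_keep_blocks):
    """Illustrative block-wise RMSE feature ranking (hypothetical helper).

    Assumptions not taken from the paper: blocks are contiguous groups of
    `block_size` feature columns, a block's score is its mean RMSE to every
    other block, and the `n_keep_blocks` lowest-scoring blocks are kept.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    n_blocks = n_features // block_size  # leftover trailing features are ignored
    if n_blocks < 2:
        raise ValueError("need at least two full blocks to compare")

    # Flatten each block (all samples x block_size features) into one vector.
    used = X[:, : n_blocks * block_size]
    blocks = (
        used.reshape(n_samples, n_blocks, block_size)
        .transpose(1, 0, 2)
        .reshape(n_blocks, -1)
    )

    # Pairwise RMSE between every pair of blocks via broadcasting.
    diff = blocks[:, None, :] - blocks[None, :, :]
    rmse = np.sqrt((diff ** 2).mean(axis=2))

    # Ignore each block's zero distance to itself when averaging.
    np.fill_diagonal(rmse, np.nan)
    score = np.nanmean(rmse, axis=1)

    # Low RMSE is treated as high importance: keep the lowest-scoring blocks.
    keep = np.sort(np.argsort(score)[:n_keep_blocks])
    feature_idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in keep]
    )
    return feature_idx, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((5, 2))
    # Blocks 0 and 1 are identical; block 2 is shifted far away from both.
    X = np.hstack([base, base, base + 10.0])
    idx, score = ffs_sketch(X, block_size=2, n_keep_blocks=2)
    print("selected feature indices:", idx)
    print("block scores:", score)
```

The selected column indices can then be used to subset the data before training a classifier, for example X[:, idx]. How the block size and the number of retained blocks should be chosen is not stated in the abstract and is left open here.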

Keywords: Bioinformatics; Feature selection; Fractal; High-dimensional datasets; Machine learning.

MeSH terms

  • Algorithms*
  • Computational Biology
  • Fractals*
  • Machine Learning