Comparative Study of Classification Algorithms for Various DNA Microarray Data

Genes (Basel). 2022 Mar 11;13(3):494. doi: 10.3390/genes13030494.

Abstract

Microarrays are applications of electrical engineering and technology in biology that allow simultaneous measurement of expression of numerous genes, and they can be used to analyze specific diseases. This study undertakes classification analyses of various microarrays to compare the performances of classification algorithms over different data traits. The datasets were classified into test and control groups based on five utilized machine learning methods, including MultiLayer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (KNN), and the resulting accuracies were compared. k-fold cross-validation was used in evaluating the performance and the result was analyzed by comparing the performances of the five machine learning methods. Through the experiments, it was observed that the two tree-based methods, DT and RF, showed similar trends in results and the remaining three methods, MLP, SVM, and DT, showed similar trends. DT and RF generally showed worse performance than other methods except for one dataset. This suggests that, for the effective classification of microarray data, selecting a classification algorithm that is suitable for data traits is crucial to ensure optimum performance.

Keywords: classification; decision tree; k-nearest neighbors; machine learning; microarray; multilayer perceptron; random forest; support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Machine Learning
  • Neural Networks, Computer
  • Oligonucleotide Array Sequence Analysis
  • Support Vector Machine*