Research on mining microRNA (miRNA) expression data has recently received a lot of attention, mainly because of the role of miRNA in gene regulation. However, this type of data, usually stored in the form of microarrays, is very specific: it contains only a small number of cases (often fewer than 100) compared with a large number of attributes (several hundred or even tens of thousands). First, the small number of cases available during the learning process can make the newly created classifiers unstable. Second, the huge number of attributes makes it necessary to select only a few dominant attributes strongly correlated with the decision. As a result, applying fundamental machine learning approaches to mining and then classifying microarray data is problematic and may even fail.

The main goal of our research is therefore to develop a generalized algorithm for mining microarray data (including miRNA data sets), mainly to improve the stability and, consequently, the classification accuracy of the newly created classifiers. The main concept of the novel approach is to iteratively induce many subsequent decision rule sets, called decision rule generations, instead of inducing only a single decision rule set, as is done routinely. Decision rules were chosen as the baseline classifiers of the newly developed LEMRG (Learning from Examples Module based on Rule Generations) algorithm mainly because a decision rule-based knowledge representation is easier for humans to comprehend than other learning models. In our research we used a miRNA expression level learning data set describing 11 types of human cancers, while the testing data set contained poorly differentiated cases of only four types of cancers.
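The idea of rule generations can be illustrated with a minimal sketch. It is not the LEMRG implementation, and several points are assumptions not stated in this abstract: a toy inducer that builds one single-condition rule per concept stands in for a LEM2-style inducer, each new generation is induced only from attributes not used by earlier generations, and the cumulative rule set classifies by simple voting. All function and attribute names (`induce_rule_set`, `miR-a`, and so on) are hypothetical.

```python
from collections import Counter


def matches(case, attr, op, threshold):
    """Check whether a case satisfies the single condition (attr op threshold)."""
    return case[attr] >= threshold if op == ">=" else case[attr] < threshold


def induce_rule_set(cases, labels, attributes):
    """Toy stand-in for a rule inducer (NOT LEM2): for each concept, keep the
    single-condition rule that labels the most training cases correctly."""
    rules = []
    for concept in sorted(set(labels)):
        best = None  # (number of correctly handled cases, attr, op, threshold)
        for attr in attributes:
            for threshold in sorted({case[attr] for case in cases}):
                for op in (">=", "<"):
                    correct = sum(
                        matches(case, attr, op, threshold) == (label == concept)
                        for case, label in zip(cases, labels)
                    )
                    if best is None or correct > best[0]:
                        best = (correct, attr, op, threshold)
        if best is not None:
            rules.append((best[1], best[2], best[3], concept))
    return rules


def induce_generations(cases, labels, n_generations):
    """Induce up to n_generations rule sets and merge them into one cumulative set.
    Assumption: attributes used by one generation are excluded from the next."""
    remaining = sorted(cases[0])  # attribute names still available
    cumulative = []
    for _ in range(n_generations):
        if not remaining:
            break
        generation = induce_rule_set(cases, labels, remaining)
        cumulative.extend(generation)
        used = {attr for attr, _, _, _ in generation}
        remaining = [attr for attr in remaining if attr not in used]
    return cumulative


def classify(case, rules):
    """Assumed voting scheme: every matching rule casts one vote for its concept."""
    votes = Counter(
        label for attr, op, thr, label in rules if matches(case, attr, op, thr)
    )
    return votes.most_common(1)[0][0] if votes else None


# Made-up expression levels for four training cases and three hypothetical miRNAs.
cases = [
    {"miR-a": 5.0, "miR-b": 1.0, "miR-c": 3.0},
    {"miR-a": 6.0, "miR-b": 1.5, "miR-c": 2.0},
    {"miR-a": 1.0, "miR-b": 4.0, "miR-c": 2.5},
    {"miR-a": 0.5, "miR-b": 5.0, "miR-c": 3.5},
]
labels = ["colon", "colon", "ovary", "ovary"]

rules = induce_generations(cases, labels, n_generations=2)
print(rules)
print(classify({"miR-a": 5.5, "miR-b": 1.2, "miR-c": 3.0}, rules))
```

In this sketch the second generation is forced onto attributes the first generation did not use, so the cumulative set describes each concept with several independent conditions rather than a single one.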
As expected, our new classifiers, saved in the form of so-called cumulative decision rule sets, had better stability and classification accuracy than single decision rule sets induced in the traditional manner. Furthermore, LEMRG was compared with other machine learning models. The comparison showed that only 3 of the 16 tested classifiers classified as effectively as our newly developed approach. Using our cumulative set of decision rules, all cancer cases from two selected concepts, colon and ovary, were correctly classified. We also showed the role of the selected miRNAs as potential biomarkers for the diagnosis of tumors.

A preliminary result of our research on decision rule generations was initially presented at the first International Conference of Digital Medicine and Medical 3D Printing (17-19 June 2016, Nanjing, China).
Keywords: AQ; Cumulative decision rule sets; Data mining; Decision rule generations; GTS; Induction of decision rules; LEM2; LEMRG; MLEM2; MiRNA; MicroRNA.