Background: DNA microarrays provide informative data for transcriptional profiling and for identifying gene expression signatures that can help prevent progression of latent tuberculosis infection (LTBI) to active disease. However, constructing a prognostic model for distinguishing LTBI from active tuberculosis (ATB) is challenging because the data are noisy and no generally stable analysis approach exists.
Methods: In the present study, we propose an accurate predictive model based on data fusion at the decision level. The results of filter and wrapper feature selection techniques were combined with multiple-criteria decision-making (MCDM) methods to select, from six microarray datasets, the 10 genes that are most discriminative for diagnosing tuberculosis cases. As the main contribution of this study, the final ranking function was constructed by combining a protein-protein interaction (PPI) network with an MCDM method, the Decision-making Trial and Evaluation Laboratory (DEMATEL), to improve feature ranking.
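For readers unfamiliar with DEMATEL, the following is a minimal sketch of the standard computation only, not the ranking function used in this study. It assumes a small non-negative direct-influence matrix (which could, for instance, be derived from PPI edge weights; that mapping is an assumption here), and the function name `dematel`, the gene labels, and the matrix values are hypothetical placeholders.

```python
import numpy as np


def dematel(direct):
    """Return (total-relation matrix, prominence, relation) for a
    square, non-negative direct-influence matrix."""
    A = np.asarray(direct, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("direct-influence matrix must be square")
    # Normalise by the largest row or column sum.
    scale = max(A.sum(axis=1).max(), A.sum(axis=0).max())
    if scale == 0:
        raise ValueError("direct-influence matrix is all zeros")
    D = A / scale
    # Total relation T = D (I - D)^-1, computed with a linear solve.
    # Raises LinAlgError if I - D is singular.
    T = np.linalg.solve(np.eye(A.shape[0]) - D, D)
    r = T.sum(axis=1)  # influence each item exerts on the others
    c = T.sum(axis=0)  # influence each item receives
    return T, r + c, r - c


# Hypothetical 3-gene example; the values are illustrative only.
genes = ["gene_A", "gene_B", "gene_C"]
direct = [[0, 3, 1],
          [1, 0, 2],
          [0, 1, 0]]
total, prominence, relation = dematel(direct)
# Rank items from highest to lowest prominence (r + c).
ranking = [genes[i] for i in np.argsort(-prominence)]
```

Items with high prominence are strongly connected overall, while a positive relation value marks a net influencer and a negative one a net receiver.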
Results: Decision-level fusion of random forest (RF) and k-nearest neighbors (KNN) classifiers on the 10 selected genes, combined according to Yager's theory, reached a sensitivity of 0.97, a specificity of 0.90, and an accuracy of 0.95. Finally, cumulative clustering was used to identify the genes involved in the diagnosis of latent and active tuberculosis.
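Yager's combination rule can be illustrated with a short sketch. It differs from Dempster's rule in that conflicting evidence is assigned to the whole frame (ignorance) instead of being normalised away. The example assumes that each classifier's class probabilities are turned into a mass function by discounting with a reliability factor; that conversion, the helper names, and all numeric values are hypothetical and are not taken from the study.

```python
from itertools import product

FRAME = frozenset({"LTBI", "ATB"})


def to_mass(probs, reliability, frame=FRAME):
    """Assumed conversion: discount class probabilities by a reliability
    factor and assign the remaining mass to the whole frame."""
    mass = {frozenset({label}): reliability * p for label, p in probs.items()}
    mass[frame] = mass.get(frame, 0.0) + (1.0 - reliability)
    return mass


def yager_combine(m1, m2, frame=FRAME):
    """Combine two mass functions with Yager's rule."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            combined[common] = combined.get(common, 0.0) + mass_b * mass_c
        else:
            # Disjoint focal sets: the evidence is contradictory.
            conflict += mass_b * mass_c
    # Yager: conflicting mass goes to the frame, with no renormalisation.
    combined[frame] = combined.get(frame, 0.0) + conflict
    return combined


# Hypothetical classifier outputs for a single sample.
m_rf = to_mass({"ATB": 0.8, "LTBI": 0.2}, reliability=0.9)
m_knn = to_mass({"ATB": 0.6, "LTBI": 0.4}, reliability=0.8)
fused = yager_combine(m_rf, m_knn)

# Decide on the single class with the largest fused mass.
singletons = {s: m for s, m in fused.items() if len(s) == 1}
decision = next(iter(max(singletons, key=singletons.get)))
```

Because conflict is kept as mass on the frame, a large value of `fused[FRAME]` signals that the two classifiers disagree on the sample.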
Conclusions: The combination of MCDM methods and PPI networks can significantly improve the diagnosis of different states of tuberculosis.
Clinical trial number: Not applicable.
Keywords: Data fusion; Latent tuberculosis infection diagnosis; Multiple-criteria decision-making; Protein-protein interaction.
© 2024. The Author(s).