The Explainable Modular Neural Network (XModNN) identifies biomarkers and thereby facilitates the classification of diseases and clinical parameters in transcriptomic datasets. The modules within XModNN represent specific pathways or genes of a functional hierarchy. Incorporating this biological knowledge into the architectural design reduces the number of parameters. A weighted multi-loss progressive training scheme complements this reduction and enables successful classification with fewer replicates. Combining this workflow with layer-wise relevance propagation provides a robust post hoc explanation of each module's contribution. In two use cases, the prediction of sex and of neuroblastoma cell states, XModNN yielded fewer candidate biomarkers than standard statistical approaches. Moreover, the architecture can be trained on a limited number of examples while attaining the same performance and robustness as support vector machines and random forests. The integrated pathway relevance analysis improves on standard gene set overrepresentation analysis, which relies solely on gene assignment. Two crucial genes and three pathways were identified for sex classification, while 26 genes and six pathways were highly important for discriminating adrenergic and mesenchymal cell states in neuroblastoma.
Keywords: biomarker detection; explainable AI; modular neural network; next-generation sequencing.
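
The two core ideas of the abstract, modules restricted to their member genes and relevance propagated back through them, can be illustrated with a minimal sketch. This is not the XModNN implementation: the gene and pathway names, the weights, the purely linear modules, and the helper functions `module_forward` and `lrp_epsilon` are all hypothetical and chosen only for illustration. The relevance step uses the generic epsilon rule of layer-wise relevance propagation.

```python
# Toy illustration only: names, weights, and expression values are assumptions,
# not data or parameters from the paper.

def module_forward(expr, membership, weights):
    """Compute one linear activation per pathway module.

    Each module only sees the genes assigned to it, so the layer has far
    fewer weights than a dense gene-by-pathway layer.
    """
    return {
        pathway: sum(weights[pathway][g] * expr[g] for g in genes)
        for pathway, genes in membership.items()
    }


def lrp_epsilon(expr, genes, w, relevance_out, eps=1e-9):
    """Redistribute a module's relevance onto its input genes (epsilon rule)."""
    contributions = {g: w[g] * expr[g] for g in genes}
    z = sum(contributions.values())
    z += eps if z >= 0 else -eps  # stabiliser avoids division by zero
    return {g: c / z * relevance_out for g, c in contributions.items()}


# Hypothetical expression values and gene-to-pathway assignment.
expr = {"gene_a": 2.0, "gene_b": 0.5, "gene_c": 1.0}
membership = {
    "pathway_1": ["gene_a", "gene_b"],
    "pathway_2": ["gene_b", "gene_c"],
}
weights = {
    "pathway_1": {"gene_a": 1.0, "gene_b": -0.5},
    "pathway_2": {"gene_b": 0.5, "gene_c": 0.25},
}

activations = module_forward(expr, membership, weights)

# Parameter count of the modular layer versus a fully connected one.
n_modular = sum(len(genes) for genes in membership.values())
n_dense = len(expr) * len(membership)

# Propagate the activation of pathway_1 back to its member genes.
gene_relevance = lrp_epsilon(
    expr, membership["pathway_1"], weights["pathway_1"], activations["pathway_1"]
)
```

In this toy setting the modular layer needs four weights where a dense layer would need six, and the gene relevances of a module sum (up to the stabiliser) to the relevance that entered the module, which is the conservation property that makes the per-module explanation interpretable.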