The hippocampus is composed of distinct subregions that are differentially involved in functions associated with Alzheimer's disease (AD). Identifying the hippocampal subregions and genes that distinguish AD from healthy control (HC) groups with high accuracy is therefore of considerable interest. In this study, by jointly analyzing multimodal data, we propose a novel method for constructing fusion features, together with a random-forest-based classification method for identifying important features. Specifically, we construct fusion features from the correlations between gene sequences and hippocampal subregions, which reduces variability within the same group. Samples and features are then randomly selected to build a random forest. A genetic algorithm and clustering evolution are applied to amplify the differences among the initial decision trees and to evolve them. The features in the evolved decision trees that reach peak classification performance are taken as the important "subregion-gene pairs". The results show that our method performs well in both classification and generalization. In particular, we identified several significant subregions, such as the hippocampus-amygdala transition area (HATA), the fimbria, and the parasubiculum, as well as genes including RYR3 and PRKCE. These findings provide new candidate genes for AD and demonstrate the contribution of hippocampal subregions and genes to AD.
Keywords: clustering evolution; genetic algorithm; hippocampus; random forest; subregion.
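The abstract describes the forest-evolution step only at a high level. As a rough illustration, the following Python sketch grows trees on random bootstrap samples and random feature subsets, then evolves the feature subsets with a simple genetic algorithm. It is not the authors' implementation. The trees are simplified to depth-1 stumps, fitness is taken to be out-of-bag accuracy, and the selection, crossover and mutation operators were chosen here only for illustration. The clustering-evolution step and the fusion-feature construction are omitted. All function names (fit_stump, predict_stump, evolve_forest, important_features) are hypothetical.

```python
import numpy as np
from collections import Counter


def fit_stump(X, y, features):
    """Fit a depth-1 decision tree using only the given feature indices.

    Returns (feature, threshold, polarity) with the lowest training error;
    polarity 1 predicts class 1 when x > threshold, -1 when x < threshold.
    """
    best, best_err = None, np.inf
    for f in features:
        for t in np.unique(X[:, f]):
            for polarity in (1, -1):
                pred = (polarity * (X[:, f] - t) > 0).astype(int)
                err = np.mean(pred != y)
                if err < best_err:
                    best, best_err = (f, t, polarity), err
    return best


def predict_stump(stump, X):
    f, t, polarity = stump
    return (polarity * (X[:, f] - t) > 0).astype(int)


def evolve_forest(X, y, n_trees=20, n_feat=3, generations=10,
                  mutation_rate=0.3, seed=0):
    """Hypothetical GA over tree feature subsets; needs n_feat < X.shape[1]."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def make_tree(features):
        # Random samples: bootstrap draw; out-of-bag (OOB) rows score fitness.
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        stump = fit_stump(X[idx], y[idx], features)
        eval_idx = oob if oob.size else idx
        fitness = np.mean(predict_stump(stump, X[eval_idx]) == y[eval_idx])
        return {"features": features, "stump": stump, "fitness": fitness}

    # Random features: each initial tree sees a random subset of n_feat columns.
    population = [make_tree(rng.choice(p, size=n_feat, replace=False))
                  for _ in range(n_trees)]
    for _ in range(generations):
        # Selection: keep the fitter half of the forest as parents.
        population.sort(key=lambda tree: tree["fitness"], reverse=True)
        parents = population[: n_trees // 2]
        children = []
        while len(parents) + len(children) < n_trees:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw the child's subset from both parents' features.
            pool = np.union1d(parents[a]["features"], parents[b]["features"])
            child = rng.choice(pool, size=n_feat, replace=False)
            # Mutation: occasionally replace one feature with a column
            # that is not already in the child's subset.
            if rng.random() < mutation_rate:
                outside = np.setdiff1d(np.arange(p), child)
                child[rng.integers(n_feat)] = rng.choice(outside)
            children.append(make_tree(child))
        population = parents + children
    population.sort(key=lambda tree: tree["fitness"], reverse=True)
    return population


def important_features(population, top_k=5):
    """Split features of the top_k fittest trees, most frequent first."""
    counts = Counter(int(tree["stump"][0]) for tree in population[:top_k])
    return [f for f, _ in counts.most_common()]
```

Because the fitter half of each generation is carried over unchanged, the best tree found so far is never lost. On synthetic data where only one column determines the label, the fittest trees should therefore converge on splitting that column. In the setting described above, each column would instead be a fused "subregion-gene pair" feature.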