A self-knowledge distillation-driven CNN-LSTM model for predicting disease outcomes using longitudinal microbiome data

Bioinform Adv. 2023 May 18;3(1):vbad059. doi: 10.1093/bioadv/vbad059. eCollection 2023.

Abstract

Motivation: Human microbiome is complex and highly dynamic in nature. Dynamic patterns of the microbiome can capture more information than single point inference as it contains the temporal changes information. However, dynamic information of the human microbiome can be hard to be captured due to the complexity of obtaining the longitudinal data with a large volume of missing data that in conjunction with heterogeneity may provide a challenge for the data analysis.

Results: We propose using an efficient hybrid deep learning architecture convolutional neural network-long short-term memory, which combines with self-knowledge distillation to create highly accurate models to analyze the longitudinal microbiome profiles to predict disease outcomes. Using our proposed models, we analyzed the datasets from Predicting Response to Standardized Pediatric Colitis Therapy (PROTECT) study and DIABIMMUNE study. We showed the significant improvement in the area under the receiver operating characteristic curve scores, achieving 0.889 and 0.798 on PROTECT study and DIABIMMUNE study, respectively, compared with state-of-the-art temporal deep learning models. Our findings provide an effective artificial intelligence-based tool to predict disease outcomes using longitudinal microbiome profiles from collected patients.

Availability and implementation: The data and source code can be accessed at https://github.com/darylfung96/UC-disease-TL.