Background: Autoencoder (AE) is one of the deep learning techniques that uses an artificial neural network to reconstruct its input data in the output layer. We constructed a novel supervised AE model and tested its performance in the prediction of a co-existence of the disease of interest only using diagnostic codes.
Methods: Diagnostic codes of one million randomly sampled patients listed in the Korean National Health Information Database in 2019 were used to train, validate, and test the prediction model. The first used AE solely for a feature engineering tool for an input of a classifier. Supervised Multi-Layer Perceptron (sMLP) was added to train a classifier to predict a binary level with latent representation as an input (AE + sMLP). The second model simultaneously updated the parameters in the AE and the connected MLP classifier during the learning process (End-to-End Supervised AE [EEsAE]). We tested the performances of these two models against baseline models, eXtreme Gradient Boosting (XGB) and naïve Bayes, in the prediction of co-existing gastric cancer diagnosis.
Results: The proposed EEsAE model yielded the highest F1-score and highest area under the curve (0.86). The EEsAE and AE + sMLP gave the highest recalls. XGB yielded the highest precision. Ablation study revealed that iron deficiency anemia, gastroesophageal reflux disease, essential hypertension, gastric ulcers, benign prostate hyperplasia, and shoulder lesion were the top 6 most influential diagnoses on performance.
Conclusion: A novel EEsAE model showed promising performance in the prediction of a disease of interest.
Keywords: Deep Learning; Diagnosis Code; Gastric Cancer; Machine Learning; Prediction.
© 2023 The Korean Academy of Medical Sciences.