Machine learning-based prediction of Sasang constitution types using comprehensive clinical information and identification of key features for diagnosis

Integr Med Res. 2021 Sep;10(3):100668. doi: 10.1016/j.imr.2020.100668. Epub 2020 Sep 30.

Abstract

Background: Accurate diagnosis of Sasang type is central to Sasang constitutional medicine, a unique form of Korean medicine, yet there have been concerns about consistency among diagnoses. We investigated a data-driven integrative diagnostic model by applying machine learning to a multicenter clinical dataset with comprehensive features.

Methods: Extremely randomized trees (ERT), support vector machines, multinomial logistic regression, and k-nearest neighbors were applied, and their performance was evaluated by cross-validation. The feature importance of the classifier was analyzed to understand which information is crucial in diagnosis.
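
As an illustration of the evaluation scheme, the following is a minimal sketch of k-fold cross-validation applied to a k-nearest-neighbor classifier, one of the four methods named above, written in plain Python. The two numeric features, the cluster centers, the interleaved fold assignment, and the helper names (knn_predict, k_fold_indices, cross_validate) are assumptions made for illustration and are not taken from the study. The study may have used a different fold scheme, and it reports F1 rather than the accuracy computed here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds):
    """Split indices 0..n-1 into n_folds interleaved folds (an illustrative choice)."""
    return [list(range(i, n, n_folds)) for i in range(n_folds)]


def cross_validate(X, y, n_folds=5, k=3):
    """Return the k-NN accuracy on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical toy data: two made-up features per subject and three type labels.
centers = {"TE": (0.0, 0.0), "SE": (5.0, 5.0), "SY": (10.0, 0.0)}
offsets = [(-0.5, 0.2), (0.3, -0.4), (0.1, 0.5), (-0.2, -0.3), (0.4, 0.1)]
X, y = [], []
for dx, dy in offsets:
    for label, (cx, cy) in centers.items():
        X.append((cx + dx, cy + dy))
        y.append(label)

scores = cross_validate(X, y, n_folds=5, k=3)
print(scores)
```

Because the toy clusters are well separated, every fold is classified without error. Real clinical features overlap far more, which is why per-fold scores vary and are summarized as a mean with a spread.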

Results: The ERT classifier showed the highest performance, with an overall F1 score of 0.60 ± 0.060. The feature classes of body measurement, personality, general information, and cold-heat were more decisive than others in classifying Sasang types. Costal angle was the most informative feature. Pairwise classification revealed Sasang type-dependent distinctions: body measurement features played a key role in the TE-SE and TE-SY datasets, whereas personality and cold-heat features were important in the SE-SY dataset.
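
A score reported as a value ± a spread is commonly the mean and standard deviation of per-fold scores. The sketch below shows one way such a figure could be computed. It assumes a macro-averaged F1 (the unweighted mean of per-class F1) and a sample standard deviation. The abstract states neither choice, so both are assumptions, and the labels, predictions, and fold scores are invented for illustration.

```python
from statistics import mean, stdev


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return mean([f1_for_class(y_true, y_pred, label) for label in labels])


def summarize(fold_scores):
    """Mean and sample standard deviation across cross-validation folds."""
    return mean(fold_scores), stdev(fold_scores)


# Invented labels and predictions for a single fold.
y_true = ["TE", "TE", "SE", "SE", "SY", "SY"]
y_pred = ["TE", "SE", "SE", "SE", "SY", "TE"]
example_f1 = macro_f1(y_true, y_pred)

# Invented per-fold scores, used only to show the summary step.
avg, spread = summarize([0.5, 0.7])
print(f"{example_f1:.3f}", f"{avg:.2f} ± {spread:.3f}")
```

Macro-averaging weights each Sasang type equally regardless of how many subjects belong to it, which matters when class sizes are unbalanced. A support-weighted average would give a different number on the same predictions.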

Conclusion: The current study investigated a comprehensive diagnostic model for Sasang type using machine learning and achieved better performance than previous studies. By revealing the key features that contribute to Sasang type diagnosis, this study supports data-driven decision making in clinics.

Keywords: Diagnostic model; Extremely randomized trees; Feature importance; Machine learning; Sasang constitutional medicine.