Background: Accurate Sasang type diagnosis is important in Sasang constitutional medicine, a unique form of Korean medicine, but there have been concerns about consistency among diagnoses. We investigated a data-driven integrative diagnostic model by applying machine learning to a multicenter clinical dataset with comprehensive features.
Methods: Extremely randomized trees (ERT), support vector machines, multinomial logistic regression, and k-nearest neighbors were applied, and their performance was evaluated by cross-validation. The feature importance of the classifier was analyzed to identify which information is crucial for diagnosis.
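As an illustration of the evaluation procedure, the following is a minimal sketch of cross-validated scoring for one of the listed classifier families (k-nearest neighbors), written with the Python standard library only. The data are synthetic, and the fold count, the value of k, the use of macro-averaged F1, and all function names are assumptions made for the example; they are not taken from the study.

```python
import math
import random
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validate(X, y, n_folds=5, k=5, seed=0):
    """Return the macro F1 of each held-out fold for a k-NN classifier."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in held_out]
        fold_scores.append(macro_f1([y[i] for i in held_out], preds))
    return fold_scores


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic three-class data; NOT the study's clinical dataset."""
    rng = random.Random(seed)
    centers = {"TE": (0.0, 0.0), "SE": (4.0, 0.0), "SY": (0.0, 4.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = cross_validate(X, y)
mean = sum(scores) / len(scores)
std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
print(f"macro F1: {mean:.2f} ± {std:.2f}")
```

Reporting the mean and standard deviation across folds gives a "score ± spread" summary of the same general form as the one in the Results. In practice, a library implementation of each classifier would likely replace the hand-written k-NN used here.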
Results: The ERT classifier showed the highest performance, with an overall F1 score of 0.60 ± 0.060. The feature classes of body measurement, personality, general information, and cold-heat were more decisive than others in classifying Sasang types. Costal angle was the most informative feature. In pairwise classification, the key features depended on the Sasang types being compared: body measurement features played a key role in the TE-SE and TE-SY datasets, while personality and cold-heat features were important in the SE-SY dataset.
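The pairwise analysis can be illustrated with a small sketch that restricts the data to two types at a time and ranks features for each pair. The abstract does not state how feature importance was computed, so this example swaps in permutation importance with a nearest-centroid classifier as a stand-in. The two feature columns ("body" and "personality"), the synthetic data, and all function names are hypothetical.

```python
import random


def centroid_fit(X, y):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_accuracy(centroids, X, y):
    """Fraction of rows whose nearest centroid matches the label."""
    def predict(row):
        return min(centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(row, centroids[c])))
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(X, y, seed=0):
    """Accuracy drop when each feature column is shuffled in turn.

    Computed on the fitting data for brevity; a held-out set would be
    preferable in a real analysis.
    """
    rng = random.Random(seed)
    centroids = centroid_fit(X, y)
    baseline = centroid_accuracy(centroids, X, y)
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - centroid_accuracy(centroids, shuffled, y))
    return drops


def pairwise_subset(X, y, a, b):
    """Keep only the rows labeled a or b."""
    keep = [i for i, label in enumerate(y) if label in (a, b)]
    return [X[i] for i in keep], [y[i] for i in keep]


# Synthetic data (NOT the study's): column 0 is a hypothetical "body" feature
# that separates TE from the rest; column 1 is a hypothetical "personality"
# feature that separates SE from SY.
rng = random.Random(42)
means = {"TE": (6.0, 0.0), "SE": (0.0, -1.0), "SY": (0.0, 1.0)}
X, y = [], []
for label, (m_body, m_pers) in means.items():
    for _ in range(50):
        X.append([rng.gauss(m_body, 0.5), rng.gauss(m_pers, 0.5)])
        y.append(label)

importance = {}
for a, b in [("TE", "SE"), ("TE", "SY"), ("SE", "SY")]:
    pair_X, pair_y = pairwise_subset(X, y, a, b)
    importance[f"{a}-{b}"] = permutation_importance(pair_X, pair_y)
    body, pers = importance[f"{a}-{b}"]
    print(f"{a}-{b}: body={body:.2f} personality={pers:.2f}")
```

On this toy data the ranking of the two columns differs between pairs by construction, which mirrors the kind of type-dependent distinction reported above without reproducing the study's actual features or values.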
Conclusion: This study investigated a comprehensive diagnostic model for Sasang type using machine learning and achieved better performance than previous studies. It supports data-driven decision making in clinics by revealing key features that contribute to Sasang type diagnosis.
Keywords: Diagnostic model; Extremely randomized trees; Feature importance; Machine learning; Sasang constitutional medicine.
© 2021 Korea Institute of Oriental Medicine. Publishing services by Elsevier B.V.