Background: Osteoarthritis (OA) is a common degenerative disease of the joints. Risk factors for OA include non-modifiable factors such as age and sex, as well as modifiable factors like physical activity.
Objectives: This study aimed to construct a soft voting ensemble model that predicts OA diagnosis from variables related to individual characteristics and physical activity, and to identify the variables most important to the model using permutation importance.
Methods: Recursive feature elimination with cross-validation was used to select, from the candidate variables, the subset with the best predictive performance. A soft voting ensemble model combining the random forest, XGBoost, and LightGBM algorithms was then constructed. The predictive performance of the model and the permutation importance of each variable were evaluated.
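As a rough illustration of two of the techniques named above, the following plain-Python sketch shows how soft voting averages the predicted probabilities of several base models, and how permutation importance can be measured as the mean drop in a score after one variable is shuffled. It is not the study's implementation: the three base models are hypothetical hand-written rules standing in for fitted random forest, XGBoost, and LightGBM classifiers, the six-row data set is synthetic, and the use of accuracy as the score for permutation importance is an assumption. Feature selection is not shown.

```python
import random

def soft_vote(prob_fns, x):
    """Average each base model's P(OA) for row x, then threshold at 0.5."""
    mean_p = sum(fn(x) for fn in prob_fns) / len(prob_fns)
    return (1 if mean_p >= 0.5 else 0), mean_p

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, col, n_repeats=10, seed=0):
    """Mean drop in accuracy when column `col` is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats

# Hypothetical stand-ins for three fitted base models (assumption, not from
# the study). Each returns P(OA) from column 0 (age) and ignores column 1.
def model_a(x): return 0.9 if x[0] >= 60 else 0.2
def model_b(x): return 0.8 if x[0] >= 65 else 0.3
def model_c(x): return 0.7 if x[0] >= 55 else 0.1

MODELS = [model_a, model_b, model_c]

def ensemble_predict(x):
    """Predicted label of the soft voting ensemble for one row."""
    return soft_vote(MODELS, x)[0]

# Synthetic toy data (not study data): [age, unrelated variable], label = OA.
X = [[45, 31.0], [50, 28.5], [57, 35.2], [62, 22.1], [70, 19.8], [75, 24.3]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print("ensemble accuracy:", accuracy(ensemble_predict, X, y))
    for col, name in enumerate(["age", "unrelated"]):
        print(name, permutation_importance(ensemble_predict, X, y, col))
```

In this toy setup the unrelated column has an importance of exactly zero because no base model reads it, while shuffling age degrades accuracy. With a library such as scikit-learn, the same ideas are typically expressed with `VotingClassifier(voting="soft")`, `RFECV`, and `permutation_importance`.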
Results: The variables selected to construct the model were age, sex, grip strength, and quality of life, and the accuracy of the ensemble model was 0.828. By permutation importance, the most important variable was age (0.199), followed by grip strength (0.053), quality of life (0.043), and sex (0.034).
Conclusion: The model predicted OA with relatively good performance. With continued use and updating, it could be used to predict OA diagnosis, and its predictive performance may be further improved.
Keywords: Osteoarthritis; ensemble; machine learning; physical activity; predictive model.
© 2024 John Wiley & Sons Ltd.