Purpose: To develop and validate machine learning (ML) models for predicting cycloplegic refractive error and myopia status using noncycloplegic refractive error and biometric data.
Methods: This cross-sectional study included children aged 5 to 18 years who underwent biometry and autorefraction before and after cycloplegia. Myopia was defined as a cycloplegic spherical equivalent refraction (SER) ≤ −0.5 diopters (D). Models predicting SER were evaluated using R² and mean absolute error (MAE), and models predicting myopia status were evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). The best-performing models were further evaluated using sensitivity and specificity and by comparing observed with predicted myopia prevalence, both overall and within each age group. Independent datasets were used for training (n = 1938) and validation (n = 1476).
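The evaluation metrics named in the Methods can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's code: the helper names are hypothetical, SER is computed with the conventional sphere-plus-half-cylinder formula, a myopia call is made by applying the −0.5 D threshold to predicted SER, and the negated predicted SER is used as the AUC score for a regression model (the abstract does not say how classification scores were derived).

```python
from typing import Sequence, Tuple

# Myopia definition from the Methods: cycloplegic SER <= -0.5 D.
MYOPIA_THRESHOLD_D = -0.5


def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """Conventional spherical equivalent (D): sphere plus half the cylinder."""
    return sphere + cylinder / 2.0


def is_myopic(ser: float, threshold: float = MYOPIA_THRESHOLD_D) -> bool:
    """Classify an eye as myopic when its SER is at or below the threshold."""
    return ser <= threshold


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted SER (D)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (made-up values, not study data).
observed_ser = [-2.0, -0.75, -0.25, 0.5, 1.0]
predicted_ser = [-1.75, -0.5, -0.5, 0.25, 1.25]
observed_myopia = [is_myopic(s) for s in observed_ser]
predicted_myopia = [is_myopic(s) for s in predicted_ser]

print(mean_absolute_error(observed_ser, predicted_ser))  # 0.25
print(r_squared(observed_ser, predicted_ser))
# Lower predicted SER means "more myopic", so negate it to use as a score.
print(roc_auc(observed_myopia, [-s for s in predicted_ser]))  # 5.5 of 6 pairs, about 0.917
print(sensitivity_specificity(observed_myopia, predicted_myopia))  # sensitivity 1.0, specificity 2/3
```

The pairwise AUC loop is quadratic in sample size, which is fine for illustration; for datasets the size of those in this study, a rank-based or library implementation would be the practical choice.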
Results: In the validation dataset, the ML models predicted cycloplegic SER with high R² (0.913–0.935) and low MAE (0.393–0.480 D). The AUC for predicting myopia was high (0.984–0.987). The best-performing model for SER (XGBoost) had high sensitivity (91.1%) and specificity (97.2%) for myopia. Random forest (RF), the best-performing model for myopia status, also had high sensitivity (92.2%) and specificity (96.9%). Within each age group, the difference between predicted and actual myopia prevalence was within 4%.
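The age-stratified comparison of observed and predicted prevalence can be sketched as below. Assumptions: each child carries an age-group label, myopia is called by applying the −0.5 D threshold to both observed and predicted SER, and the 4% tolerance is read as an absolute difference in percentage points, which the abstract does not state explicitly. The function names and record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Myopia definition from the Methods: SER <= -0.5 D.
MYOPIA_THRESHOLD_D = -0.5


def prevalence_by_group(
    records: Iterable[Tuple[str, float, float]],
) -> Dict[str, Tuple[float, float]]:
    """Return {age_group: (observed %, predicted %)} from
    (age_group, observed_ser, predicted_ser) records."""
    # Per group: [number of children, observed myopes, predicted myopes]
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for group, observed, predicted in records:
        c = counts[group]
        c[0] += 1
        c[1] += int(observed <= MYOPIA_THRESHOLD_D)
        c[2] += int(predicted <= MYOPIA_THRESHOLD_D)
    return {g: (100.0 * c[1] / c[0], 100.0 * c[2] / c[0]) for g, c in counts.items()}


def max_abs_difference(prevalence: Dict[str, Tuple[float, float]]) -> float:
    """Largest |predicted - observed| prevalence across groups, in percentage points."""
    return max(abs(pred - obs) for obs, pred in prevalence.values())


# Made-up records for two hypothetical age groups (not study data).
records = [
    ("5-8", 0.5, 0.25), ("5-8", -0.75, -0.5), ("5-8", 1.0, 0.75), ("5-8", -0.25, -0.5),
    ("9-12", -1.5, -1.25), ("9-12", -0.75, -1.0), ("9-12", 0.25, 0.0), ("9-12", -2.0, -2.25),
]
prev = prevalence_by_group(records)
print(prev)  # {'5-8': (25.0, 50.0), '9-12': (75.0, 75.0)}
print(max_abs_difference(prev) <= 4.0)  # False: the 5-8 group differs by 25 points
```

In this toy data, one borderline child (observed −0.25 D, predicted −0.5 D) flips the classification and shifts a four-child group by 25 points, which illustrates why small per-group samples make prevalence differences sensitive to predictions near the threshold.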
Conclusions: ML models using noncycloplegic refractive error and ocular biometric data performed well in predicting cycloplegic SER and myopia status. When measuring cycloplegic SER is not feasible, ML may provide a useful tool for estimating cycloplegic SER and myopia prevalence in epidemiological studies.
Translational relevance: ML prediction of cycloplegic refraction from noncycloplegic data is a powerful tool for large, population-based studies of refractive error.