Aims: There are limited data on the prevalence and risk factors of colonic adenoma from the Indian sub-continent. We aimed to develop a machine-learning model to optimize colonic adenoma detection in a prospective cohort.
Methods: Consecutive adult patients undergoing diagnostic colonoscopy were enrolled between October 2020 and November 2022. Patients at high risk of colonic adenoma were excluded. The predictive model was developed using a gradient-boosting machine (GBM) learning method. The GBM model was further optimized by adjusting the learning rate and the number of trees and by using 10-fold cross-validation.
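As an illustration of the tuning step described in the Methods, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of the study cohort, and searches a hypothetical grid over the two hyperparameters named above (learning rate and number of trees), scoring each setting by AUC under 10-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (NOT the study cohort); roughly 11% positives
# to mimic an imbalanced outcome such as adenoma presence.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.89, 0.11], random_state=0,
)

# Hypothetical grid: the abstract names the hyperparameters, not the values.
param_grid = {"learning_rate": [0.05, 0.1], "n_estimators": [50, 100]}

# 10-fold stratified cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid, scoring="roc_auc", cv=cv,
)
search.fit(X, y)

# Mean and standard deviation of the fold-wise AUC for the best setting.
best_idx = search.best_index_
mean_auc = search.cv_results_["mean_test_score"][best_idx]
sd_auc = search.cv_results_["std_test_score"][best_idx]
print(search.best_params_, f"AUC {mean_auc:.3f} (SD {sd_auc:.3f})")
```

Reporting the fold-wise mean and SD mirrors the "AUC (SD)" format used in the Results; the values printed here come from synthetic data and say nothing about the study's findings.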
Results: A total of 10,320 patients (mean age 45.18 ± 14.82 years; 69% men) were included in the study. In the overall population, 1,152 (11.2%) patients had at least one adenoma. In patients older than 50 years, the hospital-based adenoma prevalence was 19.5% (808/4,144). The area under the receiver operating characteristic curve (AUC) (SD) of the logistic regression model was 72.55% (4.91%), while the AUCs for the deep learning, decision tree, random forest and gradient-boosted tree models were 76.25% (4.22%), 65.95% (4.01%), 79.38% (4.91%) and 84.76% (2.86%), respectively. After model optimization and cross-validation, the AUC of the gradient-boosted tree model increased to 92.2% (1.1%).
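The prevalence figures above can be checked with simple arithmetic. The sketch below recomputes the two reported percentages from the counts in the Results. The figure for patients aged 50 years or younger is not reported in the abstract; it is derived here by subtraction, under the assumption that age was recorded for every included patient.

```python
# Counts taken from the Results section.
total_patients = 10_320
total_with_adenoma = 1_152
over_50_patients = 4_144
over_50_with_adenoma = 808

def prevalence_pct(cases: int, population: int) -> float:
    """Prevalence as a percentage of the population."""
    return 100.0 * cases / population

overall = prevalence_pct(total_with_adenoma, total_patients)
over_50 = prevalence_pct(over_50_with_adenoma, over_50_patients)

# Derived, not reported: assumes no patient has a missing age.
younger_patients = total_patients - over_50_patients
younger_with_adenoma = total_with_adenoma - over_50_with_adenoma
younger = prevalence_pct(younger_with_adenoma, younger_patients)

print(f"Overall:       {overall:.1f}%")   # 11.2%
print(f"Age > 50:      {over_50:.1f}%")   # 19.5%
print(f"Age <= 50:     {younger:.1f}% ({younger_with_adenoma}/{younger_patients})")
```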
Conclusions: Machine-learning models may predict colorectal adenoma more accurately than logistic regression. A machine-learning model may help optimize the use of colonoscopy to prevent colorectal cancers.
Trial registration: ClinicalTrials.gov (ID: NCT04512729).
Keywords: Colonic adenoma; Colonoscopy; Colorectal cancer; Metabolic syndrome.
© 2024. Indian Society of Gastroenterology.