Predicting Pancreatic Cancer in New-Onset Diabetes Cohort Using a Novel Model With Integrated Clinical and Genetic Indicators: A Large-Scale Prospective Cohort Study

Cancer Med. 2024 Nov;13(21):e70388. doi: 10.1002/cam4.70388.

Abstract

Introduction: Individuals who develop new-onset diabetes have been identified as a high-risk cohort for pancreatic cancer (PC), with an incidence rate nearly 8 times higher than that of the general population. Targeted screening of this cohort therefore presents a promising opportunity for early detection of PC. We aimed to develop and validate a novel model capable of identifying high-risk individuals among those with new-onset diabetes.

Methods: Using the UK Biobank cohort, we focused on participants who developed new-onset diabetes during follow-up. Genetic and clinical characteristics available at registration were considered as candidate predictors. We conducted univariate regression analysis to identify potential indicators and used 5-fold cross-validation to select the optimal predictors. Five machine learning algorithms were used for model development.
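
The 5-fold cross-validation step can be pictured with a minimal sketch. The abstract does not say whether the folds were stratified by outcome, which software was used, or whether the full cohort entered cross-validation, so the stratification, the helper name `stratified_kfold_indices`, and the use of the full cohort counts are all assumptions made for illustration. The labels are synthetic placeholders, not study data.

```python
import random


def stratified_kfold_indices(labels, k=5, seed=0):
    """Assign sample indices to k folds, spreading each class evenly across folds.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Synthetic labels only: 100 cases among 12,735 participants (counts taken
# from the Results); the ordering and values are placeholders.
labels = [1] * 100 + [0] * 12635
folds = stratified_kfold_indices(labels, k=5)

for held_out, val_idx in enumerate(folds):
    # Each fold serves once as the validation set; the rest form the training set.
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    n_cases = sum(labels[i] for i in val_idx)
    print(f"fold {held_out}: {len(train_idx)} training, "
          f"{len(val_idx)} validation, {n_cases} validation cases")
```

With an outcome this rare, an unstratified split could leave a fold with very few cases, which is the reason stratification is assumed in the sketch.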

Results: Among 12,735 patients with new-onset diabetes, 100 (0.8%) were diagnosed with PC within 2 years. The final model (area under the curve, 0.897; 95% confidence interval, 0.865-0.929) included 5 clinical predictors and 24 single nucleotide polymorphisms. Two risk cut-offs were established: 1.28% and 5.26%. The recommended 1.28% cut-off, chosen on the basis of model performance, limits definitive testing to 13% of the total population while capturing 76% of PC cases. At the 5.26% high-risk cut-off, only 2% of the population requires definitive testing, and nearly half of PC cases are captured.
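
To make the trade-off at the recommended 1.28% cut-off concrete, the reported percentages can be turned into approximate head counts. This is a back-of-the-envelope sketch, not part of the study's analysis: it assumes the rounded percentages apply to the full cohort of 12,735, which the abstract does not confirm, and the function name `screening_yield` is hypothetical. The derived figures are therefore approximate.

```python
def screening_yield(n_total, n_cases, frac_tested, frac_captured):
    """Convert reported fractions into approximate counts and yield measures.

    Hypothetical helper; inputs are the rounded figures from the abstract.
    """
    n_tested = round(n_total * frac_tested)    # people sent for definitive testing
    n_found = round(n_cases * frac_captured)   # PC cases among those tested
    yield_rate = n_found / n_tested            # share of tested people who have PC
    tests_per_case = n_tested / n_found        # definitive tests per case found
    return n_tested, n_found, yield_rate, tests_per_case


n_total, n_cases = 12735, 100  # cohort size and PC cases within 2 years
baseline = n_cases / n_total   # incidence if everyone were tested

n_tested, n_found, yield_rate, tests_per_case = screening_yield(
    n_total, n_cases, frac_tested=0.13, frac_captured=0.76
)

print(f"baseline incidence: {baseline:.2%}")
print(f"tested at 1.28% cut-off: {n_tested}, cases captured: {n_found}")
print(f"yield among tested: {yield_rate:.1%}, tests per case: {tests_per_case:.0f}")
print(f"enrichment over baseline: {yield_rate / baseline:.1f}x")
```

Under these assumptions, roughly 1,656 people would be tested to find 76 cases, or about 22 definitive tests per case detected, compared with about 127 per case if the whole cohort were tested.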

Conclusions: For the first time, we combined clinical and genetic data to develop and validate a machine learning model that estimates the risk of pancreatic cancer in patients with new-onset diabetes. By reducing the number of unnecessary tests while ensuring that a substantial proportion of high-risk patients are identified, this tool has the potential to improve patient outcomes and optimize healthcare resources.

Keywords: early detection; machine learning; pancreatic cancer; single nucleotide polymorphism.

MeSH terms

  • Adult
  • Aged
  • Diabetes Mellitus / diagnosis
  • Diabetes Mellitus / epidemiology
  • Diabetes Mellitus / genetics
  • Early Detection of Cancer / methods
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Pancreatic Neoplasms* / diagnosis
  • Pancreatic Neoplasms* / epidemiology
  • Pancreatic Neoplasms* / genetics
  • Polymorphism, Single Nucleotide
  • Prospective Studies
  • Risk Assessment / methods
  • Risk Factors
  • United Kingdom / epidemiology