Internal-external cross-validation helped to evaluate the generalizability of prediction models in large clustered datasets

Toshihiko Takada; Steven Nijman; Spiros Denaxas; Kym I E Snell; Alicia Uijl; Tri-Long Nguyen; Folkert W Asselbergs; Thomas P A Debray

doi:10.1016/j.jclinepi.2021.03.025

J Clin Epidemiol. 2021 Sep:137:83-91. doi: 10.1016/j.jclinepi.2021.03.025. Epub 2021 Apr 6.

Authors

Toshihiko Takada¹, Steven Nijman¹, Spiros Denaxas², Kym I E Snell³, Alicia Uijl⁴, Tri-Long Nguyen⁵, Folkert W Asselbergs⁶, Thomas P A Debray⁷

Affiliations

¹ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
² Health Data Research UK and Institute of Health Informatics, University College London, Gibbs Building, 215 Euston Road, London, NW1 2BE, United Kingdom; The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, United Kingdom; The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, Suite A, 1st floor, Maple House, 149 Tottenham Court Road, London, W1T 7DN, United Kingdom; British Heart Foundation Research Accelerator, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
³ Centre for Prognosis Research, School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, United Kingdom.
⁴ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands; Division of Cardiology, Department of Medicine, Karolinska Institute, 171 77 Stockholm, Sweden; Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Utrecht University, Heidelberglaan 100, PO Box 85500, 3508GA, Utrecht, The Netherlands.
⁵ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands; Section of Epidemiology, Department of Public Health, University of Copenhagen, CSS, Øster Farimagsgade 5, DK-1353 Copenhagen K, Denmark.
⁶ Health Data Research UK and Institute of Health Informatics, University College London, Gibbs Building, 215 Euston Road, London, NW1 2BE, United Kingdom; Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Utrecht University, Heidelberglaan 100, PO Box 85500, 3508GA, Utrecht, The Netherlands; Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, Gower Street, London, WC1E 6BT, United Kingdom.
⁷ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands; Health Data Research UK and Institute of Health Informatics, University College London, Gibbs Building, 215 Euston Road, London, NW1 2BE, United Kingdom. Electronic address: [email protected].

PMID: 33836256

Abstract

Objective: To illustrate how to evaluate the need for complex strategies when developing generalizable prediction models in large clustered datasets.

Study design and setting: We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models' generalizability across the included general practices.
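The internal-external cross-validation procedure described above can be sketched as a leave-one-cluster-out loop: each practice is held out in turn, the model is developed on the remaining practices, and performance is assessed in the held-out practice. The sketch below is illustrative only and rests on several assumptions that are not from the study: it uses synthetic data, a binary outcome with logistic regression in place of the Cox models, and hypothetical helper names (`fit_logistic`, `predict_risk`, `c_statistic`, `iecv`).

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit an unpenalized logistic regression by Newton-Raphson (intercept added)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_risk(beta, X):
    """Predicted event probability for each row of X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def c_statistic(y, risk):
    """Concordance for a binary outcome; tied predictions count as 1/2."""
    pos, neg = risk[y == 1], risk[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def iecv(X, y, cluster):
    """Hold out each cluster once; develop on the rest; validate on the held-out one."""
    results = {}
    for c in np.unique(cluster):
        held_out = cluster == c
        beta = fit_logistic(X[~held_out], y[~held_out])
        risk = predict_risk(beta, X[held_out])
        results[int(c)] = {
            "c_statistic": c_statistic(y[held_out], risk),       # discrimination
            "oe_ratio": y[held_out].sum() / risk.sum(),          # observed/expected
        }
    return results

# Synthetic clustered data (assumption: 8 "practices" of 400 individuals each,
# with a practice-level random intercept to induce between-practice heterogeneity).
rng = np.random.default_rng(0)
n_clusters, n_per = 8, 400
cluster = np.repeat(np.arange(n_clusters), n_per)
X = rng.normal(size=(n_clusters * n_per, 2))
cluster_intercept = rng.normal(scale=0.3, size=n_clusters)[cluster]
logit = -1.0 + cluster_intercept + 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(len(logit)) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

performance = iecv(X, y, cluster)
for c, perf in performance.items():
    print(c, round(perf["c_statistic"], 3), round(perf["oe_ratio"], 3))
```

Running the same loop for each competing modelling strategy yields one set of practice-specific performance estimates per strategy, which is what allows their generalizability to be compared.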

Results: Among 871,687 individuals from 225 general practices, 43,987 (5.0%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was not much improved by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interaction slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke.
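The abstract does not state how between-practice heterogeneity was quantified. One common approach, shown here purely as an assumption, is to pool the practice-specific estimates (for example, log observed/expected ratios) with a DerSimonian-Laird random-effects model and read the between-practice variance from tau². The function name `dersimonian_laird` and all input values below are hypothetical and are not taken from the study.

```python
import numpy as np

def dersimonian_laird(estimates, variances):
    """Random-effects pooling; returns (pooled estimate, its standard error, tau^2)."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    w = 1.0 / var                                   # fixed-effect weights
    fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - fixed) ** 2)              # Cochran's Q
    k = len(est)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)              # between-cluster variance
    w_re = 1.0 / (var + tau2)                       # random-effects weights
    pooled = np.sum(w_re * est) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2

# Hypothetical log(O/E) estimates and their variances from three held-out practices.
log_oe = [-0.3, 0.0, 0.4]
var_log_oe = [0.01, 0.01, 0.01]
pooled, se, tau2 = dersimonian_laird(log_oe, var_log_oe)
print("pooled O/E:", np.exp(pooled), "tau^2:", tau2)
```

A larger tau² for one modelling strategy than another would indicate that its calibration varies more from practice to practice, even if the pooled estimate looks similar.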

Conclusion: In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models, and to identify promising modelling strategies.

Keywords: Calibration; Discrimination; Heterogeneity; Model comparison; Prediction model; Validation.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Cluster Analysis*
Datasets as Topic / statistics & numerical data*
Forecasting*
Humans
Models, Statistical*
