The use of data from multiple studies or centers to validate a clinical test or multivariable prediction model allows researchers to investigate the test's or model's performance in multiple settings and populations. Recently, meta-analytic techniques have been proposed to summarize discrimination and calibration across study populations. Here, we instead consider performance in terms of net benefit, a measure of clinical utility that weighs the benefits of true positive classifications against the harms of false positive ones. We posit that it is important to examine clinical utility across multiple settings of interest. This requires a suitable meta-analysis method; because net benefit at a given threshold is a function of sensitivity, specificity, and prevalence, we propose a Bayesian trivariate random-effects meta-analysis of these three quantities. Across a range of chosen harm-to-benefit ratios, this yields a summary measure of net benefit, a prediction interval, and an estimate of the probability that the test or model is clinically useful in a new setting. In addition, the prediction interval and probability of usefulness can be calculated conditional on a known prevalence in the new setting. The proposed methods are illustrated by 2 case studies: a meta-analysis of published studies on ear thermometry to diagnose fever in children, and the validation of a multivariable clinical risk prediction model for the diagnosis of ovarian cancer in a multicenter dataset. Crucially, in both case studies the clinical utility of the test or model was heterogeneous across settings, limiting its usefulness in practice. This emphasizes that heterogeneity in clinical utility should be assessed before a test or model is routinely implemented.
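As a concrete illustration of the net benefit measure (a minimal sketch, not code from the paper; the function names are ours), the following Python snippet computes net benefit from sensitivity, specificity, and prevalence at a chosen threshold probability, together with the net benefit of the default "treat all" strategy against which a test or model must be compared:

```python
def net_benefit(sens, spec, prev, threshold):
    """Net benefit of a test/model at a given threshold probability.

    The threshold p_t implies a harm-to-benefit ratio w = p_t / (1 - p_t):
    one false positive is weighed as w times less harmful than one true
    positive is beneficial.
    """
    w = threshold / (1.0 - threshold)
    true_pos = sens * prev                    # proportion correctly classified positive
    false_pos = (1.0 - spec) * (1.0 - prev)   # proportion wrongly classified positive
    return true_pos - w * false_pos


def net_benefit_treat_all(prev, threshold):
    """Net benefit of classifying everyone as positive ('treat all')."""
    w = threshold / (1.0 - threshold)
    return prev - w * (1.0 - prev)


# Example: 85% sensitivity, 90% specificity, 20% prevalence, evaluated at a
# 10% threshold (harm-to-benefit ratio 1:9). 'Treat none' has net benefit 0.
print(net_benefit(0.85, 0.90, 0.20, 0.10))   # test/model: ~0.161
print(net_benefit_treat_all(0.20, 0.10))     # treat all:  ~0.111
```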
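The summary net benefit, prediction interval, and probability of usefulness can then be derived from draws of the between-study (predictive) distribution of sensitivity, specificity, and prevalence. The sketch below assumes such posterior predictive draws are already available from a fitted trivariate random-effects model; here they are mimicked by a multivariate normal on the logit scale with made-up parameter values, so it illustrates the computation rather than the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed predictive means and between-study covariance of
# (logit sensitivity, logit specificity, logit prevalence); in practice
# these come from the Bayesian meta-analysis. Values are hypothetical.
mu = np.array([1.7, 2.2, -1.4])
Sigma = np.array([[0.30, -0.05, 0.00],
                  [-0.05, 0.20, 0.00],
                  [0.00, 0.00, 0.10]])

# Sample setting-specific (sens, spec, prev) for a new setting.
draws = rng.multivariate_normal(mu, Sigma, size=10_000)
sens, spec, prev = 1.0 / (1.0 + np.exp(-draws.T))

threshold = 0.10
w = threshold / (1.0 - threshold)
nb_model = sens * prev - w * (1.0 - spec) * (1.0 - prev)
nb_treat_all = prev - w * (1.0 - prev)

# Summary net benefit and 95% prediction interval for a new setting.
print("median NB:", np.median(nb_model))
print("95% PI:", np.percentile(nb_model, [2.5, 97.5]))

# Probability of usefulness: the test/model must beat both default
# strategies, 'treat all' and 'treat none' (net benefit 0).
print("P(useful):", (nb_model > np.maximum(nb_treat_all, 0.0)).mean())

# Conditional on a known prevalence in the new setting, fix prev
# (valid here because prevalence is uncorrelated with sens/spec in Sigma).
prev_known = 0.15
nb_model_c = sens * prev_known - w * (1.0 - spec) * (1.0 - prev_known)
nb_all_c = prev_known - w * (1.0 - prev_known)
print("P(useful | prev=0.15):", (nb_model_c > max(nb_all_c, 0.0)).mean())
```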
Keywords: decision curves; diagnostic; meta-analysis; net benefit; test accuracy.