-
Unmasking Bias: A Framework for Evaluating Treatment Benefit Predictors Using Observational Studies
Authors:
Yuan Xia,
Mohsen Sadatsafavi,
Paul Gustafson
Abstract:
Treatment benefit predictors (TBPs) map patient characteristics into an estimate of the treatment benefit tailored to individual patients, which can support optimizing treatment decisions. However, assessing their performance can be challenging when treatment assignment is non-random. This study conducts a conceptual analysis, which can be applied to finite-sample studies. We present a framework for evaluating TBPs using observational data from a target population of interest. We then explore the impact of confounding bias on TBP evaluation using measures of calibration and discrimination, which are moderate calibration and the concentration of benefit index ($C_b$), respectively. We illustrate that failure to control for confounding can lead to misleading values of performance metrics, and we establish how confounding bias propagates to evaluation bias, quantifying the resulting bias in the performance metrics explicitly. These findings underscore the necessity of accounting for confounding factors when evaluating TBPs, ensuring more reliable and contextually appropriate treatment decisions.
Submitted 7 July, 2024;
originally announced July 2024.
-
The expected value of sample information calculations for external validation of risk prediction models
Authors:
Mohsen Sadatsafavi,
Andrew J Vickers,
Tae Yoon Lee,
Paul Gustafson,
Laure Wynants
Abstract:
In designing external validation studies of clinical prediction models, contemporary sample size calculation methods are based on the frequentist inferential paradigm. One of the widely reported metrics of model performance is net benefit (NB), and the relevance of conventional inference around NB as a measure of clinical utility is doubtful. Value of Information methodology quantifies the consequences of uncertainty in terms of its impact on clinical utility of decisions. We introduce the expected value of sample information (EVSI) for validation as the expected gain in NB from conducting an external validation study of a given size. We propose algorithms for EVSI computation, and in a case study demonstrate how EVSI changes as a function of the amount of current information and future study's sample size. Value of Information methodology provides a decision-theoretic lens to the process of planning a validation study of a risk prediction model and can complement conventional methods when designing such studies.
Submitted 6 January, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
Non-parametric inference on calibration of predicted risks
Authors:
Mohsen Sadatsafavi,
John Petkau
Abstract:
Moderate calibration, the expected event probability among observations with predicted probability z being equal to z, is a desired property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of risk prediction models are mostly based on smoothing or grouping the data. As well, there is no widely accepted inferential method for the null hypothesis that a model is moderately calibrated. In this work, we discuss recently-developed, and propose novel, methods for the assessment of moderate calibration for binary responses. The methods are based on the limiting distributions of functions of standardized partial sums of prediction errors converging to the corresponding laws of Brownian motion. The novel method relies on well-known properties of the Brownian bridge which enables joint inference on mean and moderate calibration, leading to a unified "bridge" test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study we consider a prediction model for short-term mortality after a heart attack, where we provide suggestions on graphical presentation and the interpretation of results. Moderate calibration can be assessed without requiring arbitrary grouping of data or using methods that require tuning of parameters. An accompanying R package implements this method (see https://github.com/resplab/cumulcalib/).
Submitted 23 May, 2024; v1 submitted 18 July, 2023;
originally announced July 2023.
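The abstract above describes tests built on standardized partial sums of prediction errors. A minimal Python sketch of such a path follows. Ordering observations by predicted risk and using the cumulative variance p(1 - p) as the time axis are assumptions made here for illustration; the function name is hypothetical and this is not the API of the cumulcalib package.

```python
import math

def standardized_partial_sums(y, p):
    """Cumulative prediction errors, ordered by predicted risk and scaled
    by the total variance (assumed construction, for illustration only).

    Returns (times, path). If the model is calibrated, each error y - p has
    mean 0 and variance p * (1 - p), so the path should behave roughly like
    Brownian motion on [0, 1] in large samples.
    """
    pairs = sorted(zip(p, y))  # order observations by predicted risk
    total_var = sum(pi * (1.0 - pi) for pi, _ in pairs)
    times, path = [], []
    cum_var = 0.0
    cum_err = 0.0
    for pi, yi in pairs:
        cum_var += pi * (1.0 - pi)   # variance accumulated so far
        cum_err += yi - pi           # prediction error accumulated so far
        times.append(cum_var / total_var)
        path.append(cum_err / math.sqrt(total_var))
    return times, path

# Toy data (hypothetical): four observations and their predicted risks.
times, path = standardized_partial_sums([0, 1, 0, 1], [0.2, 0.8, 0.4, 0.6])
```

A path that drifts far from zero, or ends far from zero, would suggest miscalibration; the formal tests in the paper rest on the limiting distributions mentioned in the abstract, which this sketch does not implement.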
-
Methodological concerns about 'concordance-statistic for benefit' as a measure of discrimination in treatment benefit prediction
Authors:
Yuan Xia,
Paul Gustafson,
Mohsen Sadatsafavi
Abstract:
Prediction algorithms that quantify the expected benefit of a given treatment conditional on patient characteristics can critically inform medical decisions. Quantifying the performance of treatment benefit prediction algorithms is an active area of research. A recently proposed metric, the concordance statistic for benefit (cfb), evaluates the discriminative ability of a treatment benefit predictor by directly extending the concept of the concordance statistic from a risk model with a binary outcome to a model for treatment benefit. In this work, we scrutinize cfb on multiple fronts. Through numerical examples and theoretical developments, we show that cfb is not a proper scoring rule. We also show that it is sensitive to the unestimable correlation between counterfactual outcomes and to the definition of matched pairs. We argue that measures of statistical dispersion applied to predicted benefits do not suffer from these issues and can be an alternative metric for the discriminatory performance of treatment benefit predictors.
Submitted 15 May, 2023; v1 submitted 29 August, 2022;
originally announced August 2022.
-
Value of Information Analysis for External Validation of Risk Prediction Models
Authors:
Mohsen Sadatsafavi,
Tae Yoon Lee,
Laure Wynants,
Andrew Vickers,
Paul Gustafson
Abstract:
Background: Before being used to inform patient care, a risk prediction model needs to be validated in a representative sample from the target population. The finite size of the validation sample entails that there is uncertainty with respect to estimates of model performance. We apply value-of-information methodology as a framework to quantify the consequence of such uncertainty in terms of net benefit (NB). Methods: We define the Expected Value of Perfect Information (EVPI) for model validation as the expected loss in NB due to not confidently knowing which of the alternative decisions confers the highest NB at a given risk threshold. We propose methods for EVPI calculations based on Bayesian or ordinary bootstrapping of NBs, as well as an asymptotic approach supported by the central limit theorem. We conducted brief simulation studies to compare the performance of these methods, and used subsets of data from an international clinical trial for predicting mortality after myocardial infarction as a case study. Results: The three computation methods generated similar EVPI values in simulation studies. In the case study, at the pre-specified threshold of 0.02, the best decision with current information would be to use the model, with an expected incremental NB of 0.0020 over treating all. At this threshold, EVPI was 0.0005 (a relative EVPI of 25%). When scaled to the annual number of heart attacks in the US, this corresponds to a loss of 400 true positives, or an extra 19,600 false positives (unnecessary treatments) per year, indicating the value of further model validation. As expected, the validation EVPI generally declined with larger samples. Conclusion: Value-of-information methods can be applied to the NB calculated during external validation of clinical prediction models to provide a decision-theoretic perspective on the consequences of uncertainty.
Submitted 5 August, 2022;
originally announced August 2022.
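The ordinary-bootstrap variant of the EVPI calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, NB follows the standard definition (true positives minus false positives weighted by the threshold odds, per person), and EVPI is taken as the mean over bootstrap samples of the best NB minus the best of the mean NBs. At a threshold of 0.02, one true positive is traded against (1 - 0.02)/0.02 = 49 false positives, which matches the 400 versus 19,600 figures in the abstract.

```python
import random

def net_benefits(y, p, z):
    """Return (NB of using the model, NB of treating all) at risk threshold z.
    The NB of treating no one is 0 by definition."""
    n = len(y)
    w = z / (1.0 - z)  # weight of one false positive in true-positive units
    tp = sum(1 for yi, pi in zip(y, p) if pi >= z and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= z and yi == 0)
    prev = sum(y) / n
    return tp / n - w * fp / n, prev - w * (1.0 - prev)

def validation_evpi(y, p, z, n_boot=2000, seed=1):
    """Bootstrap sketch: E[max(0, NB_model, NB_all)] - max(0, E[NB_model], E[NB_all])."""
    rng = random.Random(seed)
    n = len(y)
    sum_model = sum_all = sum_best = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        nb_model, nb_all = net_benefits([y[i] for i in idx], [p[i] for i in idx], z)
        sum_model += nb_model
        sum_all += nb_all
        sum_best += max(0.0, nb_model, nb_all)  # best decision within this draw
    best_now = max(0.0, sum_model / n_boot, sum_all / n_boot)
    return sum_best / n_boot - best_now
```

Calling `validation_evpi(y, p, z=0.02)` with observed outcomes `y` (0/1) and predicted risks `p` returns the EVPI in NB units; dividing it by the expected incremental NB of the best decision gives a relative EVPI of the kind reported in the abstract (0.0005 / 0.0020 = 25%).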
-
Closed-Form Solution of the Unit Normal Loss Integral in Two-Dimensions
Authors:
Tae Yoon Lee,
Paul Gustafson,
Mohsen Sadatsafavi
Abstract:
In Value of Information (VoI) analysis, the unit normal loss integral (UNLI) frequently emerges as a solution for the computation of various VoI metrics. However, one limitation of the UNLI has been that its closed-form solution is available for only one dimension, and thus can be used for comparisons involving only two strategies (where it is applied to the scalar incremental net benefit). We derived a closed-form solution for the two-dimensional UNLI, enabling closed-form VoI calculations for three strategies. We verified the accuracy of this method via simulation studies. A case study based on a three-arm clinical trial was used as an example. VoI methods based on the closed-form solutions for the UNLI can now be extended to three-decision comparisons, taking a fraction of a second to compute and not being subject to Monte Carlo error. An R implementation of this method is provided as part of the predtools package (https://github.com/resplab/predtools/).
Submitted 23 July, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
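For context, the one-dimensional case that the abstract refers to can be written in a few lines. The sketch below uses the standard unit normal loss function L(z) = φ(z) - z(1 - Φ(z)) and assumes the incremental net benefit (INB) of two strategies is normally distributed; the function names are hypothetical and this is not the predtools API. The two-dimensional solution derived in the paper is not reproduced here.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def unit_normal_loss(z):
    """L(z) = E[max(0, Z - z)] for a standard normal Z."""
    return std_normal_pdf(z) - z * (1.0 - std_normal_cdf(z))

def evpi_two_strategies(mean_inb, sd_inb):
    """EVPI when INB ~ Normal(mean_inb, sd_inb^2):
    E[max(0, INB)] - max(0, mean_inb) = sd_inb * L(|mean_inb| / sd_inb)."""
    return sd_inb * unit_normal_loss(abs(mean_inb) / sd_inb)
```

Because the result is a closed-form expression, it carries no Monte Carlo error, which is the property the abstract highlights for the extension to three strategies.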
-
Programmable Interface for Statistical & Simulation Models (PRISM): Towards Greater Accessibility of Clinical and Healthcare Decision Models
Authors:
Amin Adibi,
Stephanie Harvard,
Mohsen Sadatsafavi
Abstract:
Background: Increasingly, decision-making in healthcare relies on computer models, be it clinical prediction models at point of care or decision-analytic models at the policymaking level. Given the important role models play in both contexts, their structure and implementation should be rigorously scrutinized. The ability to interrogate input/output associations without facing barriers can improve quality assurance mechanisms while satisfying privacy/confidentiality concerns and facilitating the integration of models into decision-making. This paper reports on the development of Programmable Interface for Statistical & Simulation Models (PRISM), a cloud-based platform for model accessibility. Methods: PRISM emphasizes two main principles: 1) minimal specifications on the side of the model developer to make the model fit for cloud hosting, and 2) making client access completely independent of the resource requirements and software dependencies of the model. The server architecture integrates a RESTful Application Programming Interface (API) infrastructure, JSON for data transfer, a routing layer for access management, container technology for management of computer resources and package dependencies, and the capacity for synchronous or asynchronous model calls. Results: We discuss the architecture, the minimal API standards that enable a universal language for access to such models, the underlying server infrastructure, and the standards used for data transfer. An instance of PRISM is available as a service via the Peer Models Network (http://peermodelsnetwork.com). Through a series of case studies, we demonstrate how interrogating models becomes possible in a standardized fashion, irrespective of the specifics of any model. Conclusions: We have developed a publicly accessible platform and minimalist standards that facilitate model accessibility for both clinical and policy models.
Submitted 19 February, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Minding non-collapsibility of odds ratios when recalibrating risk prediction models
Authors:
Mohsen Sadatsafavi,
Hamid Tavakoli,
Abdollah Safari
Abstract:
In clinical prediction modeling, model updating refers to the practice of modifying a prediction model before it is used in a new setting. In the context of logistic regression for a binary outcome, one of the simplest updating methods is a fixed odds-ratio transformation of predicted risks to improve calibration-in-the-large. Previous authors have proposed equations for calculating this odds-ratio based on the discrepancy between the prevalence in the original and the new population, or between the average of predicted and observed risks. We show that this method fails to consider the non-collapsibility of the odds-ratio. Consequently, it under-corrects predicted risks, especially when predicted risks are more dispersed (i.e., for models with good discrimination). We suggest an approximate equation for recovering the conditional odds-ratio from the mean and variance of predicted risks. Brief simulations and a case study show that this approach reduces under-correction, sometimes substantially. R code for implementation is provided.
Submitted 10 November, 2021; v1 submitted 16 October, 2021;
originally announced October 2021.
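The under-correction described above can be reproduced on toy numbers. In the sketch below, the simple method computes one odds-ratio from the mean predicted risk and the target prevalence and applies it to every prediction. For comparison, the conditional odds-ratio is found by bisection so that the updated risks average to the target; this numerical search is a stand-in chosen for illustration and is not the approximate equation proposed in the paper. All names and data are hypothetical.

```python
def apply_odds_ratio(p, odds_ratio):
    """Multiply the odds of risk p by a fixed odds-ratio and map back to a risk."""
    odds = odds_ratio * p / (1.0 - p)
    return odds / (1.0 + odds)

def marginal_odds_ratio(mean_pred, target):
    """Odds-ratio between the target prevalence and the mean predicted risk."""
    return (target / (1.0 - target)) / (mean_pred / (1.0 - mean_pred))

def solve_conditional_odds_ratio(preds, target, lo=1e-6, hi=1e6, iters=200):
    """Bisection (on the log scale) for the odds-ratio whose updated risks
    average exactly to the target prevalence."""
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        mean_updated = sum(apply_odds_ratio(p, mid) for p in preds) / len(preds)
        if mean_updated < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

# Toy predicted risks (hypothetical), widely dispersed as for a discriminating model.
preds = [0.01, 0.02, 0.05, 0.10, 0.30, 0.60, 0.90]
target = 0.40
mean_pred = sum(preds) / len(preds)

or_marginal = marginal_odds_ratio(mean_pred, target)
mean_after_marginal = sum(apply_odds_ratio(p, or_marginal) for p in preds) / len(preds)

or_conditional = solve_conditional_odds_ratio(preds, target)
mean_after_conditional = sum(apply_odds_ratio(p, or_conditional) for p in preds) / len(preds)
```

With these numbers, `mean_after_marginal` stays below the 0.40 target and `or_conditional` exceeds `or_marginal`, which is the under-correction that the abstract attributes to non-collapsibility.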
-
Uncertainty and Value of Information in Risk Prediction Modeling
Authors:
Mohsen Sadatsafavi,
Tae Yoon Lee,
Paul Gustafson
Abstract:
Background: Due to the finite size of the development sample, predicted probabilities from a risk prediction model are inevitably uncertain. We apply Value of Information methodology to evaluate the decision-theoretic implications of prediction uncertainty.
Methods: Adopting a Bayesian perspective, we extend the definition of the Expected Value of Perfect Information (EVPI) from decision analysis to net benefit calculations in risk prediction. In the context of model development, EVPI is the expected gain in net benefit by using the correct predictions as opposed to predictions from a proposed model. We suggest bootstrap methods for sampling from the posterior distribution of predictions for EVPI calculation using Monte Carlo simulations. In a case study, we used subsets of data of various sizes from a clinical trial for predicting mortality after myocardial infarction to show how EVPI changes with sample size.
Results: With a sample size of 1,000 and at the pre-specified threshold of 2% on predicted risks, the gains in net benefit from using the proposed and the correct models were 0.0006 and 0.0011, respectively, resulting in an EVPI of 0.0005 and a relative EVPI of 87%. EVPI was zero only at unrealistically high thresholds (>85%). As expected, EVPI declined with larger samples. We summarize an algorithm for incorporating EVPI calculations into the commonly used bootstrap method for optimism correction.
Conclusion: Value of Information methods can be applied to explore decision-theoretic consequences of uncertainty in risk prediction and can complement inferential methods when developing risk prediction models. R code for implementing this method is provided.
Submitted 3 November, 2021; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Model-based ROC (mROC) curve: examining the effect of case-mix and model calibration on the ROC plot
Authors:
Mohsen Sadatsafavi,
Paramita Saha-Chaudhuri,
John Petkau
Abstract:
The performance of risk prediction models is often characterized in terms of discrimination and calibration. The Receiver Operating Characteristic (ROC) curve is widely used for evaluating model discrimination. When evaluating the performance of a risk prediction model in a new sample, the shape of the ROC curve is affected by both case-mix and the postulated model. Further, compared to discrimination, evaluating calibration has not received the same level of attention. Commonly used methods for model calibration involve subjective specification of smoothing or grouping. Leveraging the familiar ROC framework, we introduce the model-based ROC (mROC) curve to assess the calibration of a pre-specified model in a new sample. The mROC curve is the ROC curve that should be observed if a pre-specified model is calibrated in the sample. We show that the empirical ROC and mROC curves for a sample converge asymptotically if the model is calibrated in that sample. As a consequence, the mROC curve can be used to visually assess the effect of case-mix and model mis-calibration. Further, we propose a novel statistical test for calibration that does not require any smoothing or grouping. Simulations support the adequacy of the test. A case study puts these developments in a practical context. We conclude that the mROC curve can easily be constructed and used to evaluate the effect of case-mix and model calibration on the ROC plot, thus adding to the utility of ROC curve analysis in the evaluation of risk prediction models. R code for the proposed methodology is provided (https://github.com/msadatsafavi/mROC/).
Submitted 12 July, 2021; v1 submitted 29 February, 2020;
originally announced March 2020.
-
Concentration of Benefit index: A threshold-free summary metric for quantifying the capacity of covariates to yield efficient treatment rules
Authors:
Mohsen Sadatsafavi,
Mohammad Ali Mansournia,
Paul Gustafson
Abstract:
When data on treatment assignment, outcomes, and covariates from a randomized trial are available, a question of interest is to what extent covariates can be used to optimize treatment decisions. Statistical hypothesis testing of covariate-by-treatment interaction is ill-suited for this purpose. The application of decision theory results in treatment rules that compare the expected benefit of treatment given the patient's covariates against a treatment threshold. However, determining the treatment threshold is often context-specific, and any given threshold might seem arbitrary when the overall capacity towards predicting treatment benefit is of concern. We propose the Concentration of Benefit index (Cb), a threshold-free metric that quantifies the combined performance of covariates towards finding individuals who will benefit the most from treatment. The proposed index is constructed by comparing expected treatment outcomes with and without knowledge of covariates when one of two randomly selected patients is to be treated. We show that the resulting index can also be expressed in terms of the integrated efficiency of individualized treatment decisions over the entire range of treatment thresholds. We propose parametric and semi-parametric estimators, the latter being suitable for out-of-sample validation and correction for optimism. We used data from a clinical trial to demonstrate the calculations in a step-by-step fashion, and have provided the R code for implementation (https://github.com/msadatsafavi/txBenefit). The proposed index has an intuitive and theoretically sound interpretation and can be estimated with relative ease for a wide class of regression models. Beyond the conceptual developments, various aspects of estimation and inference for such a metric need to be pursued in future research.
Submitted 7 January, 2020; v1 submitted 1 January, 2020;
originally announced January 2020.
-
A threshold-free summary index for quantifying the capacity of covariates to yield efficient treatment rules
Authors:
Mohsen Sadatsafavi,
Mohammad Mansournia,
Paul Gustafson
Abstract:
The focus of this paper is on quantifying the capacity of covariates in devising efficient treatment rules when data from a randomized trial are available. Conventional one-variable-at-a-time subgroup analysis based on statistical hypothesis testing of covariate-by-treatment interaction is ill-suited for this purpose. The application of decision theory results in treatment rules that compare the expected benefit of treatment given the patient's covariates against a treatment threshold. However, determining the treatment threshold is often context-specific, and any given threshold might seem arbitrary at the reporting stages of a clinical trial. We propose a threshold-free metric that quantifies the capacity of a set of covariates towards finding individuals who will benefit the most from treatment. The proposed metric is constructed by comparing the expected outcomes with and without knowledge of covariates when one of two randomly selected patients is to be treated. We show that the resulting index can also be expressed in terms of integrated treatment benefit as a function of covariates over the entire range of treatment thresholds. We also propose a semi-parametric estimation method suitable for out-of-sample validation and adjustment for optimism. We use data from a clinical trial of preventive antibiotic therapy for reducing exacerbation rate in Chronic Obstructive Pulmonary Disease to demonstrate the calculations in a step-by-step fashion. The proposed index has an intuitive and theoretically sound interpretation and can be estimated with relative ease for a wide class of regression models. Beyond the conceptual developments presented in this work, various aspects of estimation and inference for such metrics need to be pursued in future research.
Submitted 22 August, 2019; v1 submitted 15 January, 2019;
originally announced January 2019.