Healthcare Cost Regressions: Going Beyond the Mean to Estimate the Full Distribution

Health Econ. 2015 Sep;24(9):1192-212. doi: 10.1002/hec.3178. Epub 2015 Apr 30.

Abstract

Understanding the data-generating process behind healthcare costs remains a key empirical issue. Although much research to date has focused on predicting the conditional mean cost, this can miss important features of the full distribution, such as tail probabilities. We conduct a quasi-Monte Carlo experiment using English National Health Service inpatient data to compare 14 approaches to modelling the distribution of healthcare costs: nine are parametric and have commonly been used to fit healthcare costs, and five are designed specifically to construct a counterfactual distribution. Our results indicate that no one method is clearly dominant and that there is a trade-off between bias and precision of tail probability forecasts. We find that distributional methods demonstrate significant potential, particularly with larger sample sizes, where the variability of predictions is reduced. Parametric distributions such as the log-normal, generalised gamma and generalised beta of the second kind are found to estimate tail probabilities with high precision but with bias that varies with the cost threshold being considered.
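
The bias-versus-precision comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: it ignores covariates, uses a synthetic gamma-based "population" in place of the NHS inpatient data, and compares only a log-normal fit with the raw empirical proportion. The function names, the threshold, the sample size and the number of replications are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def lognormal_tail_prob(sample, threshold):
    """P(cost > threshold) under a log-normal fitted by maximum likelihood."""
    logs = [math.log(y) for y in sample]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # the MLE uses the 1/n (population) form
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - Phi(z)


def empirical_tail_prob(sample, threshold):
    """Share of observed costs above the threshold (no distributional assumption)."""
    return sum(y > threshold for y in sample) / len(sample)


def run_experiment(population, threshold, sample_size, replications, seed=0):
    """Repeatedly subsample the population and summarise each estimator's error."""
    rng = random.Random(seed)
    # The full population's tail share plays the role of the "true" value.
    truth = empirical_tail_prob(population, threshold)
    estimators = {"log-normal": lognormal_tail_prob, "empirical": empirical_tail_prob}
    draws = {name: [] for name in estimators}
    for _ in range(replications):
        sample = rng.sample(population, sample_size)  # draw without replacement
        for name, estimate in estimators.items():
            draws[name].append(estimate(sample, threshold))
    return {
        name: {
            "bias": statistics.fmean(values) - truth,  # mean error across replications
            "sd": statistics.pstdev(values),           # spread across replications
        }
        for name, values in draws.items()
    }


if __name__ == "__main__":
    # Hypothetical right-skewed costs: a fixed 100 plus a gamma draw. NOT NHS data.
    rng = random.Random(42)
    population = [100.0 + rng.gammavariate(0.8, 3000.0) for _ in range(50_000)]
    for name, stats in run_experiment(population, 10_000, 500, 200).items():
        print(f"{name}: bias={stats['bias']:+.4f}, sd={stats['sd']:.4f}")
```

Under these assumptions, increasing `sample_size` should shrink the `sd` of both estimators, whereas any bias in the log-normal estimate comes from fitting the wrong distributional family and need not shrink with more data.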

Keywords: counterfactual distributions; healthcare costs; heavy tails; quasi-Monte Carlo.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Health Care Costs / statistics & numerical data*
  • Hospital Costs / statistics & numerical data
  • Humans
  • Models, Econometric
  • Monte Carlo Method
  • Probability
  • State Medicine / economics
  • State Medicine / statistics & numerical data
  • Statistics as Topic
  • United Kingdom