A 64,489-patient full-disclosure database of cardiovascular risk factors and events status analysed in a Bayesian framework: a unique contribution to predictive science

Int J Cardiol. 2013 Apr 30;165(1):3-6. doi: 10.1016/j.ijcard.2012.09.209. Epub 2012 Nov 20.

Abstract

Today in the International Journal of Cardiology, Liu et al. [1] publish an unusual exercise in open science which should set a pioneering trend for future knowledge sharing. They present both the principle and a large, fully analysed real-world dataset to show how Bayesian reasoning can be practically helpful for clinicians at the front line. The Bayesian approach differs from the frequentist approach that is more commonly seen in reports of clinical research. Rather than having a single point estimate and confidence interval, a probability has a complete probability density function. For Bayesian analysis in general, instead of there being no information before a particular study, there is some information--the "prior". The difference is that while the frequentist approach assumes that before the study all probabilities are equally plausible, the Bayesian approach recognises that even before the study, some probabilities are more likely than others. Therefore, after the study, the Bayesian approach produces a new distribution of the probability--the "posterior"--which incorporates both the raw study results and the prior distribution. Bayesian approaches are routinely used in medical decision-making and everyday life, often without our realising it. Clinical test results are rarely interpreted in isolation. Instead, the background clinical belief in the plausibility of various diagnoses (the prior) is updated in light of test results, to form a new set of beliefs (the posterior). We more readily accept assertions that are within the range of our prior beliefs than those that substantially contradict those beliefs. To build a model of cardiovascular risk, the Bayesian approach begins with an assumed distribution for the risk depending on the risk factors and progressively updates it with the experience of patients and their outcomes. Each additional patient makes a contribution to the model's knowledge.
Then the model can be applied to any individual to provide a distribution for that individual's risk. This might be narrow, indicating precise risk evaluation, or wide, indicating substantial persisting uncertainty. The authors' willingness to share the whole dataset creates three exciting avenues for advancement in the field. First, researchers could analyse the dataset in different ways, for example, by proposing distributions other than the normal. Second, they could use the outcome of this dataset as a starting point for further upgrading the model with future data. Third, researchers are absolutely free to use these data to explore other interrelationships between the variables for new purposes. For example, we have studied the joint distribution of two variables that have a multiplicative effect on cardiovascular risk: cholesterol and blood pressure. The online supplement to this editorial contains the raw dataset in .zip format to facilitate its download for the reader. Freely exposing all the data is currently remarkable, but on objective reflection it is hard to understand why it is not already normal practice. Do authors fear that readers, despite a handicap of years, might beat them to future findings? Or do they have something to hide? We do not know, but this paper is changing our practice and we hope it will change yours.
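
The prior-to-posterior updating described above can be illustrated with a minimal sketch. It is not the model of Liu et al.: it swaps in a simple conjugate Beta-Binomial update for a single event probability, and the prior parameters and patient counts are hypothetical numbers chosen only for illustration. It shows how each batch of observed outcomes shifts the distribution and how more patients make it narrower.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) belief about an event probability (illustrative only)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        # Standard deviation of a Beta distribution; smaller means a narrower belief.
        n = self.alpha + self.beta
        return sqrt(self.alpha * self.beta / (n * n * (n + 1)))

    def update(self, events: int, non_events: int) -> "Beta":
        # Conjugate update: add observed events and non-events to the prior counts.
        return Beta(self.alpha + events, self.beta + non_events)


# Hypothetical prior belief: event probability around 0.20, held loosely.
prior = Beta(2.0, 8.0)

# Hypothetical cohorts with the same observed event rate (30%) but different sizes.
small = prior.update(events=3, non_events=7)
large = prior.update(events=300, non_events=700)

for label, dist in (("prior", prior), ("after 10", small), ("after 1000", large)):
    print(f"{label:>10}: mean={dist.mean:.3f} sd={dist.sd:.3f}")
```

With few patients the posterior stays close to the prior and remains wide; with many patients it moves towards the observed event rate and narrows, which corresponds to the "narrow" versus "wide" individual risk distributions mentioned in the abstract.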

Publication types

  • Editorial
  • Comment

MeSH terms

  • Bayes Theorem*
  • Cardiovascular Diseases / diagnosis*
  • Cardiovascular Diseases / epidemiology*
  • Female
  • Humans
  • Male