The prevalence of negative studies with inadequate statistical power: an analysis of the plastic surgery literature

Plast Reconstr Surg. 2002 Jan;109(1):1-6; discussion 7-8. doi: 10.1097/00006534-200201000-00001.

Abstract

Studies published in the medical literature often neglect to consider the statistical power needed to detect a meaningful difference between study groups. Small sample sizes tend to produce negative results because of low statistical power. Studies that cannot make conclusive statements about their hypotheses can waste resources, deter further research, and impede advances in clinical treatment. The current study reviewed three of the most frequently read plastic surgery journals from 1976 to 1996 to determine the prevalence of inadequately (<80 percent) powered clinical trials and experimental studies that found no difference (negative studies) in the response variable of interest between comparison groups. The statistical power of 54 negative studies using continuous response variables was calculated to detect a difference of 1 SD (+/-1 SD) in means between the comparative groups. The power of another 57 negative studies with dichotomous response (yes/no) variables was calculated to detect a relative change in proportions of 25 percent and 50 percent from the experimental to the control group. It was found that 85 percent of the studies with continuous response variables had inadequate power to detect the desired mean difference of +/-1 SD. In studies with dichotomous response variables, 98 percent had inadequate power to detect a desired 25 percent relative change in proportions, and 74 percent had inadequate power to detect a desired 50 percent relative change in proportions. These results indicate that many of the studies in the plastic surgery literature lack adequate power to detect a moderate-to-large difference between groups. The lack of power makes the interpretation of the studies with negative findings inconclusive. 
Proper study design dictates that investigators consider a priori the difference between groups that is of clinical interest, and the sample size per group that is needed to provide adequate statistical power to detect the desired difference.
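
The power calculations described above can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify; it assumes a two-sided test at alpha = 0.05, equal group sizes, and a normal (z) approximation (an exact t-based calculation would give slightly lower power at small sample sizes). The function names, the unpooled-variance formula for proportions, and the example inputs (20 subjects per group, a 40 percent control-group event rate) are hypothetical choices made for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def power_two_means(n_per_group, effect_size_sd=1.0, alpha=0.05):
    """Approximate power to detect a mean difference given in SD units.

    Assumption: two equal-sized groups, two-sided z-test approximation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size_sd * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail when the true difference exists.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def power_two_proportions(n_per_group, p_control, relative_change, alpha=0.05):
    """Approximate power to detect a relative change in a proportion.

    Assumption: the experimental proportion is p_control * (1 - relative_change),
    and the standard error uses the unpooled (Wald) variance.
    """
    p_experimental = p_control * (1 - relative_change)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance_sum = (p_control * (1 - p_control)
                    + p_experimental * (1 - p_experimental))
    shift = abs(p_control - p_experimental) / sqrt(variance_sum / n_per_group)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_means(effect_size_sd=1.0, power=0.80, alpha=0.05):
    """Smallest per-group sample size reaching the target power (z approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size_sd) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs, not data from the reviewed studies.
    print("Power, 1 SD difference, n=10 per group:", round(power_two_means(10), 3))
    print("Per-group n for 80% power at 1 SD:", n_per_group_for_means())
    for change in (0.25, 0.50):
        pw = power_two_proportions(20, p_control=0.40, relative_change=change)
        print(f"Power, {change:.0%} relative change, n=20 per group:", round(pw, 3))
```

Running the same functions before data collection, with the clinically meaningful difference chosen in advance, is the a priori step the abstract recommends.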

MeSH terms

  • Animals
  • Bibliometrics*
  • Clinical Trials as Topic / statistics & numerical data
  • Data Interpretation, Statistical*
  • Humans
  • Periodicals as Topic / statistics & numerical data*
  • Research / statistics & numerical data
  • Sample Size
  • Surgery, Plastic / statistics & numerical data*