Analysis of overdispersed count data: application to the Human Papillomavirus Infection in Men (HIM) Study

Epidemiol Infect. 2012 Jun;140(6):1087-94. doi: 10.1017/S095026881100166X. Epub 2011 Aug 30.

Abstract

The Poisson model can be applied to counts of events occurring within a specific time period. Its main feature is the assumption that the mean and variance of the count data are equal. However, this equal mean-variance relationship rarely holds in observational data. In most cases, the observed variance is larger than the variance the model assumes, which is called overdispersion. Further, when the observed data contain excess zero counts, ignoring the resulting overdispersion leads to underestimating the variance of the estimated parameters, and thus to misleading conclusions. We illustrated the use of four models for count data whose overdispersion may be attributed to excess zeros: the Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial models. The example data in this article are counts of human papillomavirus infection events. The four models resulted in differing statistical inferences. The Poisson model, which is widely used in epidemiology research, underestimated the standard errors and overstated the significance of some covariates.
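
The mean-variance assumption described above can be checked informally before fitting any model. The following is a minimal sketch, not the authors' analysis: the function name `dispersion_summary` and the sample counts are hypothetical and are not HIM Study data. It compares the sample variance with the sample mean, and the observed share of zeros with the share a Poisson distribution with the same mean would predict.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize how far a sample of counts departs from Poisson assumptions.

    Hypothetical helper for illustration only.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    expected_zero = math.exp(-m)  # P(Y = 0) for a Poisson variable with mean m
    return {
        "mean": m,
        "variance": v,
        "ratio": v / m,  # about 1 under a Poisson model; above 1 suggests overdispersion
        "observed_zero": observed_zero,
        "expected_zero": expected_zero,
    }


# Made-up counts with many zeros and one large value (not study data).
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 6]
summary = dispersion_summary(counts)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A variance-to-mean ratio well above 1, together with more zeros than the Poisson prediction, is the kind of pattern that motivates considering the negative binomial and zero-inflated models named in the abstract. This is a descriptive check only and does not replace formal model comparison.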

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Adult
  • Alphapapillomavirus / isolation & purification*
  • Humans
  • Male
  • Middle Aged
  • Models, Biological
  • Papillomavirus Infections / epidemiology*
  • Risk Factors
  • Young Adult