[Cluster sampling: consequences of data analysis on drawing conclusions]

Rev Epidemiol Sante Publique. 2005 Feb;53(1):43-50. doi: 10.1016/s0398-7620(05)84571-7.
[Article in French]

Abstract

Background: Cluster sampling is commonly used because it does not require a sampling frame that lists all the individual enumeration units. However, this sampling design is often less precise than simple random sampling because individuals within a cluster tend to be homogeneous. This note illustrates that precision estimates for parameters such as the mean, prevalence and odds ratio can be biased when the data analysis ignores the sampling design, possibly leading to erroneous conclusions.

Methods: Data from a cluster sample of clandestine sex workers in Senegal were used. Two analyses were performed and their results were compared. The first analysis took the sampling design into account (design-based analysis) while the second did not (naive analysis).
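
The contrast between the two analyses can be sketched for a prevalence estimate. The example below is only an illustration under stated assumptions: the data are hypothetical (not the Senegal survey), clusters are treated as equally weighted primary sampling units drawn with replacement, and the design-based variance uses a standard Taylor-linearized ratio estimator, which is not necessarily the method used in the article. The function and variable names are likewise made up for this sketch.

```python
import math

# Hypothetical data: each inner list is one cluster, 1 = positive, 0 = negative.
hypothetical_clusters = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
]

def naive_prevalence(clusters):
    """Prevalence and standard error, treating all observations as a simple random sample."""
    n = sum(len(c) for c in clusters)
    p = sum(sum(c) for c in clusters) / n
    return p, math.sqrt(p * (1 - p) / n)

def design_based_prevalence(clusters):
    """Prevalence and standard error, treating clusters as the sampling units.

    Uses the linearized variance of the ratio estimator p = sum(y_i) / sum(m_i),
    assuming equal weights and with-replacement sampling of clusters.
    """
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    p = sum(sum(c) for c in clusters) / n
    # Deviation of each cluster total from what p would predict for its size.
    residuals = [sum(c) - p * len(c) for c in clusters]
    variance = k / (k - 1) * sum(r * r for r in residuals) / n ** 2
    return p, math.sqrt(variance)

p_naive, se_naive = naive_prevalence(hypothetical_clusters)
p_design, se_design = design_based_prevalence(hypothetical_clusters)
# Design effect: ratio of the design-based variance to the naive variance.
design_effect = (se_design / se_naive) ** 2

print(f"prevalence: {p_naive:.3f}")
print(f"naive SE: {se_naive:.3f}, design-based SE: {se_design:.3f}")
print(f"design effect: {design_effect:.2f}")
```

Both analyses give the same point estimate here; only the standard error, and therefore the confidence interval, changes. With strongly homogeneous clusters, as in this made-up data, the design-based interval is wider than the naive one, although with real data the difference can go in either direction.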

Results: The width of the confidence intervals in the design-based analysis differed by -43% to +84% relative to those of the naive analysis, so different conclusions could be drawn. For instance, human immunodeficiency virus (HIV) infection in clandestine sex workers was associated with condom use and perceived risk of HIV infection in the design-based analysis but not in the naive analysis.

Conclusion: The data analysis must take into account the sampling design, and this is facilitated by the availability of statistical software with survey analysis capabilities.

MeSH terms

  • Adult
  • Cluster Analysis*
  • Condoms / statistics & numerical data
  • Data Interpretation, Statistical*
  • Female
  • HIV Infections / epidemiology*
  • Humans
  • Prevalence
  • Risk-Taking
  • Senegal / epidemiology
  • Sex Work*