Point of data saturation was assessed using resampling methods in a survey with open-ended questions

J Clin Epidemiol. 2016 Dec:80:88-96. doi: 10.1016/j.jclinepi.2016.07.014. Epub 2016 Aug 2.

Abstract

Objective: To describe methods to determine sample sizes in surveys using open-ended questions and to assess how resampling methods can be used to determine data saturation in these surveys.

Study design and setting: We searched the literature for surveys with open-ended questions and assessed the methods used to determine sample size in 100 studies selected at random. Then, we used Monte Carlo simulations on data from a previous study on the burden of treatment to assess the probability of identifying new themes as a function of the number of patients recruited.

Results: In the literature, 85% of researchers used a convenience sample, with a median size of 167 participants (interquartile range [IQR] = 69-406). In our simulation study, the probability of identifying at least one new theme for the next included subject was 32%, 24%, and 12% after the inclusion of 30, 50, and 100 subjects, respectively. The inclusion of 150 participants at random resulted in the identification of 92% of the themes (IQR = 91-93%) identified in the original study.
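The two quantities reported above (the probability that the next included subject contributes at least one new theme, and the share of all themes found after including a given number of participants) can be estimated by repeatedly drawing random subsets of participants. The following is a minimal sketch of that resampling idea, not the authors' code. It assumes each participant's answers have already been coded into a set of theme labels; the function names `simulate_saturation` and `make_synthetic_responses`, the synthetic data, and the default number of simulations are hypothetical and chosen only for illustration.

```python
import random


def make_synthetic_responses(n_participants=300, n_themes=60, seed=1):
    """Build hypothetical coded data: one set of theme ids per participant.

    Theme t is mentioned with probability 0.3 / (1 + t), so a few themes
    are common and most are rare. This is an assumption for illustration,
    not the distribution observed in the original study.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_participants):
        themes = {t for t in range(n_themes) if rng.random() < 0.3 / (1 + t)}
        data.append(themes)
    return data


def simulate_saturation(participant_themes, n_included, n_sim=1000, seed=0):
    """Monte Carlo estimate of saturation after `n_included` participants.

    Returns a pair:
      - estimated probability that one further randomly chosen participant
        contributes at least one theme not yet seen;
      - median fraction of all themes in the full data set that were found
        among the `n_included` participants.
    Requires n_included < len(participant_themes).
    """
    rng = random.Random(seed)
    all_themes = set().union(*participant_themes)
    new_theme_hits = 0
    fractions = []
    for _ in range(n_sim):
        # Draw n_included + 1 distinct participants in random order;
        # the last one plays the role of the "next included subject".
        draw = rng.sample(participant_themes, n_included + 1)
        seen = set().union(*draw[:n_included])
        fractions.append(len(seen) / len(all_themes))
        if draw[n_included] - seen:  # non-empty set difference: new theme
            new_theme_hits += 1
    fractions.sort()
    return new_theme_hits / n_sim, fractions[len(fractions) // 2]


if __name__ == "__main__":
    responses = make_synthetic_responses()
    for n in (30, 50, 100, 150):
        p_new, frac = simulate_saturation(responses, n)
        print(f"n={n}: P(new theme)={p_new:.2f}, median share found={frac:.2f}")
```

Run on real coded responses instead of the synthetic ones, the first returned value would correspond to the "probability of identifying at least one new theme" and the second to the share of themes identified at a given sample size. The numbers printed for the synthetic data are not comparable to the study's results.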

Conclusion: In our study, data saturation was most certainly reached for samples of more than 150 participants. Our method may be used to decide whether to continue a study to find new themes or to stop it because of futility.

Keywords: Computer simulation; Data saturation; Qualitative research; Simulation methods; Surveys and questionnaires; Surveys with open-ended questions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Epidemiologic Research Design*
  • Humans
  • Monte Carlo Method
  • Sampling Studies
  • Surveys and Questionnaires*