Point of data saturation was assessed using resampling methods in a survey with open-ended questions

Viet-Thi Tran; Raphael Porcher; Bruno Falissard; Philippe Ravaud

doi:10.1016/j.jclinepi.2016.07.014


J Clin Epidemiol. 2016 Dec;80:88-96. doi: 10.1016/j.jclinepi.2016.07.014. Epub 2016 Aug 2.

Authors

Viet-Thi Tran¹, Raphael Porcher², Bruno Falissard³, Philippe Ravaud⁴

Affiliations

¹ Department of General Medicine, Paris Diderot University, 16 rue Henri Huchard, Paris 75018, France; METHODS Team, Centre de Recherche en Epidémiologie et Statistiques (CRESS), INSERM U1153, 1 Place du Parvis Notre Dame, Paris 75004, France; Centre d'Épidémiologie Clinique, Hôpital Hôtel-Dieu, Assistance Publique-Hôpitaux de Paris, 1 Place du Parvis Notre Dame, Paris 75004, France.
² METHODS Team, Centre de Recherche en Epidémiologie et Statistiques (CRESS), INSERM U1153, 1 Place du Parvis Notre Dame, Paris 75004, France; Centre d'Épidémiologie Clinique, Hôpital Hôtel-Dieu, Assistance Publique-Hôpitaux de Paris, 1 Place du Parvis Notre Dame, Paris 75004, France; Paris Descartes University, 12 Rue de l'École de Médecine, Paris 75006, France.
³ Paris Sud University, 15 Rue Georges Clemenceau, Orsay 91400, France; Centre de Recherche en Epidémiologie et Santé des populations (CESP), INSERM U1018, 16, avenue Paul Vaillant-Couturier, Villejuif 94807, France.
⁴ METHODS Team, Centre de Recherche en Epidémiologie et Statistiques (CRESS), INSERM U1153, 1 Place du Parvis Notre Dame, Paris 75004, France; Centre d'Épidémiologie Clinique, Hôpital Hôtel-Dieu, Assistance Publique-Hôpitaux de Paris, 1 Place du Parvis Notre Dame, Paris 75004, France; Paris Descartes University, 12 Rue de l'École de Médecine, Paris 75006, France; Department of Epidemiology, Columbia University Mailman School of Public Health, 116th St & Broadway, New York, NY, USA.

PMID: 27492788

Abstract

Objective: To describe methods to determine sample sizes in surveys using open-ended questions and to assess how resampling methods can be used to determine data saturation in these surveys.

Study design and setting: We searched the literature for surveys with open-ended questions and assessed the methods used to determine sample size in 100 studies selected at random. Then, we used Monte Carlo simulations on data from a previous study on the burden of treatment to assess the probability of identifying new themes as a function of the number of patients recruited.
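The simulation step can be illustrated with a minimal Python sketch of the general Monte Carlo idea. It assumes each participant's answers have already been coded into a set of themes. The function name `prob_new_theme`, the toy data, the number of simulations, and the seed are hypothetical and are not taken from the authors' analysis. Each simulation draws a random inclusion order, pools the themes of the first n participants, and records whether participant n + 1 reports a theme not yet seen.

```python
import random
from typing import List, Sequence, Set


def prob_new_theme(
    participants: Sequence[Set[str]],
    n: int,
    n_sims: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(the (n+1)-th included participant adds >= 1 new theme)."""
    if not 0 <= n < len(participants):
        raise ValueError("n must satisfy 0 <= n < number of participants")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Random inclusion order: n participants already included, plus the next one.
        order: List[int] = rng.sample(range(len(participants)), n + 1)
        seen: Set[str] = set().union(*(participants[i] for i in order[:n]))
        # Count a hit if the next participant reports any theme not yet seen.
        if participants[order[n]] - seen:
            hits += 1
    return hits / n_sims


# Hypothetical toy data: each set holds the themes coded for one participant.
toy = [
    {"cost", "time"},
    {"time"},
    {"side effects"},
    {"cost"},
    {"time", "travel"},
    {"cost"},
]
for n in (1, 3, 5):
    print(n, prob_new_theme(toy, n))
```

Applied to real coded data, plotting this estimate against n would give a curve of the kind summarized in the results at 30, 50, and 100 subjects.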

Results: In the literature, 85% of researchers used a convenience sample, with a median size of 167 participants (interquartile range [IQR] = 69-406). In our simulation study, the probability of identifying at least one new theme for the next included subject was 32%, 24%, and 12% after the inclusion of 30, 50, and 100 subjects, respectively. The inclusion of 150 participants at random resulted in the identification of 92% (IQR = 91-93%) of the themes identified in the original study.
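The coverage figure (the share of the original study's themes recovered by a random subsample) can be sketched in the same way. This is a hedged illustration on hypothetical toy data: `theme_coverage` is an illustrative name, and the median and IQR are computed over simulated subsamples drawn without replacement, which is an assumption about how such a summary could be produced rather than a description of the authors' code.

```python
import random
import statistics
from typing import Sequence, Set, Tuple


def theme_coverage(
    participants: Sequence[Set[str]],
    n: int,
    n_sims: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Median, Q1 and Q3 of the share of all themes found in random subsamples of size n."""
    all_themes = set().union(*participants)
    if not all_themes:
        raise ValueError("no themes in the data")
    rng = random.Random(seed)
    shares = []
    for _ in range(n_sims):
        # Draw n participants at random, without replacement.
        sample = rng.sample(list(participants), n)
        found = set().union(*sample)
        shares.append(len(found) / len(all_themes))
    # Quartile cut points of the simulated shares (needs n_sims >= 2).
    q1, median, q3 = statistics.quantiles(shares, n=4)
    return median, q1, q3


# Hypothetical toy data: each set holds the themes coded for one participant.
toy = [
    {"cost", "time"},
    {"time"},
    {"side effects"},
    {"cost"},
    {"time", "travel"},
    {"cost"},
]
med, q1, q3 = theme_coverage(toy, 3)
print(f"median {med:.0%} (IQR {q1:.0%}-{q3:.0%})")
```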

Conclusion: In our study, data saturation was most certainly reached for samples of more than 150 participants. Our method may be used to decide whether to continue the study to find new themes or to stop it for futility.
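The continue-or-stop decision could be operationalized in several ways. One hypothetical option, not described in the abstract, is sketched here. It relies on an exact shortcut: if the k participants recruited so far are put in random order, the last one adds a new theme exactly when they report a theme nobody else reported, so that probability equals the share of participants holding at least one such unique theme. This leave-one-out rate is only a proxy for the chance that the next recruit adds a new theme. The names `loo_new_theme_rate` and `should_stop` and the `threshold` of 0.05 are arbitrary placeholders.

```python
from collections import Counter
from typing import Sequence, Set


def loo_new_theme_rate(participants: Sequence[Set[str]]) -> float:
    """Share of participants reporting at least one theme no other participant reported.

    Equals the probability that, in a random ordering of the current participants,
    the last one included contributes a new theme.
    """
    if len(participants) < 2:
        raise ValueError("need at least two participants")
    # Each participant is a set, so a theme is counted at most once per participant.
    counts = Counter(theme for themes in participants for theme in themes)
    with_unique = sum(
        1 for themes in participants if any(counts[t] == 1 for t in themes)
    )
    return with_unique / len(participants)


def should_stop(participants: Sequence[Set[str]], threshold: float = 0.05) -> bool:
    """Hypothetical futility rule: stop once the estimated chance of a new theme is below threshold."""
    return loo_new_theme_rate(participants) < threshold


# Hypothetical toy data: each set holds the themes coded for one participant.
toy = [
    {"cost", "time"},
    {"time"},
    {"side effects"},
    {"cost"},
    {"time", "travel"},
    {"cost"},
]
print(loo_new_theme_rate(toy))  # 2 of 6 participants report a unique theme
print(should_stop(toy))
```

The shortcut avoids simulation for this particular quantity; the Monte Carlo approach remains necessary for probabilities at sample sizes smaller than the data already collected.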

Keywords: Computer simulation; Data saturation; Qualitative research; Simulation methods; Surveys and questionnaires; Surveys with open-ended questions.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Epidemiologic Research Design*
Humans
Monte Carlo Method
Sampling Studies
Surveys and Questionnaires*