The impact of ChatGPT on human data collection: A case study involving typicality norming data

Behav Res Methods. 2024 Aug;56(5):4974-4981. doi: 10.3758/s13428-023-02235-w. Epub 2023 Oct 3.

Abstract

Tools like ChatGPT, which allow people to unlock the potential of large language models (LLMs), have taken the world by storm. ChatGPT's ability to produce written output of remarkable quality has inspired, or forced, academics to consider its consequences for both research and education. In particular, the question of what constitutes authorship, and how to evaluate (scientific) contributions, has received a lot of attention. However, its impact on (online) human data collection has mostly flown under the radar. The current paper examines how ChatGPT can be (mis)used in the context of generating norming data. We found that ChatGPT is able to produce sensible output, resembling that of human participants, for a typicality rating task. Moreover, the test-retest reliability of ChatGPT's ratings was similar to that of human participants tested 1 day apart. We discuss the relevance of these findings in the context of (online) human data collection, focusing both on opportunities (e.g., (risk-)free pilot data) and challenges (e.g., data fabrication).
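
As an illustration of what a test-retest comparison involves, the sketch below correlates two sessions of typicality ratings for the same items. It is a minimal example under stated assumptions: the abstract does not name the reliability statistic used, so the Pearson correlation here is an assumption, the function name `pearson_r` is hypothetical, and the ratings are invented toy values, not data from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy ratings for five category exemplars on a 1-7 scale.
# These numbers are made up for illustration and are NOT from the paper.
session_1 = [7, 6, 5, 3, 2]  # first rating session
session_2 = [7, 5, 5, 4, 1]  # same items, rated again later

retest_r = pearson_r(session_1, session_2)
print(f"test-retest correlation: {retest_r:.2f}")
```

The same function could be applied to compare one rater's ratings with the mean ratings of a human sample, which is one way to quantify how closely a set of ratings resembles human norms.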

Keywords: ChatGPT; Human data collection; Large language models; Typicality.

MeSH terms

  • Adult
  • Authorship
  • Data Collection* / methods
  • Female
  • Humans
  • Language
  • Male
  • Reproducibility of Results
  • Software