People make mistakes: Obtaining accurate ground truth from continuous annotations of subjective constructs

Behav Res Methods. 2024 Dec;56(8):8784-8800. doi: 10.3758/s13428-024-02503-3. Epub 2024 Sep 30.

Abstract

Accurately representing changes in mental states over time is crucial for understanding their complex dynamics. However, there is little methodological research on the validity and reliability of human-produced continuous-time annotation of these states. We present a psychometric perspective on valid and reliable construct assessment, examine the robustness of interval-scale (e.g., values between zero and one) continuous-time annotation, and identify three major threats to validity and reliability in current approaches. We then propose a novel ground truth generation pipeline that combines emerging techniques for improving validity and robustness. We demonstrate its effectiveness in a case study involving crowd-sourced annotation of perceived violence in movies, where our pipeline achieves a .95 Spearman correlation in summarized ratings compared to a .15 baseline. These results suggest that highly accurate ground truth signals can be produced from continuous annotations using additional comparative annotation (e.g., a versus b) to correct structured errors, highlighting the need for a paradigm shift in robust construct measurement over time.
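The abstract reports agreement as a Spearman correlation and mentions comparative (a versus b) annotation, but it does not detail the pipeline. The Python sketch below is a hypothetical illustration of those two generic ingredients, not the authors' method. `win_scores` turns pairwise judgments into an ordinal score using a simple win rate, which stands in for whatever aggregation the paper actually uses. `spearman` computes Spearman's rho as the Pearson correlation of tie-averaged ranks. All function names and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def win_scores(items, comparisons):
    """Hypothetical aggregation: score each item by its pairwise win rate.

    comparisons: (winner, loser) pairs, e.g. "segment a looked more
    violent than segment b". Items never compared get 0.0.
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return [wins[i] / seen[i] if seen[i] else 0.0 for i in items]


# Toy example with made-up segments and reference values.
items = ["s1", "s2", "s3", "s4"]
reference = [0.1, 0.4, 0.6, 0.9]
comparisons = [("s2", "s1"), ("s3", "s2"), ("s4", "s3"),
               ("s3", "s1"), ("s4", "s2"), ("s4", "s1")]
scores = win_scores(items, comparisons)
print(round(spearman(scores, reference), 6))  # prints 1.0
```

A library routine such as SciPy's `scipy.stats.spearmanr` computes the same statistic. The hand-rolled version only keeps the sketch dependency-free. Because Spearman's rho depends only on rank order, a summary that preserves the ordering of segments can score highly even when its absolute values are miscalibrated.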

Keywords: Continuous annotation; Movie violence; Ordinal perception; Reliability; Validity.

MeSH terms

  • Humans
  • Psychometrics* / instrumentation
  • Psychometrics* / methods
  • Reproducibility of Results
  • Violence / psychology