How do clinical researchers generate data-driven scientific hypotheses? Cognitive events using think-aloud protocol

medRxiv [Preprint]. 2023 Oct 31:2023.10.31.23297860. doi: 10.1101/2023.10.31.23297860.

Abstract

Objectives: This study aims to identify the cognitive events related to information use (e.g., "Analyze data", "Seek connection") during hypothesis generation among clinical researchers. Specifically, we describe hypothesis generation using cognitive event counts and compare them between groups.

Methods: All participants used the same datasets and followed the same scripts. They analyzed the datasets with either VIADS (a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other analytical tools (the control condition), and generated hypotheses while following the think-aloud protocol. Their screen activities and audio were recorded, transcribed, and coded for cognitive events.

Results: The VIADS group exhibited the lowest mean number of cognitive events per hypothesis and the smallest standard deviation. The experienced clinical researchers had approximately 10% more valid hypotheses than the inexperienced group. Among the inexperienced clinical researchers, the VIADS users exhibited a trend similar to that of the experienced clinical researchers in both the number of cognitive events and each event's percentage of all cognitive events. The most frequent cognitive events in hypothesis generation were "Using analysis results" (30%) and "Seeking connections" (23%).
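
The summary measures reported above (mean and standard deviation of cognitive events per hypothesis, and each event type's share of all events) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The record layout, the function names, and every data value below are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical coded transcript rows: (group, hypothesis_id, cognitive_event).
# These values are invented for illustration and are NOT the study's data.
coded_events = [
    ("VIADS", "h1", "Using analysis results"),
    ("VIADS", "h1", "Seeking connections"),
    ("VIADS", "h2", "Using analysis results"),
    ("VIADS", "h2", "Analyze data"),
    ("control", "h3", "Analyze data"),
    ("control", "h3", "Using analysis results"),
    ("control", "h3", "Seeking connections"),
    ("control", "h3", "Seeking connections"),
    ("control", "h4", "Using analysis results"),
    ("control", "h4", "Analyze data"),
]


def events_per_hypothesis(rows):
    """Return {group: [event count for each hypothesis in that group]}."""
    counts = Counter((group, hyp) for group, hyp, _event in rows)
    by_group = {}
    for (group, _hyp), n in counts.items():
        by_group.setdefault(group, []).append(n)
    return by_group


def summarize(rows):
    """Return {group: (mean, sample SD)} of events per hypothesis."""
    return {
        group: (mean(ns), stdev(ns) if len(ns) > 1 else 0.0)
        for group, ns in events_per_hypothesis(rows).items()
    }


def event_percentages(rows):
    """Return each event type's percentage of all coded events."""
    totals = Counter(event for _group, _hyp, event in rows)
    n_all = sum(totals.values())
    return {event: 100 * c / n_all for event, c in totals.items()}


if __name__ == "__main__":
    for group, (m, sd) in summarize(coded_events).items():
        print(f"{group}: mean={m:.2f}, sd={sd:.2f}")
    for event, pct in event_percentages(coded_events).items():
        print(f"{event}: {pct:.0f}%")
```

A lower mean with a smaller standard deviation, as reported for the VIADS group, would show up here as a group whose per-hypothesis counts are both smaller and more tightly clustered.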

Conclusion: VIADS helped inexperienced clinical researchers generate hypotheses with fewer cognitive events than those in the control group. This suggests that VIADS may guide participants toward a more structured hypothesis generation process than the control tools do. The results provide evidence that helps explain the shorter average time the VIADS group needed to generate each hypothesis.

Keywords: Clinical research; Cognitive events; Data-driven hypothesis generation; Scientific hypothesis generation; Secondary data analytical tool; Think-aloud method.

Publication types

  • Preprint