Coherence and comprehensibility: Large language models predict lay understanding of health-related content

J Biomed Inform. 2025 Jan:161:104758. doi: 10.1016/j.jbi.2024.104758. Epub 2024 Dec 9.

Abstract

Health literacy is a prerequisite to informed health-related decision making. To facilitate understanding, text should be presented at a reading level appropriate for the reader. Cognitive studies suggest that the coherence of a text (the interconnectedness between the ideas it expresses) is especially important for low-knowledge readers, who lack the background knowledge to draw inferences from text that is only implicitly connected. Prior work in cognitive science has yielded automated methods to estimate coherence. These methods estimate the proximity between text representations in a semantic vector space, with the underlying idea that units of text that are poorly connected will be further apart in this space. In addition, recent work with large language models (LLMs) has produced probabilistic analogues of these methods that have yet to be evaluated for this purpose. This work concerns the relationship between these automated measures and layperson comprehension of biomedical text. To characterize this relationship, we applied a range of automated measures of text coherence to a set of text snippets, some of which were deliberately modified to improve their accessibility in a series of reading comprehension experiments. Results indicate significant associations between reader comprehension, as estimated using multiple-choice questions, and LLM-derived coherence metrics. Interventions designed to improve the comprehensibility of passages also improved their coherence, as measured with the best-performing LLM-derived models and shown by improved reader understanding of the text. These findings support the utility of LLM-derived measures of text coherence as a means to identify gaps in connectedness that make biomedical text difficult for laypeople to understand, with the potential to inform both manual and automated methods to improve the accessibility of the biomedical literature.
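
The vector-space idea described in the abstract (poorly connected units of text lie further apart) can be illustrated with a minimal sketch. This is not the study's method: it assumes a bag-of-words count vector as a crude stand-in for a learned semantic representation, and the function names `sentence_vector`, `cosine`, and `local_coherence`, along with the example sentences, are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    """Bag-of-words count vector (assumed stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_coherence(sentences):
    """Mean cosine similarity between each pair of adjacent sentences."""
    vectors = [sentence_vector(s) for s in sentences]
    pairs = list(zip(vectors, vectors[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example passages, not taken from the study.
connected = [
    "Insulin lowers blood sugar.",
    "Without insulin, blood sugar rises.",
]
disconnected = [
    "Insulin lowers blood sugar.",
    "The clinic opens at nine.",
]

# The connected pair shares vocabulary, so it scores higher.
print(local_coherence(connected) > local_coherence(disconnected))
```

Word overlap is only a rough proxy for semantic proximity. A measure of the kind the abstract describes would replace `sentence_vector` with representations from a trained embedding model, and the LLM-derived analogues would use model probabilities rather than vector distances.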

Keywords: Large language models; Text coherence; Word embeddings; Layperson comprehension.

MeSH terms

  • Adult
  • Comprehension*
  • Female
  • Health Literacy*
  • Humans
  • Language
  • Male
  • Natural Language Processing
  • Reading
  • Semantics