Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal

Behav Res Methods. 2024 Dec 4;57(1):5. doi: 10.3758/s13428-024-02515-z.

Abstract

This study investigates the potential of large language models (LLMs) to provide accurate estimates of concreteness, valence, and arousal for multi-word expressions. Unlike previous artificial intelligence (AI) methods, LLMs can capture the nuanced meanings of multi-word expressions. We systematically evaluated GPT-4o's ability to predict concreteness, valence, and arousal. In Study 1, GPT-4o estimates correlated strongly with human concreteness ratings (r = .8) for multi-word expressions. In Study 2, these findings were replicated for valence and arousal ratings of individual words, with GPT-4o matching or outperforming previous AI models. Studies 3-5 extended the valence and arousal analysis to multi-word expressions and showed good validity of the LLM-generated estimates for these stimuli as well. To help researchers with stimulus selection, we provide datasets with LLM-generated norms of concreteness, valence, and arousal for 126,397 English single words and 63,680 multi-word expressions.
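The validation summarized above rests on correlating LLM-generated estimates with human ratings for the same expressions. A minimal sketch of that comparison follows, assuming the two sets of ratings are available as dictionaries keyed by expression. The function names `align_ratings` and `pearson_r` and all numeric values are hypothetical placeholders for illustration; they are not taken from the study's data or code.

```python
from math import sqrt


def align_ratings(human, model):
    """Return paired rating lists for expressions present in both sources."""
    shared = sorted(set(human) & set(model))
    return [human[k] for k in shared], [model[k] for k in shared]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


# Invented placeholder ratings on a 1-5 concreteness scale (not study data).
human_ratings = {"ice cream": 4.9, "peace of mind": 1.6, "traffic light": 4.7,
                 "point of view": 1.8, "hot dog": 4.8}
llm_estimates = {"ice cream": 4.8, "peace of mind": 1.4, "traffic light": 4.9,
                 "point of view": 2.1, "hot dog": 4.6, "status quo": 1.5}

humans, estimates = align_ratings(human_ratings, llm_estimates)
r = pearson_r(humans, estimates)
print(f"n = {len(humans)}, r = {r:.2f}")
```

Only expressions rated by both sources enter the correlation, so an item such as "status quo" that lacks a human rating is dropped before computing r.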

Keywords: Arousal; Concreteness; Large language model; Multi-word expressions; Valence; Word norms.

MeSH terms

  • Arousal* / physiology
  • Artificial Intelligence
  • Humans
  • Language*
  • Psycholinguistics / methods
  • Semantics