Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal

Gonzalo Martínez; Juan Diego Molero; Sandra González; Javier Conde; Marc Brysbaert; Pedro Reviriego

doi:10.3758/s13428-024-02515-z

Behav Res Methods. 2024 Dec 4;57(1):5. doi: 10.3758/s13428-024-02515-z.

Authors

Gonzalo Martínez¹, Juan Diego Molero², Sandra González², Javier Conde², Marc Brysbaert³, Pedro Reviriego²

Affiliations

¹ Universidad Carlos III de Madrid, Madrid, Spain.
² ETSI de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain.
³ Department of Experimental Psychology, Ghent University, 9000 Ghent, Belgium.

PMID: 39633225

Abstract

This study investigates the potential of large language models (LLMs) to provide accurate estimates of concreteness, valence, and arousal for multi-word expressions. Unlike previous artificial intelligence (AI) methods, LLMs can capture the nuanced meanings of multi-word expressions. We systematically evaluated GPT-4o's ability to predict concreteness, valence, and arousal. In Study 1, GPT-4o showed strong correlations with human concreteness ratings (r = .8) for multi-word expressions. In Study 2, these findings were replicated for valence and arousal ratings of individual words, matching or outperforming previous AI models. Studies 3-5 extended the valence and arousal analysis to multi-word expressions and showed good validity of the LLM-generated estimates for these stimuli as well. To help researchers with stimulus selection, we provide datasets with LLM-generated norms of concreteness, valence, and arousal for 126,397 English single words and 63,680 multi-word expressions.

Keywords: Arousal; Concreteness; Large language model; Multi-word expressions; Valence; Word norms.

MeSH terms

Arousal* / physiology
Artificial Intelligence
Humans
Language*
Psycholinguistics / methods
Semantics