AI-generated estimates of familiarity, concreteness, valence, and arousal for over 100,000 Spanish words

Gonzalo Martínez; Javier Conde; Pedro Reviriego; Marc Brysbaert

doi:10.1177/17470218241306694

Q J Exp Psychol (Hove). 2024 Dec 24:17470218241306694. doi: 10.1177/17470218241306694. Online ahead of print.

Authors

Gonzalo Martínez¹, Javier Conde², Pedro Reviriego², Marc Brysbaert³

Affiliations

¹ Universidad Carlos III de Madrid, Leganés, Madrid, Spain.
² ETSI de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain.
³ Department of Experimental Psychology, Ghent University, Gent, Belgium.

PMID: 39614682

Abstract

This study investigates whether estimates of familiarity, valence, arousal, and concreteness based on artificial intelligence (AI) are useful alternatives to word counts and human ratings in Spanish. We replicate and extend previous findings in English and show that GPT-4o is effective in estimating these word features. Validity checks even suggest that AI-generated estimates sometimes outperform traditional measurements. The ability to generate AI estimates for large numbers of words at low cost simplifies the process of obtaining word features and provides a new resource for researchers working in Spanish. We provide Excel lists of the collected word features, which can be freely used for research and teaching.
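
The abstract does not say how the validity checks were computed. As a minimal sketch, assuming one common form of such a check (a Pearson correlation between AI-generated estimates and human ratings for the same words), the comparison could look like the following. The function name `pearson_r`, the example words, and all rating values are hypothetical illustrations, not data or code from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical concreteness ratings on a 1-5 scale (illustration only,
# NOT values from the study).
human_ratings = {"mesa": 4.8, "idea": 1.9, "perro": 4.9, "libertad": 1.6}
ai_estimates = {"mesa": 4.7, "idea": 2.1, "perro": 5.0, "libertad": 1.5}

# Compare only the words present in both lists, in a fixed order.
shared_words = sorted(human_ratings.keys() & ai_estimates.keys())
r = pearson_r(
    [human_ratings[w] for w in shared_words],
    [ai_estimates[w] for w in shared_words],
)
print(f"Pearson r over {len(shared_words)} shared words: {r:.3f}")
```

In practice the two dictionaries would be filled from a human norming dataset and from the published lists of AI estimates, and the same comparison would be repeated for each feature (familiarity, concreteness, valence, arousal).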

Keywords: GPT-4; Spanish; Word norms; arousal; concreteness; large language model; multiword expressions; valence.