A large-scale database of Mandarin Chinese word associations from the Small World of Words Project

Behav Res Methods. 2024 Dec 30;57(1):34. doi: 10.3758/s13428-024-02513-1.

Abstract

Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word association datasets exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated the concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Together, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
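
As a rough illustration of how a semantic network and a word centrality measure can be derived from three-response association data, the sketch below builds a directed network from cue-response trials and counts, for each word, how many distinct cues elicit it (in-degree). The toy trials, the function names build_network and in_degree, and the choice of in-degree as the centrality measure are assumptions made for illustration; the abstract does not specify the pipeline or the centrality measure used for SWOW-ZH.

```python
from collections import Counter, defaultdict

# Hypothetical toy trials: (cue, first, second, third response).
# None marks a missing response. These rows are NOT taken from SWOW-ZH.
TRIALS = [
    ("猫", "狗", "可爱", "老鼠"),
    ("猫", "狗", "宠物", None),
    ("狗", "猫", "宠物", "忠诚"),
    ("老鼠", "猫", "奶酪", None),
]


def build_network(trials):
    """Return {cue: {response: strength}}, where strength is the share
    of that cue's non-missing responses that were this response."""
    counts = defaultdict(Counter)
    for cue, *responses in trials:
        for response in responses:
            if response is not None:
                counts[cue][response] += 1
    network = {}
    for cue, response_counts in counts.items():
        total = sum(response_counts.values())
        network[cue] = {r: n / total for r, n in response_counts.items()}
    return network


def in_degree(network):
    """Count, for each word, the number of distinct cues that elicit it.
    This is one simple notion of centrality (an assumption here)."""
    degree = Counter()
    for edges in network.values():
        for response in edges:
            degree[response] += 1
    return degree


network = build_network(TRIALS)
centrality = in_degree(network)
for word, degree in centrality.most_common():
    print(word, degree)
```

Under this sketch, words produced in response to many different cues receive a higher centrality score. A score of this kind could then be entered as a predictor of lexical decision or naming latencies, alongside text-based measures such as word frequency.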

Keywords: Chinese; Mental lexicon; Semantic network; Word association.

MeSH terms

  • Adult
  • China
  • Databases, Factual
  • Female
  • Humans
  • Language*
  • Male
  • Psycholinguistics / methods
  • Reaction Time / physiology
  • Semantics*
  • Vocabulary
  • Young Adult