Expansion of the SyllabO+ corpus and database: Words, lemmas, and morphology

Behav Res Methods. 2025 Jan 7;57(1):47. doi: 10.3758/s13428-024-02582-2.

Abstract

Having a detailed description of the psycholinguistic properties of a language is essential for conducting well-controlled language experiments. However, there is a paucity of databases for some languages and regional varieties, including Québec French. The SyllabO+ corpus was created to provide a complete phonological and syllabic analysis of a corpus of spoken Québec French. In the present study, the corpus was expanded with 41 additional speakers, bringing the total to 225. The analysis was also expanded to include three new databases: unique words, lemmas, and morphemes (inflectional, derivational, and compounds). The internal structure of unique words was then analyzed to identify roots, inflectional markers, and affixes, as well as the components of compounds. Additionally, a group of 441 speakers of Québec French provided semantic transparency ratings for 3764 derived words. Results from the semantic transparency judgment study show broad inter-individual variability for words of medium transparency. No influence of sociodemographic variables was found. Transparency ratings are consistent with studies showing that suffixed words are more transparent than prefixed words. Results for participants who speak French as a second language support the association between second-language proficiency and morphological processing.

Keywords: Composition; Corpus; Derivational morphology; Distributional statistics; Inflectional morphology; Lemmas; Morphology; Oral language; Words.

MeSH terms

  • Adolescent
  • Adult
  • Databases, Factual
  • Female
  • Humans
  • Language*
  • Male
  • Middle Aged
  • Phonetics
  • Psycholinguistics* / methods
  • Quebec
  • Semantics
  • Young Adult