Simulating Early Phonetic and Word Learning Without Linguistic Categories

Dev Sci. 2025 Mar;28(2):e13606. doi: 10.1111/desc.13606.

Abstract

Before they even talk, infants become sensitive to the speech sounds of their native language and recognize the auditory forms of an increasing number of words. Traditionally, these early perceptual changes are attributed to an emerging knowledge of linguistic categories such as phonemes or words. However, there is growing skepticism about this interpretation, owing to limited evidence of category knowledge in infants. Previous modeling work has shown that a distributional learning algorithm could reproduce the perceptual changes of infants' early phonetic learning without acquiring phonetic categories. Taking this line of inquiry further, we propose that linguistic categories may not be needed for early word learning either. We introduce STELA, a predictive coding algorithm designed to extract statistical patterns from continuous raw speech data. Our findings demonstrate that STELA can reproduce some developmental patterns of phonetic and word-form learning without relying on linguistic categories such as phonemes or words, and without requiring explicit word segmentation. Through an analysis of the learned representations, we show evidence that linguistic categories may emerge as an end product of learning rather than serving as prerequisites during early language acquisition.
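To make the abstract's description of "a predictive coding algorithm designed to extract statistical patterns from continuous raw speech data" concrete, the sketch below shows one generic way such a model can be set up: a convolutional encoder turns raw waveform samples into latent frames, an autoregressive network summarizes the past, and the training signal rewards predicting upcoming latent frames against distractors drawn from the same utterance. This is a minimal, hedged illustration in PyTorch, not the authors' STELA implementation; the class name `PredictiveSpeechModel`, the layer sizes, and the contrastive (InfoNCE-style) objective are assumptions made for the example.

```python
# Minimal sketch of predictive coding over raw speech (illustrative only).
# No phoneme labels, word labels, or segmentation are used anywhere: the model
# learns purely from the statistics of the continuous audio signal.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PredictiveSpeechModel(nn.Module):
    def __init__(self, latent_dim=64, context_dim=128, n_future=3):
        super().__init__()
        # Strided 1-D convolutions map raw waveform samples to latent frames z_t.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, latent_dim, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=4, stride=2), nn.ReLU(),
        )
        # Autoregressive context network summarizes the past into c_t.
        self.context = nn.GRU(latent_dim, context_dim, batch_first=True)
        # One linear predictor per future step k: c_t -> predicted z_{t+k}.
        self.predictors = nn.ModuleList(
            [nn.Linear(context_dim, latent_dim) for _ in range(n_future)]
        )

    def forward(self, waveform):
        # waveform: (batch, samples) of raw audio in [-1, 1]
        z = self.encoder(waveform.unsqueeze(1)).transpose(1, 2)  # (B, T, latent)
        c, _ = self.context(z)                                   # (B, T, context)
        return z, c

    def loss(self, waveform):
        z, c = self.forward(waveform)
        B, T, _ = z.shape
        total, n_terms = 0.0, 0
        for k, predictor in enumerate(self.predictors, start=1):
            if T - k <= 1:
                continue
            pred = predictor(c[:, : T - k])   # predictions for z_{t+k}
            target = z[:, k:]                 # true future latent frames
            # Contrastive objective: the true future frame must be picked out
            # against the other frames of the same utterance (negatives come
            # from the data itself, i.e. purely distributional learning).
            logits = torch.einsum("btd,bsd->bts", pred, target)  # (B, T-k, T-k)
            labels = torch.arange(T - k, device=z.device).expand(B, -1)
            total += F.cross_entropy(logits.reshape(-1, T - k), labels.reshape(-1))
            n_terms += 1
        return total / max(n_terms, 1)


# Toy usage: one gradient step on random "audio" standing in for raw speech.
model = PredictiveSpeechModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
fake_audio = torch.randn(4, 16000)  # 4 one-second clips at 16 kHz
loss = model.loss(fake_audio)
loss.backward()
optimizer.step()
print(f"predictive loss: {loss.item():.3f}")
```

Representations learned this way (the latent frames `z_t` or context states `c_t`) can then be probed for phonetic discrimination or word-form recognition without the model ever having been given phoneme or word categories, which is the kind of analysis the abstract refers to.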

Keywords: language acquisition; lexical learning; linguistic categories; phonetic learning; self-supervised learning; statistical learning.

MeSH terms

  • Algorithms
  • Humans
  • Infant
  • Language
  • Language Development*
  • Learning / physiology
  • Linguistics
  • Phonetics*
  • Speech / physiology
  • Speech Perception / physiology
  • Verbal Learning / physiology