Neurological disorders affecting speech production adversely impact quality of life for over 7 million individuals in the US. Traditional speech interfaces like eye-tracking devices and P300 spellers are slow and unnatural for these patients. An alternative solution, speech Brain-Computer Interfaces (BCIs), directly decodes speech characteristics, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG. Nine neurologically intact participants were equipped with a 63-channel EEG system with additional sensors for removing eye artifacts. Participants read aloud on-screen sentences that had been selected for their phonetic similarity to the English language. Deep learning models, including Convolutional Neural Networks and Recurrent Neural Networks with and without attention modules, were optimized to minimize trainable parameters and to use small input window sizes for real-time application. These models were employed for discrete and continuous speech decoding tasks, achieving statistically significant participant-independent decoding performance for both discrete classes and continuous characteristics of the produced audio signal. A frequency sub-band analysis highlighted the importance of certain frequency bands (delta, theta, and gamma) for decoding performance, and a perturbation analysis was used to identify crucial channels. The channel selection methods assessed did not significantly improve performance, suggesting that speech information is encoded in a distributed manner across the EEG signals. Leave-One-Out training demonstrated the feasibility of exploiting speech neural correlates common across participants, reducing the amount of data that must be collected from each individual.
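To make the modeling approach concrete, the following is a minimal sketch, not the authors' exact architecture, of a compact CNN + RNN decoder with attention pooling over short EEG windows. Only the 63-channel montage comes from the abstract; the window length, sampling rate, layer sizes, number of classes, and all identifiers are illustrative assumptions.

```python
# Hypothetical sketch of a low-parameter EEG window decoder (PyTorch).
# Assumptions: 63 channels (from the abstract); 0.5 s windows at 256 Hz,
# hidden size 32, and 5 output classes are illustrative choices only.
import torch
import torch.nn as nn

class CompactEEGDecoder(nn.Module):
    def __init__(self, n_channels=63, n_classes=5, hidden=32):
        super().__init__()
        # Temporal convolution keeps the parameter count low by sharing
        # filters across the short input window.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm1d(hidden),
            nn.ELU(),
            nn.MaxPool1d(2),
        )
        # Recurrent layer models the remaining temporal structure.
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        # Additive attention pools the GRU outputs into a single vector.
        self.attn = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):          # x: (batch, channels, time)
        h = self.conv(x)           # (batch, hidden, time / 2)
        h = h.transpose(1, 2)      # (batch, time / 2, hidden)
        out, _ = self.gru(h)       # (batch, time / 2, hidden)
        w = torch.softmax(self.attn(out), dim=1)  # attention weights over time
        pooled = (w * out).sum(dim=1)             # attention-weighted pooling
        return self.head(pooled)   # class logits (or regression targets)

# Usage example: a batch of eight 0.5 s windows (128 samples, assumed rate).
model = CompactEEGDecoder()
logits = model(torch.randn(8, 63, 128))
print(logits.shape)  # torch.Size([8, 5])
```

Swapping the classification head for a regression head would cover the continuous decoding task described above; the attention pooling can simply be dropped to obtain the no-attention variants.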
Keywords: Electroencephalography; Electromyography removal; attention networks; convolutional neural networks; deep learning; recurrent neural networks; speech decoding.