Predicting subject traits from brain data is an important goal in neuroscience, with applications in clinical research as well as in the study of differential psychology and cognition. While previous prediction work has predominantly used neuroimaging data, our focus is on electroencephalography (EEG), a relatively inexpensive, widely available and non-invasive data modality. However, EEG data is complex and needs some form of feature extraction before prediction. This step is sometimes done manually, risking biases and suboptimal decisions. Here we investigate the use of data-driven kernel methods for prediction from single channels using the EEG spectrogram, which reflects macro-scale neural oscillations in the brain. Specifically, we introduce the idea of reinterpreting the spectrogram of each channel as a probability distribution, so that we can leverage advanced machine learning techniques that handle probability distributions with mathematical rigour and without the need for manual feature extraction. We explore how the resulting technique, kernel mean embedding regression, compares to a standard application of kernel ridge regression as well as to a non-kernelised approach. Overall, we found that the kernel methods exhibit improved performance thanks to their capacity to handle nonlinearities in the relation between the EEG spectrogram and the trait of interest. We leveraged this method to predict biological age in a multinational EEG data set, HarMNqEEG, showing the method's capacity to generalise across experiments and acquisition setups.
Keywords: EEG; brain age; kernel mean embedding regression; kernel methods; machine learning; maximum mean discrepancy.
© 2024 The Author(s). Human Brain Mapping published by Wiley Periodicals LLC.
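
The central idea of the abstract, treating a channel's spectrum as a probability distribution and regressing on its kernel mean embedding, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a time-averaged power spectrum per subject, Gaussian kernels at both levels, and arbitrary bandwidths and regularisation. All function names and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectrum_to_distribution(power):
    """Normalise non-negative spectral power so each row sums to 1."""
    power = np.asarray(power, dtype=float)
    return power / power.sum(axis=-1, keepdims=True)


def frequency_kernel(freqs, bandwidth):
    """Gaussian kernel between frequency bins (used to embed distributions)."""
    diff = freqs[:, None] - freqs[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def squared_mmd(P, Q, K_freq):
    """Squared maximum mean discrepancy between each row of P and each row of Q.

    For discrete distributions p and q over the same frequency bins, the inner
    product of their kernel mean embeddings is p^T K q.
    """
    cross = P @ K_freq @ Q.T
    p_self = np.einsum("if,fg,ig->i", P, K_freq, P)
    q_self = np.einsum("if,fg,ig->i", Q, K_freq, Q)
    # Clip tiny negative values caused by floating-point error.
    return np.maximum(p_self[:, None] + q_self[None, :] - 2.0 * cross, 0.0)


def mmd_kernel(P, Q, K_freq, sigma):
    """Gaussian kernel on distributions, built from the squared MMD."""
    return np.exp(-squared_mmd(P, Q, K_freq) / (2.0 * sigma**2))


def fit_kernel_ridge(K_train, y, lam):
    """Solve (K + lam * I) alpha = y for the dual coefficients."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), y)


def predict(K_test_train, alpha):
    """Predict targets from the test-by-train kernel matrix."""
    return K_test_train @ alpha


# Toy data (purely synthetic): a spectral peak whose position drifts with age.
rng = np.random.default_rng(0)
freqs = np.linspace(1.0, 40.0, 64)
ages = rng.uniform(5.0, 80.0, size=60)
peak = 8.0 + 0.03 * ages  # hypothetical relation, for illustration only
power = np.exp(-((freqs[None, :] - peak[:, None]) ** 2) / 8.0) + 0.05

P = spectrum_to_distribution(power)
K_freq = frequency_kernel(freqs, bandwidth=2.0)  # assumed bandwidth

train, test = np.arange(40), np.arange(40, 60)
K_train = mmd_kernel(P[train], P[train], K_freq, sigma=0.05)  # assumed sigma
alpha = fit_kernel_ridge(K_train, ages[train], lam=1e-3)  # assumed regulariser
predicted_ages = predict(mmd_kernel(P[test], P[train], K_freq, sigma=0.05), alpha)
```

In practice the bandwidths and the regularisation strength would be chosen by cross-validation. Replacing `mmd_kernel` with a Gaussian kernel applied directly to the spectra gives a plain kernel ridge regression of the kind the abstract uses as a comparison.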