In recent years, natural stimuli such as audio excerpts or video streams have received increasing attention in neuroimaging studies. Compared with conventional simple, idealized and repeated artificial stimuli, natural stimuli contain more unrepeated, dynamic and complex information that are more close to real-life. However, there is no direct correspondence between the stimuli and any sensory or cognitive functions of the brain, which makes it difficult to apply traditional hypothesis-driven analysis methods (e.g., the general linear model (GLM)). Moreover, traditional data-driven methods (e.g., independent component analysis (ICA)) lack quantitative modeling of stimuli, which may limit the power of analysis models. In this paper, we propose a sparse representation based decoding framework to explore the neural correlates between the computational audio features and functional brain activities under free listening conditions. First, we adopt a biologically-plausible auditory saliency feature to quantitatively model the audio excerpts and meanwhile develop sparse representation/dictionary learning method to learn an over-complete dictionary basis of brain activity patterns. Then, we reconstruct the auditory saliency features from the learned fMRI-derived dictionaries. After that, a group-wise analysis procedure is conducted to identify the associated brain regions and networks. Experiments showed that the auditory saliency feature can be well decoded from brain activity patterns by our methods, and the identified brain regions and networks are consistent and meaningful. At last, our method is evaluated and compared with ICA method and experimental results demonstrated the superiority of our methods.
Keywords: Auditory saliency; Brain networks; Decoding; Natural stimuli; Sparse representation.