Understanding how the brain processes stimuli in a rich natural environment is a fundamental goal of neuroscience. Here, we showed a feature film to 10 healthy volunteers during functional magnetic resonance imaging (fMRI) of hemodynamic brain activity. We then annotated auditory and visual features of the motion picture to inform analysis of the hemodynamic data. The annotations were fitted to both voxel-wise data and brain network time courses extracted by independent component analysis (ICA). Auditory annotations correlated with two independent components (IC) disclosing two functional networks, one responding to variety of auditory stimulation and another responding preferentially to speech but parts of the network also responding to non-verbal communication. Visual feature annotations correlated with four ICs delineating visual areas according to their sensitivity to different visual stimulus features. In comparison, a separate voxel-wise general linear model based analysis disclosed brain areas preferentially responding to sound energy, speech, music, visual contrast edges, body motion and hand motion which largely overlapped the results revealed by ICA. Differences between the results of IC- and voxel-based analyses demonstrate that thorough analysis of voxel time courses is important for understanding the activity of specific sub-areas of the functional networks, while ICA is a valuable tool for revealing novel information about functional connectivity which need not be explained by the predefined model. Our results encourage the use of naturalistic stimuli and tasks in cognitive neuroimaging to study how the brain processes stimuli in rich natural environments.