Many studies in the literature attempt to recognize emotions using videos or images, but very few have explored the role of sounds in evoking emotions. In this study we devised an experimental protocol for eliciting emotions using, separately and jointly, images and sounds from the widely used International Affective Picture System and International Affective Digital Sounds databases. During the experiments we recorded the skin conductance and pupillary signals and processed them to extract indices linked to the autonomic nervous system, revealing specific patterns of behavior that depend on the stimulation modality. Our results show that skin conductance helps discriminate emotions along the arousal dimension, whereas features derived from the pupillary signal discriminate different states along both the valence and arousal dimensions. In particular, the pupillary diameter was significantly greater with increasing arousal and during elicitation of negative emotions in the phases of viewing images and images with sounds. In the sound-only phase, on the other hand, the power calculated in the high and very high frequency bands of the pupillary diameter was significantly greater at higher valence (valence ratings > 5). Clinical relevance- This study demonstrates the ability of physiological signals to assess specific emotional states by providing different activation patterns depending on whether stimulation is through images, sounds, or images with sounds. The approach has high clinical relevance, as it could be extended to evaluate mood disorders (e.g., depression, bipolar disorders, or simply stress), and the physiological patterns found for sounds could be used to study whether hearing aids can lead to increased emotional perception.