Reading and understanding speech are usually considered different manifestations of a single cognitive ability: language. In this study, we sought to characterize the specific contributions of input modality and linguistic complexity to the neural networks involved in language comprehension. We conducted an fMRI study in which 10 right-handed male subjects read and listened to words, sentences, and texts in separate runs. By comparing the reading and listening tasks, we showed that the cerebral regions specifically recruited by a given modality were restricted to the unimodal and associative unimodal cortices associated with that modality, indicating that the higher cognitive processes required by the tasks may be common to both modalities. These shared processes involved a common phonological network as well as lexico-semantic activations, as revealed by the conjunction of all reading and listening tasks. The restriction of modality-specific regions to their corresponding unimodal cortices was replicated when we examined brain areas showing, in each modality, a greater increase during the comprehension of linguistic units more complex than words (sentences and texts). Finally, we discuss the possible roles of regions showing a pure effect of linguistic complexity, such as the anterior part of the superior temporal gyrus and the ventro-posterior part of the middle temporal gyrus, which were activated for sentences and texts but not for isolated words, as well as a text-specific region found in the left posterior superior temporal sulcus (STS).