Dynamical theories of speech processing propose that the auditory cortex parses acoustic information in parallel at the syllabic and phonemic timescales. We developed a paradigm to independently manipulate both linguistic timescales, and acquired intracranial recordings from 11 patients who are epileptic listening to French sentences. Our results indicate that (i) syllabic and phonemic timescales are both reflected in the acoustic spectral flux; (ii) during comprehension, the auditory cortex tracks the syllabic timescale in the theta range, while neural activity in the alpha-beta range phase locks to the phonemic timescale; (iii) these neural dynamics occur simultaneously and share a joint spatial location; (iv) the spectral flux embeds two timescales-in the theta and low-beta ranges-across 17 natural languages. These findings help us understand how the human brain extracts acoustic information from the continuous speech signal at multiple timescales simultaneously, a prerequisite for subsequent linguistic processing.