To investigate the neural mechanisms of auditory-visual integration, we recorded event-related potentials (ERPs) during a word-identification task in which stimuli were presented in the auditory (A), visual (V), and auditory-visual (AV) modalities. In both the V and AV presentations, the reliability of the visual information was set at one of two levels: high (VH) or low (VL). The modulation of multisensory integration by visual cue reliability was assessed with a double-difference waveform, obtained by subtracting the difference waveform AVL - (A + VL) from the difference waveform AVH - (A + VH). The results revealed (i) early modulation of activity in the auditory and visual cortices; (ii) a subsequent spatiotemporal sequence of activity occurring mostly in multisensory areas; and (iii) final outputs of AV integration emerging at around 370-410 ms poststimulus.
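As an illustration of the arithmetic behind the double-difference waveform, the sketch below computes it sample by sample. It assumes that all five waveforms (AVH, AVL, A, VH, VL) are time-aligned and share the same sampling, for example grand-average ERPs from a single electrode; the function names and the toy values are hypothetical and are not taken from the study. Because the same unisensory A waveform enters both difference waves, it cancels algebraically, so the double difference equals (AVH - VH) - (AVL - VL).

```python
from collections.abc import Sequence


def difference_wave(av: Sequence[float], a: Sequence[float], v: Sequence[float]) -> list[float]:
    """Additive-model difference wave AV - (A + V), computed sample by sample.

    Hypothetical helper: assumes the three waveforms are time-aligned
    and contain the same number of samples.
    """
    if not (len(av) == len(a) == len(v)):
        raise ValueError("waveforms must have the same number of samples")
    return [x_av - (x_a + x_v) for x_av, x_a, x_v in zip(av, a, v)]


def double_difference_wave(
    avh: Sequence[float],
    avl: Sequence[float],
    a: Sequence[float],
    vh: Sequence[float],
    vl: Sequence[float],
) -> list[float]:
    """[AVH - (A + VH)] - [AVL - (A + VL)], sample by sample.

    The A term appears in both difference waves and cancels, so the
    result equals (AVH - VH) - (AVL - VL).
    """
    high = difference_wave(avh, a, vh)  # integration effect, high visual reliability
    low = difference_wave(avl, a, vl)   # integration effect, low visual reliability
    return [h - l for h, l in zip(high, low)]


# Toy waveforms (hypothetical values, three samples each)
dd = double_difference_wave(
    avh=[4, 6, 8], avl=[3, 4, 5], a=[1, 1, 1], vh=[2, 2, 2], vl=[1, 1, 1]
)
print(dd)  # [0, 1, 2]
```

Computing the two difference waves separately, rather than using the simplified form directly, keeps each intermediate result available for inspection, which mirrors how the double difference is defined in terms of the two additive-model comparisons.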