In real-world situations, the integration of sensory information in working memory (WM) is an important mechanism for object recognition. Studies in single sensory modalities show that object recognition is facilitated if bottom-up inputs match a template held in WM, and that this effect may be linked to enhanced synchronization of neurons in the gamma band (>30 Hz). Natural objects, however, frequently provide inputs to multiple sensory modalities. In this EEG study, we examined the integration of semantically matching or non-matching visual and auditory inputs using a delayed visual-to-auditory object-matching paradigm. In the event-related potentials (ERPs) triggered by auditory inputs, effects of semantic matching were observed at 120-170 ms over frontal and posterior regions, indicating WM-specific processing across modalities, and at 250-400 ms over medial-central regions, possibly reflecting the contextual integration of sensory inputs. Additionally, total gamma-band activity (GBA) with a medial-central topography at 120-180 ms was larger for matching than for non-matching trials. This demonstrates that multisensory matching in WM is reflected in GBA and that dynamic coupling of neural populations in this frequency range may be a crucial mechanism for integrative multisensory processing.
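As an illustration of how "total" GBA of the kind reported here is commonly quantified, the sketch below uses MNE-Python to compute Morlet wavelet power on single trials and average afterwards, which retains both evoked (phase-locked) and induced (non-phase-locked) gamma activity. This is a minimal sketch of the generic total-power approach, not the authors' actual pipeline; the file name, condition labels, frequency grid, baseline window, and medial-central channel picks are all assumptions for illustration.

```python
# Minimal sketch: total gamma-band activity (GBA) from epoched EEG with
# MNE-Python. Single-trial Morlet power is averaged across trials, so both
# evoked and induced gamma survive. Not the authors' pipeline; file name,
# labels, frequencies, baseline, and channel picks are assumptions.
import numpy as np
import mne
from mne.time_frequency import tfr_morlet

epochs = mne.read_epochs("multisensory-epo.fif")  # hypothetical epochs file
freqs = np.arange(30.0, 81.0, 2.0)                # gamma band (>30 Hz)
n_cycles = freqs / 2.0                            # wavelet width grows with frequency

gba = {}
for cond in ("match", "nonmatch"):                # assumed event labels
    # average=True averages single-trial power maps -> total power
    power = tfr_morlet(epochs[cond], freqs=freqs, n_cycles=n_cycles,
                       return_itc=False, average=True)
    power.apply_baseline(baseline=(-0.3, -0.1), mode="percent")  # assumed baseline
    gba[cond] = power

# Mean gamma power in the reported 120-180 ms window over an assumed
# medial-central channel selection.
picks = ["FCz", "Cz", "CPz"]
for cond, power in gba.items():
    ch_idx = [power.ch_names.index(ch) for ch in picks]
    t_mask = (power.times >= 0.12) & (power.times <= 0.18)
    print(cond, power.data[ch_idx][:, :, t_mask].mean())
```

Averaging power rather than the raw signal is what distinguishes total from evoked GBA: averaging the signal first would cancel any gamma activity whose phase varies from trial to trial.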