We investigated audio-visual (AV) perceptual integration by examining how seeing the speaker's synchronised moving face affects the detection of masked speech. We contrasted signal-amplification and higher-level cognitive accounts of an AV advantage, testing the latter by varying whether participants knew the speaker's language. An AV advantage was found for sentences whose mid-to-high-frequency acoustic envelope was highly correlated with articulator movement, regardless of whether participants knew the language. For low-correlation sentences, knowledge of the language had a large impact; for participants with no knowledge of the language, an AV inhibitory effect was found (providing support for reports of a compelling AV illusion). The results indicate a role for both sensory enhancement and higher-level cognitive factors in AV speech detection.