We tested the semantic coding hypothesis, which states that cross-modal interactions observed in speeded classification tasks arise after perceptual information is recoded into an abstract format common to perceptual and linguistic systems. Using a speeded classification task, we first confirmed the presence of congruence interactions between auditory pitch and visual lightness and observed Garner-type interference with nonlinguistic (perceptual) stimuli (low-frequency and high-frequency tones, black and white squares). Subsequently, we found that modifying the visual stimuli by (a) making them lexical (related words) or (b) reducing their compactness or figural 'goodness' altered congruence effects and Garner interference. The results are only partly consistent with the semantic coding hypothesis and suggest the need for additional assumptions regarding the role of perceptual organization in cross-modal dimensional interactions.