Participants tend to produce a higher or lower vocal pitch in response to upward or downward visual motion, respectively, suggesting a pitch-motion correspondence between visual processing and speech production. However, previous studies were confounded by factors such as the meaning of the vocalized words and the intrinsic pitch or tongue movements associated with the vowels. To address these issues, we examined the pitch-motion correspondence between simple visual motion and vocal pitch production. Participants were required to produce a meaningless single vowel [a] at a high or low pitch in response to the upward or downward direction of a visual motion stimulus. By using a single vowel, we eliminated the confounds related to the meaning, intrinsic pitch, and tongue movements that arise when multiple vowels are vocalized. The results revealed that vocal responses were faster when the pitch corresponded to the visual motion (consistent condition) than when it did not (inconsistent condition). This result indicates that the pitch-motion correspondence in speech production does not depend on the stimulus meaning, intrinsic pitch, or tongue movements of the vocalized words. In other words, the present study suggests that the pitch-motion correspondence can be explained more parsimoniously as an association between simple sensory (visual motion) and motoric (vocal pitch) features. Additionally, acoustic analysis revealed that speech production aligned with visual motion exhibited lower stress, greater confidence, and higher vocal fluency.