Learning modality-invariant representations for speech and images | IEEE Conference Publication | IEEE Xplore