Deep Multimodal Semantic Embeddings for Speech and Images

Harwath, David; Glass, James

Computer Science > Computer Vision and Pattern Recognition

arXiv:1511.03690 (cs)

[Submitted on 11 Nov 2015]

Abstract: In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities. We employ a pair of convolutional neural networks to model visual objects and speech signals at the word level, and tie the networks together with an embedding and alignment model which learns a joint semantic space over both modalities. We evaluate our model using image search and annotation tasks on the Flickr8k dataset, which we augmented by collecting a corpus of 40,000 spoken captions using Amazon Mechanical Turk.
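
The abstract describes three pieces: two networks that produce per-fragment features (image regions and spoken words), an embedding and alignment model that maps both into one space and scores image/caption pairs, and an evaluation by image search (caption to image) and annotation (image to caption). The Python/NumPy sketch below illustrates how such a pipeline can fit together. It is not the authors' implementation. Several things are assumptions for illustration only: random linear projections stand in for the two CNNs, the pair score sums each spoken word's best cosine match over image fragments, training uses a bidirectional hinge (max-margin) ranking loss, and retrieval is measured by recall@k. All function names (embed, alignment_score, score_matrix, ranking_loss, recall_at_k) are hypothetical, and no Flickr8k data is loaded.

```python
import numpy as np

def embed(features, W):
    """Project modality-specific fragment features (one per row) into the joint
    space with a hypothetical learned matrix W, then L2-normalize each row."""
    z = features @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def alignment_score(image_frags, word_frags):
    """Assumed scoring rule: match every spoken-word embedding to its most
    similar image-fragment embedding and sum those cosine similarities."""
    sims = word_frags @ image_frags.T          # shape (num_words, num_regions)
    return float(sims.max(axis=1).sum())

def score_matrix(images, captions):
    """S[i, j] is the score of image i against spoken caption j."""
    return np.array([[alignment_score(im, cap) for cap in captions]
                     for im in images])

def ranking_loss(S, margin=1.0):
    """Bidirectional hinge loss (assumed objective): each matched pair S[k, k]
    should outscore every mismatched caption and image by at least `margin`."""
    diag = np.diag(S)
    cost_caption = np.maximum(0.0, margin + S - diag[:, None])  # wrong captions, per image row
    cost_image = np.maximum(0.0, margin + S - diag[None, :])    # wrong images, per caption column
    np.fill_diagonal(cost_caption, 0.0)
    np.fill_diagonal(cost_image, 0.0)
    return float(cost_caption.sum() + cost_image.sum())

def recall_at_k(S, k, task):
    """Fraction of queries whose correct match (same index) is in the top k.
    task="search": captions query images; task="annotate": images query captions."""
    if task == "search":
        S = S.T                                  # rows become caption queries
    order = np.argsort(-S, axis=1)               # best-scoring candidates first
    hits = [(order[i, :k] == i).any() for i in range(S.shape[0])]
    return float(np.mean(hits))

# Toy demo with random stand-ins for CNN outputs (5 image/caption pairs).
rng = np.random.default_rng(0)
W_img = rng.normal(size=(16, 8))   # hypothetical image projection
W_spk = rng.normal(size=(12, 8))   # hypothetical speech projection
images = [embed(rng.normal(size=(3, 16)), W_img) for _ in range(5)]
captions = [embed(rng.normal(size=(4, 12)), W_spk) for _ in range(5)]
S = score_matrix(images, captions)
print(S.shape, ranking_loss(S), recall_at_k(S, 1, "search"), recall_at_k(S, 1, "annotate"))
```

With untrained random projections the recall numbers are near chance; training would adjust W_img and W_spk (and, in a full system, the CNNs) to lower ranking_loss, which pushes correct pairs to the top of each ranking.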

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:1511.03690 [cs.CV]
	(or arXiv:1511.03690v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1511.03690

Submission history

From: David Harwath
[v1] Wed, 11 Nov 2015 21:30:10 UTC (2,105 KB)