Cross-modal discrete representation learning

AH Liu, SY Jin, CIJ Lai, A Rouditchenko, A Oliva… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent advances in representation learning have demonstrated an ability to represent
information from different modalities such as video, text, and audio in a single high-level
embedding vector. In this work, we present a self-supervised learning framework that learns representations capturing finer levels of granularity across different modalities, such as concepts or events represented by visual objects or spoken words. Our framework
relies on a discretized embedding space created via vector quantization that is shared across different modalities. Beyond the shared embedding space, we propose a Cross-Modal Code …
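The core idea — snapping continuous, modality-specific embeddings onto a single discrete codebook shared across modalities — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `quantize`, the codebook size, and the toy embeddings are all assumptions for the example.

```python
import numpy as np

def quantize(embeddings, codebook):
    """Map each continuous embedding to its nearest codebook vector.

    embeddings: (n, d) array of modality-specific encoder outputs
    codebook:   (k, d) array of discrete code vectors, shared across modalities
    Returns (codes, quantized): integer code indices and the snapped vectors.
    """
    # Squared Euclidean distance from every embedding to every code vector
    dists = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)  # nearest-neighbor assignment
    return codes, codebook[codes]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))    # k=8 codes of dimension d=4 (illustrative)
video_emb = rng.normal(size=(5, 4))   # e.g. frame-level visual features
audio_emb = rng.normal(size=(5, 4))   # e.g. spoken-word audio features

# Both modalities are quantized against the SAME codebook, so a concept that
# appears visually and is also spoken can land on the same discrete code.
v_codes, v_quant = quantize(video_emb, codebook)
a_codes, a_quant = quantize(audio_emb, codebook)
```

In a trainable version (as in standard VQ approaches), the argmin is non-differentiable, so gradients are typically passed through with a straight-through estimator and the codebook is updated with a commitment-style loss; the sketch above shows only the shared discrete assignment step.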
