Multimodal clustering networks for self-supervised learning from unlabeled videos

B Chen, A Rouditchenko, K Duarte… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multimodal self-supervised learning is attracting growing attention because it makes it possible not only
to train large networks without human supervision but also to search and retrieve data
across various modalities. In this context, this paper proposes a framework that, starting from
a pre-trained backbone, learns a common multimodal embedding space that, in addition to
sharing representations across different modalities, enforces a grouping of semantically
similar instances. To this end, we extend the concept of instance-level contrastive learning …

[PDF] Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, …

B Chen, A Rouditchenko - rpand002.github.io
Multimodal self-supervised learning is attracting growing attention because it makes it possible not only
to train large networks without human supervision but also to search and retrieve data
across various modalities. In this context, this paper proposes a self-supervised training
framework that learns a common multimodal embedding space that, in addition to sharing
representations across different modalities, enforces a grouping of semantically similar
instances. To this end, we extend the concept of instance-level contrastive learning with a …