Unleash the Potential of CLIP for Video Highlight Detection

Han, Donghoon; Seo, Seunghyeon; Park, Eunhwan; Nam, Seong-Uk; Kwak, Nojun

arXiv:2404.01745 (cs)

[Submitted on 2 Apr 2024]

Abstract: Multimodal and large language models (LLMs) have revolutionized the use of open-world knowledge, unlocking new potential across a wide range of tasks and applications, and the video domain has benefited notably. In this paper, we present Highlight-CLIP (HL-CLIP), a method for video highlight detection that leverages the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our saliency pooling technique, we achieve, to the best of our knowledge, state-of-the-art performance on the QVHighlights highlight detection benchmark.
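The abstract does not define saliency pooling, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that per-frame saliency is the cosine similarity between a frame embedding and a text-query embedding (as a CLIP-style encoder would produce), and that pooling is a centered moving average over neighboring frames. The function names, the window size, and the toy two-dimensional embeddings are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def frame_saliency(frames: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Assumed saliency: similarity of each frame embedding to the query embedding."""
    return [cosine(frame, query) for frame in frames]


def pool_saliency(scores: Sequence[float], window: int = 3) -> List[float]:
    """Assumed pooling: centered moving average, with the window truncated at the edges."""
    half = window // 2
    pooled = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        pooled.append(sum(scores[lo:hi]) / (hi - lo))
    return pooled


# Toy stand-ins for encoder outputs; real embeddings would come from the model.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
query = [1.0, 0.0]

raw = frame_saliency(frames, query)
smooth = pool_saliency(raw, window=3)
best = max(range(len(smooth)), key=smooth.__getitem__)  # index of the top-scoring frame
```

Averaging over neighbors is one simple way to keep a single noisy frame from dominating the ranking; whether HL-CLIP pools in this way is not stated in the abstract.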

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.01745 [cs.CV]
	(or arXiv:2404.01745v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.01745

Submission history

From: Donghoon Han
[v1] Tue, 2 Apr 2024 09:01:58 UTC (683 KB)
