SELF-VS: Self-supervised Encoding Learning For Video Summarization
Authors:
Hojjat Mokhtarabadi,
Kave Bahraman,
Mehrdad HosseinZadeh,
Mahdi Eftekhari
Abstract:
Despite its wide range of applications, video summarization is still held back by the scarcity of large datasets, largely because frame-level annotation is labor-intensive and costly. As a result, existing video summarization methods are prone to overfitting. To mitigate this challenge, we propose a novel self-supervised video representation learning method that uses knowledge distillation to pre-train a transformer encoder. Our method matches its semantic video representation, which is constructed with respect to frame importance scores, to a representation derived from a CNN trained on video classification. Empirical evaluations on correlation-based metrics, such as Kendall's $\tau$ and Spearman's $\rho$, demonstrate the superiority of our approach over existing state-of-the-art methods in assigning relative scores to the input frames.
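For readers unfamiliar with the correlation-based metrics named above, the following is a minimal sketch of how Kendall's $\tau$ and Spearman's $\rho$ compare predicted frame scores against reference scores. It is an illustration only, not the authors' evaluation code: the function names (`kendall_tau`, `spearman_rho`, `average_ranks`) and the sample scores are hypothetical, and the Kendall variant shown is the simple tau-a form, which applies no tie correction.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(pred, ref):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Assumption: no tie correction; tied pairs simply count as zero.
    """
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    balance = 0
    for i, j in combinations(range(n), 2):
        prod = (pred[i] - pred[j]) * (ref[i] - ref[j])
        if prod > 0:
            balance += 1  # pair ordered the same way in both lists
        elif prod < 0:
            balance -= 1  # pair ordered oppositely
    return balance / (n * (n - 1) / 2)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(pred, ref):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rp, rr = average_ranks(pred), average_ranks(ref)
    n = len(rp)
    mp, mr = sum(rp) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / sqrt(var_p * var_r)


# Hypothetical per-frame importance scores for a four-frame clip.
predicted = [1.0, 2.0, 3.0, 4.0]
reference = [1.0, 3.0, 2.0, 4.0]
tau = kendall_tau(predicted, reference)
rho = spearman_rho(predicted, reference)
```

Both metrics depend only on the relative ordering of the scores, not their magnitudes, which is why they suit the task of assigning relative scores to frames. Any monotone rescaling of `predicted` leaves `tau` and `rho` unchanged.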
Submitted 28 March, 2023;
originally announced March 2023.