Search | arXiv e-print repository

Leveraging Synthetic Data to Learn Video Stabilization Under Adverse Conditions

Authors: Abdulrahman Kerim, Washington L. S. Ramos, Leandro Soriano Marcolino, Erickson R. Nascimento, Richard Jiang

Abstract: Video stabilization plays a central role to improve videos quality. However, despite the substantial progress made by these methods, they were, mainly, tested under standard weather and lighting conditions, and may perform poorly under adverse conditions. In this paper, we propose a synthetic-aware adverse weather robust algorithm for video stabilization that does not require real data and can be… ▽ More Video stabilization plays a central role to improve videos quality. However, despite the substantial progress made by these methods, they were, mainly, tested under standard weather and lighting conditions, and may perform poorly under adverse conditions. In this paper, we propose a synthetic-aware adverse weather robust algorithm for video stabilization that does not require real data and can be trained only on synthetic data. We also present Silver, a novel rendering engine to generate the required training data with an automatic ground-truth extraction procedure. Our approach uses our specially generated synthetic data for training an affine transformation matrix estimator avoiding the feature extraction issues faced by current methods. Additionally, since no video stabilization datasets under adverse conditions are available, we propose the novel VSAC105Real dataset for evaluation. We compare our method to five state-of-the-art video stabilization algorithms using two benchmarks. Our results show that current approaches perform poorly in at least one weather condition, and that, even training in a small dataset with synthetic data only, we achieve the best performance in terms of stability average score, distortion score, success rate, and average cropping ratio when considering all weather conditions. Hence, our video stabilization model generalizes well on real-world videos and does not require large-scale synthetic training data to converge. △ Less

Submitted 26 August, 2022; originally announced August 2022.

ACM Class: I.4.0; I.4.1; I.6.0

arXiv:2009.11063 [pdf, other]

doi 10.1109/TPAMI.2020.2983929

A Sparse Sampling-based framework for Semantic Fast-Forward of First-Person Videos

Authors: Michel Melo Silva, Washington Luis Souza Ramos, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: Technological advances in sensors have paved the way for digital cameras to become increasingly ubiquitous, which, in turn, led to the popularity of the self-recording culture. As a result, the amount of visual data on the Internet is moving in the opposite direction of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched stashed… ▽ More Technological advances in sensors have paved the way for digital cameras to become increasingly ubiquitous, which, in turn, led to the popularity of the self-recording culture. As a result, the amount of visual data on the Internet is moving in the opposite direction of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched stashed away in some computer folder or website. In this paper, we address the problem of creating smooth fast-forward videos without losing the relevant content. We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem. Using a smoothing frame transition and filling visual gaps between segments, our approach accelerates first-person videos emphasizing the relevant segments and avoids visual discontinuities. Experiments conducted on controlled videos and also on an unconstrained dataset of First-Person Videos (FPVs) show that, when creating fast-forward videos, our method is able to retain as much relevant information and smoothness as the state-of-the-art techniques, but in less processing time. △ Less

Submitted 21 September, 2020; originally announced September 2020.

Comments: Accepted at the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020. arXiv admin note: text overlap with arXiv:1802.08722

arXiv:1912.12655 [pdf, other]

Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network

Authors: Washington L. S. Ramos, Michel M. Silva, Edson R. Araujo, Alan C. Neves, Erickson R. Nascimento

Abstract: The growth of Social Networks has fueled the habit of people logging their day-to-day activities, and long First-Person Videos (FPVs) are one of the main tools in this new habit. Semantic-aware fast-forward methods are able to decrease the watch time and select meaningful moments, which is key to increase the chances of these videos being watched. However, these methods can not handle semantics in… ▽ More The growth of Social Networks has fueled the habit of people logging their day-to-day activities, and long First-Person Videos (FPVs) are one of the main tools in this new habit. Semantic-aware fast-forward methods are able to decrease the watch time and select meaningful moments, which is key to increase the chances of these videos being watched. However, these methods can not handle semantics in terms of personalization. In this work, we present a new approach to automatically creating personalized fast-forward videos for FPVs. Our approach explores the availability of text-centric data from the user's social networks such as status updates to infer her/his topics of interest and assigns scores to the input frames according to her/his preferences. Extensive experiments are conducted on three different datasets with simulated and real-world users as input, achieving an average F1 score of up to 12.8 percentage points higher than the best competitors. We also present a user study to demonstrate the effectiveness of our method. △ Less

Submitted 29 December, 2019; originally announced December 2019.

arXiv:1802.08722 [pdf, other]

doi 10.1109/CVPR.2018.00253

A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos

Authors: Michel Melo Silva, Washington Luis Souza Ramos, Joao Klock Ferreira, Felipe Cadar Chamone, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: Thanks to the advances in the technology of low-cost digital cameras and the popularity of the self-recording culture, the amount of visual data on the Internet is going to the opposite side of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched in a computer folder or website. In this work, we address the problem of creating smo… ▽ More Thanks to the advances in the technology of low-cost digital cameras and the popularity of the self-recording culture, the amount of visual data on the Internet is going to the opposite side of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched in a computer folder or website. In this work, we address the problem of creating smooth fast-forward videos without losing the relevant content. We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem, which combined with a smoothing frame transition method accelerates first-person videos emphasizing the relevant segments and avoids visual discontinuities. The experiments show that our method is able to fast-forward videos to retain as much relevant information and smoothness as the state-of-the-art techniques in less time. We also present a new 80-hour multimodal (RGB-D, IMU, and GPS) dataset of first-person videos with annotations for recorder profile, frame scene, activities, interaction, and attention. △ Less

Submitted 4 April, 2019; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: Accepted for publication in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. Link to the project wesite: https://www.verlab.dcc.ufmg.br/semantic-hyperlapse/

arXiv:1711.03473 [pdf, other]

doi 10.1016/j.jvcir.2018.02.013

Making a long story short: A Multi-Importance fast-forwarding egocentric videos with the emphasis on relevant objects

Authors: Michel Melo Silva, Washington Luis Souza Ramos, Felipe Cadar Chamone, João Pedro Klock Ferreira, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: The emergence of low-cost high-quality personal wearable cameras combined with the increasing storage capacity of video-sharing websites have evoked a growing interest in first-person videos, since most videos are composed of long-running unedited streams which are usually tedious and unpleasant to watch. State-of-the-art semantic fast-forward methods currently face the challenge of providing an a… ▽ More The emergence of low-cost high-quality personal wearable cameras combined with the increasing storage capacity of video-sharing websites have evoked a growing interest in first-person videos, since most videos are composed of long-running unedited streams which are usually tedious and unpleasant to watch. State-of-the-art semantic fast-forward methods currently face the challenge of providing an adequate balance between smoothness in visual flow and the emphasis on the relevant parts. In this work, we present the Multi-Importance Fast-Forward (MIFF), a fully automatic methodology to fast-forward egocentric videos facing these challenges. The dilemma of defining what is the semantic information of a video is addressed by a learning process based on the preferences of the user. Results show that the proposed method keeps over $3$ times more semantic content than the state-of-the-art fast-forward. Finally, we discuss the need of a particular video stabilization technique for fast-forward egocentric videos. △ Less

Submitted 7 March, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: Accepted to publication in the Journal of Visual Communication and Image Representation (JVCI) 2018. Project website: https://www.verlab.dcc.ufmg.br/semantic-hyperlapse

arXiv:1708.04160 [pdf, other]

doi 10.1109/ICIP.2016.7532977

Fast-Forward Video Based on Semantic Extraction

Authors: Washington Luis Souza Ramos, Michel Melo Silva, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: Thanks to the low operational cost and large storage capacity of smartphones and wearable devices, people are recording many hours of daily activities, sport actions and home videos. These videos, also known as egocentric videos, are generally long-running streams with unedited content, which make them boring and visually unpalatable, bringing up the challenge to make egocentric videos more appeal… ▽ More Thanks to the low operational cost and large storage capacity of smartphones and wearable devices, people are recording many hours of daily activities, sport actions and home videos. These videos, also known as egocentric videos, are generally long-running streams with unedited content, which make them boring and visually unpalatable, bringing up the challenge to make egocentric videos more appealing. In this work we propose a novel methodology to compose the new fast-forward video by selecting frames based on semantic information extracted from images. The experiments show that our approach outperforms the state-of-the-art as far as semantic information is concerned and that it is also able to produce videos that are more pleasant to be watched. △ Less

Submitted 16 August, 2017; v1 submitted 14 August, 2017; originally announced August 2017.

Comments: Accepted for publication and presented in 2016 IEEE International Conference on Image Processing (ICIP)

arXiv:1708.04146 [pdf, ps, other]

doi 10.1007/978-3-319-46604-0_40

Towards Semantic Fast-Forward and Stabilized Egocentric Videos

Authors: Michel Melo Silva, Washington Luis Souza Ramos, Joao Pedro Klock Ferreira, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: The emergence of low-cost personal mobiles devices and wearable cameras and the increasing storage capacity of video-sharing websites have pushed forward a growing interest towards first-person videos. Since most of the recorded videos compose long-running streams with unedited content, they are tedious and unpleasant to watch. The fast-forward state-of-the-art methods are facing challenges of bal… ▽ More The emergence of low-cost personal mobiles devices and wearable cameras and the increasing storage capacity of video-sharing websites have pushed forward a growing interest towards first-person videos. Since most of the recorded videos compose long-running streams with unedited content, they are tedious and unpleasant to watch. The fast-forward state-of-the-art methods are facing challenges of balancing the smoothness of the video and the emphasis in the relevant frames given a speed-up rate. In this work, we present a methodology capable of summarizing and stabilizing egocentric videos by extracting the semantic information from the frames. This paper also describes a dataset collection with several semantically labeled videos and introduces a new smoothness evaluation metric for egocentric videos that is used to test our method. △ Less

Submitted 16 August, 2017; v1 submitted 14 August, 2017; originally announced August 2017.

Comments: Accepted for publication and presented in the First International Workshop on Egocentric Perception, Interaction and Computing at European Conference on Computer Vision (EPIC@ECCV) 2016

Showing 1–7 of 7 results for author: Ramos, W L S