Search | arXiv e-print repository

RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of styl… ▽ More We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Preprint. Under review

arXiv:2312.00852 [pdf, other]

Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion

Authors: Litu Rout, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

Abstract: Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie's first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for post… ▽ More Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie's first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for posterior sampling. This paper introduces Second-order Tweedie sampler from Surrogate Loss (STSL), a novel sampler that offers efficiency comparable to first-order Tweedie with a tractable reverse process using second-order approximation. Our theoretical results reveal that the second-order approximation is lower bounded by our surrogate loss that only requires $O(1)$ compute using the trace of the Hessian, and by the lower bound we derive a new drift term to make the reverse process tractable. Our method surpasses SoTA solvers PSLD and P2L, achieving 4X and 8X reduction in neural function evaluations, respectively, while notably enhancing sampling quality on FFHQ, ImageNet, and COCO benchmarks. In addition, we show STSL extends to text-guided image editing and addresses residual distortions present from corrupted images in leading text-guided image editing methods. To our best knowledge, this is the first work to offer an efficient second-order approximation in solving inverse problems using latent diffusion and editing real-world images with corruptions. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2307.00619 [pdf, other]

Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

Authors: Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros G. Dimakis, Sanjay Shakkottai

Abstract: We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often c… ▽ More We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution. △ Less

Submitted 2 July, 2023; originally announced July 2023.

Comments: Preprint

arXiv:2302.06570 [pdf, ps, other]

Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD

Authors: Matthew Faw, Litu Rout, Constantine Caramanis, Sanjay Shakkottai

Abstract: This work considers the problem of finding a first-order stationary point of a non-convex function with potentially unbounded smoothness constant using a stochastic gradient oracle. We focus on the class of $(L_0,L_1)$-smooth functions proposed by Zhang et al. (ICLR'20). Empirical evidence suggests that these functions more closely captures practical machine learning problems as compared to the pe… ▽ More This work considers the problem of finding a first-order stationary point of a non-convex function with potentially unbounded smoothness constant using a stochastic gradient oracle. We focus on the class of $(L_0,L_1)$-smooth functions proposed by Zhang et al. (ICLR'20). Empirical evidence suggests that these functions more closely captures practical machine learning problems as compared to the pervasive $L_0$-smoothness. This class is rich enough to include highly non-smooth functions, such as $\exp(L_1 x)$ which is $(0,\mathcal{O}(L_1))$-smooth. Despite the richness, an emerging line of works achieves the $\widetilde{\mathcal{O}}(\frac{1}{\sqrt{T}})$ rate of convergence when the noise of the stochastic gradients is deterministically and uniformly bounded. This noise restriction is not required in the $L_0$-smooth setting, and in many practical settings is either not satisfied, or results in weaker convergence rates with respect to the noise scaling of the convergence rate. We develop a technique that allows us to prove $\mathcal{O}(\frac{\mathrm{poly}\log(T)}{\sqrt{T}})$ convergence rates for $(L_0,L_1)$-smooth functions without assuming uniform bounds on the noise support. The key innovation behind our results is a carefully constructed stopping time $τ$ which is simultaneously "large" on average, yet also allows us to treat the adaptive step sizes before $τ$ as (roughly) independent of the gradients. For general $(L_0,L_1)$-smooth functions, our analysis requires the mild restriction that the multiplicative noise parameter $σ_1 < 1$. For a broad subclass of $(L_0,L_1)$-smooth functions, our convergence rate continues to hold when $σ_1 \geq 1$. By contrast, we prove that many algorithms analyzed by prior works on $(L_0,L_1)$-smooth optimization diverge with constant probability even for smooth and strongly-convex functions when $σ_1 > 1$. △ Less

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2302.01217 [pdf, other]

A Theoretical Justification for Image Inpainting using Denoising Diffusion Probabilistic Models

Authors: Litu Rout, Advait Parulekar, Constantine Caramanis, Sanjay Shakkottai

Abstract: We provide a theoretical justification for sample recovery using diffusion based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion based inpainting generalizes well to unseen masks without retraining. We analyze a recently proposed popular diffusion based inpainting algorithm called RePaint (Lugmayr et al., 2… ▽ More We provide a theoretical justification for sample recovery using diffusion based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion based inpainting generalizes well to unseen masks without retraining. We analyze a recently proposed popular diffusion based inpainting algorithm called RePaint (Lugmayr et al., 2022), and show that it has a bias due to misalignment that hampers sample recovery even in a two-state diffusion process. Motivated by our analysis, we propose a modified RePaint algorithm we call RePaint$^+$ that provably recovers the underlying true sample and enjoys a linear rate of convergence. It achieves this by rectifying the misalignment error present in drift and dispersion of the reverse process. To the best of our knowledge, this is the first linear convergence result for a diffusion based image inpainting algorithm. △ Less

Submitted 2 February, 2023; originally announced February 2023.

Comments: 30 pages, 5 figures, 1 Table

arXiv:2209.13570 [pdf, other]

Hierarchical Sliced Wasserstein Distance

Authors: Khai Nguyen, Tongzheng Ren, Huy Nguyen, Litu Rout, Tan Nguyen, Nhat Ho

Abstract: Sliced Wasserstein (SW) distance has been widely used in different application scenarios since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of sliced Wasserstein distance is the average of transportation cost between one-dimensional representations (projections) of original measures that are obtained by Radon Transform (RT). Despite i… ▽ More Sliced Wasserstein (SW) distance has been widely used in different application scenarios since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of sliced Wasserstein distance is the average of transportation cost between one-dimensional representations (projections) of original measures that are obtained by Radon Transform (RT). Despite its efficiency in the number of supports, estimating the sliced Wasserstein requires a relatively large number of projections in high-dimensional settings. Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where the mini-batch approaches are utilized, the complexities from matrix multiplication of Radon Transform become the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections which are named bottleneck projections. We explain the usage of these projections by introducing Hierarchical Radon Transform (HRT) which is constructed by applying Radon Transform variants recursively. We then formulate the approach into a new metric between measures, named Hierarchical Sliced Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the metricity of HSW. Moreover, we investigate the theoretical properties of HSW including its connection to SW variants and its computational and sample complexities. Finally, we compare the computational cost and generative quality of HSW with the conventional SW on the task of deep generative modeling using various benchmark datasets including CIFAR10, CelebA, and Tiny ImageNet. △ Less

Submitted 6 February, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Comments: Accepted to ICLR 2023, 29 pages, 8 figures, 3 tables,

arXiv:2202.01116 [pdf, other]

An Optimal Transport Perspective on Unpaired Image Super-Resolution

Authors: Milena Gazdieva, Litu Rout, Alexander Korotin, Andrey Kravchenko, Alexander Filippov, Evgeny Burnaev

Abstract: Real-world image super-resolution (SR) tasks often do not have paired datasets, which limits the application of supervised techniques. As a result, the tasks are usually approached by unpaired techniques based on Generative Adversarial Networks (GANs), which yield complex training losses with several regularization terms, e.g., content or identity losses. We theoretically investigate optimization… ▽ More Real-world image super-resolution (SR) tasks often do not have paired datasets, which limits the application of supervised techniques. As a result, the tasks are usually approached by unpaired techniques based on Generative Adversarial Networks (GANs), which yield complex training losses with several regularization terms, e.g., content or identity losses. We theoretically investigate optimization problems which arise in such models and find two surprizing observations. First, the learned SR map is always an optimal transport (OT) map. Second, we theoretically prove and empirically show that the learned map is biased, i.e., it does not actually transform the distribution of low-resolution images to high-resolution ones. Inspired by these findings, we propose an algorithm for unpaired SR which learns an unbiased OT map for the perceptual transport cost. Unlike the existing GAN-based alternatives, our algorithm has a simple optimization objective reducing the need for complex hyperparameter selection and an application of additional regularizations. At the same time, it provides a nearly state-of-the-art performance on the large-scale unpaired AIM19 dataset. △ Less

Submitted 11 October, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

arXiv:2110.02999 [pdf, other]

Generative Modeling with Optimal Transport Maps

Authors: Litu Rout, Alexander Korotin, Evgeny Burnaev

Abstract: With the discovery of Wasserstein GANs, Optimal Transport (OT) has become a powerful tool for large-scale generative modeling tasks. In these tasks, OT cost is typically used as the loss for training GANs. In contrast to this approach, we show that the OT map itself can be used as a generative model, providing comparable performance. Previous analogous approaches consider OT maps as generative mod… ▽ More With the discovery of Wasserstein GANs, Optimal Transport (OT) has become a powerful tool for large-scale generative modeling tasks. In these tasks, OT cost is typically used as the loss for training GANs. In contrast to this approach, we show that the OT map itself can be used as a generative model, providing comparable performance. Previous analogous approaches consider OT maps as generative models only in the latent spaces due to their poor performance in the original high-dimensional ambient space. In contrast, we apply OT maps directly in the ambient space, e.g., a space of high-dimensional images. First, we derive a min-max optimization algorithm to efficiently compute OT maps for the quadratic cost (Wasserstein-2 distance). Next, we extend the approach to the case when the input and output distributions are located in the spaces of different dimensions and derive error bounds for the computed OT map. We evaluate the algorithm on image generation and unpaired image restoration tasks. In particular, we consider denoising, colorization, and inpainting, where the optimality of the restoration map is a desired attribute, since the output (restored) image is expected to be close to the input (degraded) one. △ Less

Submitted 5 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: ICLR 2022

arXiv:2010.00522 [pdf, other]

Understanding the Role of Adversarial Regularization in Supervised Learning

Authors: Litu Rout

Abstract: Despite numerous attempts sought to provide empirical evidence of adversarial regularization outperforming sole supervision, the theoretical understanding of such phenomena remains elusive. In this study, we aim to resolve whether adversarial regularization indeed performs better than sole supervision at a fundamental level. To bring this insight into fruition, we study vanishing gradient issue, a… ▽ More Despite numerous attempts sought to provide empirical evidence of adversarial regularization outperforming sole supervision, the theoretical understanding of such phenomena remains elusive. In this study, we aim to resolve whether adversarial regularization indeed performs better than sole supervision at a fundamental level. To bring this insight into fruition, we study vanishing gradient issue, asymptotic iteration complexity, gradient flow and provable convergence in the context of sole supervision and adversarial regularization. The key ingredient is a theoretical justification supported by empirical evidence of adversarial acceleration in gradient descent. In addition, motivated by a recently introduced unit-wise capacity based generalization bound, we analyze the generalization error in adversarial framework. Guided by our observation, we cast doubts on the ability of this measure to explain generalization. We therefore leave as open questions to explore new measures that can explain generalization behavior in adversarial learning. Furthermore, we observe an intriguing phenomenon in the neural embedded vector space while contrasting adversarial learning with sole supervision. △ Less

Submitted 1 October, 2020; originally announced October 2020.

Comments: Under Review

arXiv:2010.00521 [pdf, other]

Why Adversarial Interaction Creates Non-Homogeneous Patterns: A Pseudo-Reaction-Diffusion Model for Turing Instability

Authors: Litu Rout

Abstract: Long after Turing's seminal Reaction-Diffusion (RD) model, the elegance of his fundamental equations alleviated much of the skepticism surrounding pattern formation. Though Turing model is a simplification and an idealization, it is one of the best-known theoretical models to explain patterns as a reminiscent of those observed in nature. Over the years, concerted efforts have been made to align th… ▽ More Long after Turing's seminal Reaction-Diffusion (RD) model, the elegance of his fundamental equations alleviated much of the skepticism surrounding pattern formation. Though Turing model is a simplification and an idealization, it is one of the best-known theoretical models to explain patterns as a reminiscent of those observed in nature. Over the years, concerted efforts have been made to align theoretical models to explain patterns in real systems. The apparent difficulty in identifying the specific dynamics of the RD system makes the problem particularly challenging. Interestingly, we observe Turing-like patterns in a system of neurons with adversarial interaction. In this study, we establish the involvement of Turing instability to create such patterns. By theoretical and empirical studies, we present a pseudo-reaction-diffusion model to explain the mechanism that may underlie these phenomena. While supervised learning attains homogeneous equilibrium, this paper suggests that the introduction of an adversary helps break this homogeneity to create non-homogeneous patterns at equilibrium. Further, we prove that randomly initialized gradient descent with over-parameterization can converge exponentially fast to an $ε$-stationary point even under adversarial interaction. In addition, different from sole supervision, we show that the solutions obtained under adversarial interaction are not limited to a tiny subspace around initialization. △ Less

Submitted 8 December, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: 35th AAAI Conference on Artificial Intelligence

arXiv:2004.03879 [pdf, other]

Monte-Carlo Siamese Policy on Actor for Satellite Image Super Resolution

Authors: Litu Rout, Saumyaa Shah, S Manthira Moorthi, Debajyoti Dhar

Abstract: In the past few years supervised and adversarial learning have been widely adopted in various complex computer vision tasks. It seems natural to wonder whether another branch of artificial intelligence, commonly known as Reinforcement Learning (RL) can benefit such complex vision tasks. In this study, we explore the plausible usage of RL in super resolution of remote sensing imagery. Guided by rec… ▽ More In the past few years supervised and adversarial learning have been widely adopted in various complex computer vision tasks. It seems natural to wonder whether another branch of artificial intelligence, commonly known as Reinforcement Learning (RL) can benefit such complex vision tasks. In this study, we explore the plausible usage of RL in super resolution of remote sensing imagery. Guided by recent advances in super resolution, we propose a theoretical framework that leverages the benefits of supervised and reinforcement learning. We argue that a straightforward implementation of RL is not adequate to address ill-posed super resolution as the action variables are not fully known. To tackle this issue, we propose to parameterize action variables by matrices, and train our policy network using Monte-Carlo sampling. We study the implications of parametric action space in a model-free environment from theoretical and empirical perspective. Furthermore, we analyze the quantitative and qualitative results on both remote sensing and non-remote sensing datasets. Based on our experiments, we report considerable improvement over state-of-the-art methods by encapsulating supervised models in a reinforcement learning framework. △ Less

Submitted 8 April, 2020; originally announced April 2020.

Comments: Computer Vision and Pattern Recognition (CVPR) Workshop on Large Scale Computer Vision for Remote Sensing Imagery

arXiv:2004.03867 [pdf, other]

S2A: Wasserstein GAN with Spatio-Spectral Laplacian Attention for Multi-Spectral Band Synthesis

Authors: Litu Rout, Indranil Misra, S Manthira Moorthi, Debajyoti Dhar

Abstract: Intersection of adversarial learning and satellite image processing is an emerging field in remote sensing. In this study, we intend to address synthesis of high resolution multi-spectral satellite imagery using adversarial learning. Guided by the discovery of attention mechanism, we regulate the process of band synthesis through spatio-spectral Laplacian attention. Further, we use Wasserstein GAN… ▽ More Intersection of adversarial learning and satellite image processing is an emerging field in remote sensing. In this study, we intend to address synthesis of high resolution multi-spectral satellite imagery using adversarial learning. Guided by the discovery of attention mechanism, we regulate the process of band synthesis through spatio-spectral Laplacian attention. Further, we use Wasserstein GAN with gradient penalty norm to improve training and stability of adversarial learning. In this regard, we introduce a new cost function for the discriminator based on spatial attention and domain adaptation loss. We critically analyze the qualitative and quantitative results compared with state-of-the-art methods using widely adopted evaluation metrics. Our experiments on datasets of three different sensors, namely LISS-3, LISS-4, and WorldView-2 show that attention learning performs favorably against state-of-the-art methods. Using the proposed method we provide an additional data product in consistent with existing high resolution bands. Furthermore, we synthesize over 4000 high resolution scenes covering various terrains to analyze scientific fidelity. At the end, we demonstrate plausible large scale real world applications of the synthesized band. △ Less

Submitted 8 April, 2020; originally announced April 2020.

Comments: Computer Vision and Pattern Recognition (CVPR) Workshop on Large Scale Computer Vision for Remote Sensing Imagery

arXiv:1910.13993 [pdf, ps, other]

Is Supervised Learning With Adversarial Features Provably Better Than Sole Supervision?

Authors: Litu Rout

Abstract: Generative Adversarial Networks (GAN) have shown promising results on a wide variety of complex tasks. Recent experiments show adversarial training provides useful gradients to the generator that helps attain better performance. In this paper, we intend to theoretically analyze whether supervised learning with adversarial features can outperform sole supervision, or not. First, we show that superv… ▽ More Generative Adversarial Networks (GAN) have shown promising results on a wide variety of complex tasks. Recent experiments show adversarial training provides useful gradients to the generator that helps attain better performance. In this paper, we intend to theoretically analyze whether supervised learning with adversarial features can outperform sole supervision, or not. First, we show that supervised learning without adversarial features suffer from vanishing gradient issue in near optimal region. Second, we analyze how adversarial learning augmented with supervised signal mitigates this vanishing gradient issue. Finally, we prove our main result that shows supervised learning with adversarial features can be better than sole supervision (under some mild assumptions). We support our main result on two fronts (i) expected empirical risk and (ii) rate of convergence. △ Less

Submitted 20 April, 2020; v1 submitted 30 October, 2019; originally announced October 2019.

arXiv:1906.01551 [pdf, other]

doi 10.1007/978-3-030-20890-5_41

Learning Rotation Adaptive Correlation Filters in Robust Visual Object Tracking

Authors: Litu Rout, Priya Mariam Raju, Deepak Mishra, Rama Krishna Sai Subrahmanyam Gorthi

Abstract: Visual object tracking is one of the major challenges in the field of computer vision. Correlation Filter (CF) trackers are one of the most widely used categories in tracking. Though numerous tracking algorithms based on CFs are available today, most of them fail to efficiently detect the object in an unconstrained environment with dynamically changing object appearance. In order to tackle such ch… ▽ More Visual object tracking is one of the major challenges in the field of computer vision. Correlation Filter (CF) trackers are one of the most widely used categories in tracking. Though numerous tracking algorithms based on CFs are available today, most of them fail to efficiently detect the object in an unconstrained environment with dynamically changing object appearance. In order to tackle such challenges, the existing strategies often rely on a particular set of algorithms. Here, we propose a robust framework that offers the provision to incorporate illumination and rotation invariance in the standard Discriminative Correlation Filter (DCF) formulation. We also supervise the detection stage of DCF trackers by eliminating false positives in the convolution response map. Further, we demonstrate the impact of displacement consistency on CF trackers. The generality and efficiency of the proposed framework is illustrated by integrating our contributions into two state-of-the-art CF trackers: SRDCF and ECO. As per the comprehensive experiments on the VOT2016 dataset, our top trackers show substantial improvement of 14.7% and 6.41% in robustness, 11.4% and 1.71% in Average Expected Overlap (AEO) over the baseline SRDCF and ECO, respectively. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: Published in ACCV 2018

Journal ref: ACCV: Asian Conference on Computer Vision 2018

arXiv:1905.02749 [pdf, other]

DeepSWIR: A Deep Learning Based Approach for the Synthesis of Short-Wave InfraRed Band using Multi-Sensor Concurrent Datasets

Authors: Litu Rout, Yatharath Bhateja, Ankur Garg, Indranil Mishra, S Manthira Moorthi, Debjyoti Dhar

Abstract: Convolutional Neural Network (CNN) is achieving remarkable progress in various computer vision tasks. In the past few years, the remote sensing community has observed Deep Neural Network (DNN) finally taking off in several challenging fields. In this study, we propose a DNN to generate a predefined High Resolution (HR) synthetic spectral band using an ensemble of concurrent Low Resolution (LR) ban… ▽ More Convolutional Neural Network (CNN) is achieving remarkable progress in various computer vision tasks. In the past few years, the remote sensing community has observed Deep Neural Network (DNN) finally taking off in several challenging fields. In this study, we propose a DNN to generate a predefined High Resolution (HR) synthetic spectral band using an ensemble of concurrent Low Resolution (LR) bands and existing HR bands. Of particular interest, the proposed network, namely DeepSWIR, synthesizes Short-Wave InfraRed (SWIR) band at 5m Ground Sampling Distance (GSD) using Green (G), Red (R) and Near InfraRed (NIR) bands at both 24m and 5m GSD, and SWIR band at 24m GSD. To our knowledge, the highest spatial resolution of commercially deliverable SWIR band is at 7.5m GSD. Also, we propose a Gaussian feathering based image stitching approach in light of processing large satellite imagery. To experimentally validate the synthesized HR SWIR band, we critically analyse the qualitative and quantitative results produced by DeepSWIR using state-of-the-art evaluation metrics. Further, we convert the synthesized DN values to Top Of Atmosphere (TOA) reflectance and compare with the corresponding band of Sentinel-2B. Finally, we show one real world application of the synthesized band by using it to map wetland resources over our region of interest. △ Less

Submitted 7 May, 2019; originally announced May 2019.

arXiv:1709.06057 [pdf, other]

Rotation Adaptive Visual Object Tracking with Motion Consistency

Authors: Litu Rout, Sidhartha, Gorthi R. K. S. S. Manyam, Deepak Mishra

Abstract: Visual Object tracking research has undergone significant improvement in the past few years. The emergence of tracking by detection approach in tracking paradigm has been quite successful in many ways. Recently, deep convolutional neural networks have been extensively used in most successful trackers. Yet, the standard approach has been based on correlation or feature selection with minimal consid… ▽ More Visual Object tracking research has undergone significant improvement in the past few years. The emergence of tracking by detection approach in tracking paradigm has been quite successful in many ways. Recently, deep convolutional neural networks have been extensively used in most successful trackers. Yet, the standard approach has been based on correlation or feature selection with minimal consideration given to motion consistency. Thus, there is still a need to capture various physical constraints through motion consistency which will improve accuracy, robustness and more importantly rotation adaptiveness. Therefore, one of the major aspects of this paper is to investigate the outcome of rotation adaptiveness in visual object tracking. Among other key contributions, the paper also includes various consistencies that turn out to be extremely effective in numerous challenging sequences than the current state-of-the-art. △ Less

Submitted 22 November, 2017; v1 submitted 18 September, 2017; originally announced September 2017.

Comments: Accepted conference paper WACV 2018

Showing 1–16 of 16 results for author: Rout, L