
Showing 1–46 of 46 results for author: Vineet, V

Searching in archive cs.
  1. arXiv:2408.11748

    cs.CV

    GeoMeter: Probing Depth and Height Perception of Large Visual-Language Models

    Authors: Shehreen Azad, Yash Jain, Rishit Garg, Yogesh S Rawat, Vibhav Vineet

    Abstract: Geometric understanding is crucial for navigating and interacting with our environment. While large Vision Language Models (VLMs) demonstrate impressive capabilities, deploying them in real-world scenarios necessitates a comparable geometric understanding in visual perception. In this work, we focus on the geometric comprehension of these models, specifically targeting the depths and heights of ob…

    Submitted 30 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2406.14852

    cs.CV cs.AI

    Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

    Authors: Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Neel Joshi

    Abstract: Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning -- a fundamental component of human cognition -- remains under-explored. We develop novel benchmarks that cover diverse aspects of spatial reasoning such as relationship understanding, navigation,…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.10834

    cs.CL cs.AI cs.LG

    Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning

    Authors: Joykirat Singh, Akshay Nambi, Vibhav Vineet

    Abstract: Large Language Models (LLMs) have been applied to Math Word Problems (MWPs) with transformative impacts, revolutionizing how these complex problems are approached and solved in various domains including educational settings. However, the evaluation of these models often prioritizes final accuracy, overlooking the crucial aspect of reasoning capabilities. This work addresses this gap by focusing on…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2402.19405

    cs.CV

    Navigating Hallucinations for Reasoning of Unintentional Activities

    Authors: Shresth Grover, Vibhav Vineet, Yogesh S Rawat

    Abstract: In this work we present a novel task of understanding unintentional human activities in videos. We formalize this problem as a reasoning task under a zero-shot scenario, where given a video of an unintentional activity we want to know why it transitioned from intentional to unintentional. We first evaluate the effectiveness of current state-of-the-art Large Multimodal Models on this reasoning task a…

    Submitted 3 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  5. arXiv:2312.14216

    cs.CV

    DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

    Authors: Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge

    Abstract: The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating…

    Submitted 21 December, 2023; originally announced December 2023.

  6. arXiv:2312.07509

    cs.CV cs.LG

    PEEKABOO: Interactive Video Generation via Masked-Diffusion

    Authors: Yash Jain, Anshul Nasery, Vibhav Vineet, Harkirat Behl

    Abstract: Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal contro…

    Submitted 19 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Project webpage - https://jinga-lala.github.io/projects/Peekaboo/

  7. arXiv:2311.04894

    cs.CV cs.AI cs.LG

    DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets

    Authors: Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet

    Abstract: Construction of a universal detector poses a crucial question: How can we most effectively train a model on a large mixture of datasets? The answer lies in learning dataset-specific features and ensembling their knowledge, but doing all this in a single model. Previous methods achieve this by having separate detection heads on a common backbone, but that results in a significant increase in parameters.…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: https://github.com/jinga-lala/DAMEX

  8. arXiv:2309.07499

    cs.CV

    Efficiently Robustify Pre-trained Models

    Authors: Nishant Jain, Harkirat Behl, Yogesh Singh Rawat, Vibhav Vineet

    Abstract: A recent trend in deep learning algorithms has been towards training large-scale models with high parameter counts, trained on big datasets. However, the robustness of such large-scale models towards real-world settings is still a less-explored topic. In this work, we first benchmark the performance of these models under different perturbations and datasets thereby representing real-world shifts,…

    Submitted 14 September, 2023; originally announced September 2023.

  9. arXiv:2309.05956

    cs.CV

    Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation

    Authors: Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet

    Abstract: We propose a new paradigm to automatically generate training data with accurate labels at scale using the text-to-image synthesis frameworks (e.g., DALL-E, Stable Diffusion, etc.). The proposed approach decouples training data generation into foreground object generation, and contextually coherent background generation. To generate foreground objects, we employ a straightforward textual template,…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Code in https://github.com/gyhandy/Text2Image-for-Detection

  10. arXiv:2306.09278

    cs.CV

    Robustness Analysis on Foundational Segmentation Models

    Authors: Madeline Chantry Schiappa, Shehreen Azad, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet

    Abstract: Due to the increase in computational resources and accessibility of data, an increase in large, deep learning models trained on copious amounts of multi-modal data using self-supervised or semi-supervised learning have emerged. These "foundation" models are often adapted to a variety of downstream tasks like classification, object detection, and segmentation with little-to-no training on the tar…

    Submitted 26 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: This benchmark along with the code and datasets is available at: https://tinyurl.com/fm-robust. Accepted at CVPRW 2024

  11. arXiv:2306.06010

    cs.CV

    A Large-Scale Analysis on Self-Supervised Video Representation Learning

    Authors: Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh Singh Rawat

    Abstract: Self-supervised learning is an effective way for label-free model pre-training, especially in the video domain where labeling is expensive. Existing self-supervised works in the video domain use varying experimental setups to demonstrate their effectiveness and comparison across approaches becomes challenging with no standard benchmark. In this work, we first provide a benchmark that enables a com…

    Submitted 20 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  12. arXiv:2305.18583

    cs.CV cs.AI

    Controllable Text-to-Image Generation with GPT-4

    Authors: Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang

    Abstract: Current text-to-image generation models often struggle to follow textual instructions, especially the ones requiring spatial reasoning. On the other hand, Large Language Models (LLMs), such as GPT-4, have shown remarkable precision in generating code snippets for sketching out text inputs graphically, e.g., via TikZ. In this work, we introduce Control-GPT to guide the diffusion-based text-to-image…

    Submitted 29 May, 2023; originally announced May 2023.

  13. arXiv:2303.08789

    cs.RO cs.AI cs.LG

    PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

    Authors: Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov

    Abstract: A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos -- a type of data available…

    Submitted 8 November, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  14. arXiv:2212.10015

    cs.CV cs.AI cs.CL

    Benchmarking Spatial Relationships in Text-to-Image Generation

    Authors: Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang

    Abstract: Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component for grounded language understanding. While recent text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial understanding capabilities. We investigate the ability of…

    Submitted 27 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: preprint; Code and Data at https://github.com/microsoft/VISOR and https://huggingface.co/datasets/tgokhale/sr2d_visor

  15. arXiv:2212.07629

    cs.CV

    EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation

    Authors: Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet

    Abstract: We propose EM-PASTE: an Expectation Maximization (EM) guided Cut-Paste compositional dataset augmentation approach for weakly-supervised instance segmentation using only image-level supervision. The proposed method consists of three main components. The first component generates high-quality foreground object masks. To this end, an EM-like approach is proposed that iteratively refines an initial se…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: 15 pages (including appendix), 7 figures

  16. arXiv:2210.12350

    cs.CV cs.AI cs.LG

    Instance-Aware Image Completion

    Authors: Jinoh Cho, Minguk Kang, Vibhav Vineet, Jaesik Park

    Abstract: Image completion is a task that aims to fill in the missing region of a masked image with plausible contents. However, existing image completion methods tend to fill in the missing region with the surrounding texture instead of hallucinating a visual instance that is suitable in accordance with the context of the scene. In this work, we propose a novel image completion model, dubbed ImComplete, th…

    Submitted 26 May, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: AI for Content Creation (AI4CC) CVPR workshop, 2023

  17. arXiv:2209.10986

    cs.RO cs.CV

    Learning to Simulate Realistic LiDARs

    Authors: Benoit Guillard, Sai Vemprala, Jayesh K. Gupta, Ondrej Miksik, Vibhav Vineet, Pascal Fua, Ashish Kapoor

    Abstract: Simulating realistic sensors is a challenging part in data generation for autonomous systems, often involving carefully handcrafted sensor design, scene properties, and physics modeling. To alleviate this, we introduce a pipeline for data-driven simulation of a realistic LiDAR sensor. We propose a model that learns a mapping between RGB images and corresponding LiDAR features such as raydrop or pe…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: IROS2022 paper

  18. arXiv:2207.11368

    cs.CV

    Neural-Sim: Learning to Generate Training Data with NeRF

    Authors: Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet

    Abstract: Training computer vision models usually requires collecting and labeling vast amounts of imagery under a diverse set of scene configurations and properties. This process is incredibly time-consuming, and it is challenging to ensure that the captured data distribution maps well to the target domain of an application scenario. Recently, synthetic data has emerged as a way to address both of these is…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  19. arXiv:2207.05205

    cs.CV cs.LG

    Scaling Novel Object Detection with Weakly Supervised Detection Transformers

    Authors: Tyler LaBonte, Yale Song, Xin Wang, Vibhav Vineet, Neel Joshi

    Abstract: A critical object detection task is finetuning an existing model to detect novel objects, but the standard workflow requires bounding box annotations which are time-consuming and expensive to collect. Weakly supervised object detection (WSOD) offers an appealing alternative, where object detectors can be trained using image-level labels. However, the practical application of current WSOD models is…

    Submitted 25 May, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: WACV 2023. Preliminary version appeared in CVPR 2022 Workshop on Transformers for Vision

  20. arXiv:2207.05006

    cs.RO cs.AI cs.LG

    TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs

    Authors: Christopher Agia, Krishna Murthy Jatavallabhula, Mohamed Khodeir, Ondrej Miksik, Vibhav Vineet, Mustafa Mukadam, Liam Paull, Florian Shkurti

    Abstract: 3D scene graphs (3DSGs) are an emerging description, unifying symbolic, topological, and metric scene representations. However, typical 3DSGs contain hundreds of objects and symbols even for small environments, rendering task planning on the full graph impractical. We construct TASKOGRAPHY, the first large-scale robotic task planning benchmark over 3DSGs. While most benchmarking efforts in this ar…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Video: https://www.youtube.com/watch?v=mM4v5hP4LdA&ab_channel=KrishnaMurthy . Project page: https://taskography.github.io/ . 18 pages, 7 figures. In proceedings of Conference on Robot Learning (CoRL) 2021. The first two authors contributed equally

    ACM Class: I.2.8; I.2.9; I.2.10; I.2.6

    Journal ref: PMLR 164 (2022) 46-58

  21. arXiv:2207.02159

    cs.CV cs.MM

    Robustness Analysis of Video-Language Models Against Visual and Language Perturbations

    Authors: Madeline C. Schiappa, Shruti Vyas, Hamid Palangi, Yogesh S. Rawat, Vibhav Vineet

    Abstract: Joint visual and language modeling on large-scale datasets has recently shown good progress in multi-modal tasks when compared to single modal learning. However, robustness of these approaches against real-world perturbations has not been studied. In this work, we perform the first extensive robustness study of video-language models against various real-world perturbations. We focus on text-to-vid…

    Submitted 18 July, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2022 Datasets and Benchmarks Track. This project's webpage is located at https://bit.ly/3CNOly4

    Journal ref: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (2022)

  22. arXiv:2207.01398

    cs.CV eess.IV

    Large-scale Robustness Analysis of Video Action Recognition Models

    Authors: Madeline Chantry Schiappa, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, Yogesh Rawat

    Abstract: We have seen great progress in video action recognition in recent years. There are several models based on convolutional neural networks (CNNs) and some recent transformer-based approaches which provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world d…

    Submitted 7 April, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted in 2023 Conference on Computer Vision and Pattern Recognition (CVPR)

  23. arXiv:2206.09592

    cs.CV

    DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection

    Authors: Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet

    Abstract: We propose a new paradigm to automatically generate training data with accurate labels at scale using the text-to-image synthesis frameworks (e.g., DALL-E, Stable Diffusion, etc.). The proposed approach decouples training data generation into foreground object mask generation and background (context) image generation. For foreground object mask generation, we use a simple textual template with obje…

    Submitted 21 December, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: v3(same as v2) version, update structure (add foreground generation, stable diffusion), add more experiments

  24. arXiv:2204.08945

    cs.CV cs.AI cs.LG

    Missingness Bias in Model Debugging

    Authors: Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, Aleksander Madry

    Abstract: Missingness, or the absence of features from an input, is a concept fundamental to many model debugging tools. However, in computer vision, pixels cannot simply be removed from an image. One thus tends to resort to heuristics such as blacking out pixels, which may in turn introduce bias into the debugging process. We study such biases and, in particular, show how transformer-based architectures ca…

    Submitted 13 June, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Published at ICLR 2022

  25. Image Retrieval from Contextual Descriptions

    Authors: Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy

    Abstract: The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: accepted to ACL 2022

  26. arXiv:2203.10488

    cs.RO cs.AI cs.CV

    Inferring Articulated Rigid Body Dynamics from RGBD Video

    Authors: Eric Heiden, Ziang Liu, Vibhav Vineet, Erwin Coumans, Gaurav S. Sukhatme

    Abstract: Being able to reproduce physical phenomena ranging from light interaction to contact mechanics, simulators are becoming increasingly useful in more and more application domains where real-world interaction or labeled data are difficult to obtain. Despite recent progress, significant human effort is needed to configure simulators to accurately reproduce real-world behavior. We introduce a pipeline…

    Submitted 11 September, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: IROS 2022 camera-ready

  27. arXiv:2203.08130

    cs.CV cs.AI cs.LG

    One Network Doesn't Rule Them All: Moving Beyond Handcrafted Architectures in Self-Supervised Learning

    Authors: Sharath Girish, Debadeepta Dey, Neel Joshi, Vibhav Vineet, Shital Shah, Caio Cesar Teodoro Mendes, Abhinav Shrivastava, Yale Song

    Abstract: The current literature on self-supervised learning (SSL) focuses on developing learning objectives to train neural networks more effectively on unlabeled data. The typical development process involves taking well-established architectures, e.g., ResNet demonstrated on ImageNet, and using them to evaluate newly developed objectives on downstream scenarios. While convenient, this does not take into…

    Submitted 15 March, 2022; originally announced March 2022.

  28. arXiv:2201.04309

    cs.CV cs.LG

    Robust Contrastive Learning against Noisy Views

    Authors: Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song

    Abstract: Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance. But what if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positiv…

    Submitted 12 January, 2022; originally announced January 2022.

  29. arXiv:2111.09301

    cs.CV cs.AI

    Learning to Align Sequential Actions in the Wild

    Authors: Weizhe Liu, Bugra Tekin, Huseyin Coskun, Vibhav Vineet, Pascal Fua, Marc Pollefeys

    Abstract: State-of-the-art methods for self-supervised sequential action alignment rely on deep networks that find correspondences across videos in time. They either learn frame-to-frame mapping across sequences, which does not leverage temporal information, or assume monotonic alignment between each video pair, which ignores variations in the order of actions. As such, these methods are not able to deal wi…

    Submitted 17 November, 2021; originally announced November 2021.

  30. arXiv:2106.13364

    cs.AI cs.CV cs.LG

    CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning

    Authors: Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor

    Abstract: The ability to perform causal and counterfactual reasoning is a central property of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state-of-the-art in this domain, by providing the ability to systematically vary parameters (e.g., confounders) and generate examp…

    Submitted 24 June, 2021; originally announced June 2021.

  31. arXiv:2106.03805

    cs.CV cs.LG stat.ML

    3DB: A Framework for Debugging Computer Vision Models

    Authors: Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

    Abstract: We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation. We demonstrate, through a wide range of use cases, that 3DB allows users to discover vulnerabilities in computer vision systems and gain insights into how models make decisions. 3DB captures and generalizes many robustness analyses from prior work, and enables one to study th…

    Submitted 7 June, 2021; originally announced June 2021.

  32. arXiv:2103.08457

    cs.CV cs.AI

    RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs

    Authors: Zhiwei Xu, Thalaiyasingam Ajanthan, Vibhav Vineet, Richard Hartley

    Abstract: Although 3D Convolutional Neural Networks are essential for most learning based applications involving dense 3D data, their applicability is limited due to excessive memory and computational requirements. Compressing such networks by pruning therefore becomes highly desirable. However, pruning 3D CNNs is largely unexplored possibly because of the complex nature of typical pruning algorithms that e…

    Submitted 8 February, 2021; originally announced March 2021.

    Comments: this is an extension of our 3DV2020 conference paper RANP. arXiv admin note: substantial text overlap with arXiv:2010.02488

  33. arXiv:2010.10691

    cs.SD cs.LG eess.AS

    Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks

    Authors: Ziqi Fan, Vibhav Vineet, Chenshen Lu, T. W. Wu, Kyla McMullen

    Abstract: Acoustic scattering is strongly influenced by boundary geometry of objects over which sound scatters. The present work proposes a method to infer object geometry from scattering features by training convolutional neural networks. The training data is generated from a fast numerical solver developed on CUDA. The complete set of simulations is sampled to generate multiple datasets containing differe…

    Submitted 10 February, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted by ICASSP 2021

  34. arXiv:2010.02488

    cs.CV cs.AI cs.LG

    RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs

    Authors: Zhiwei Xu, Thalaiyasingam Ajanthan, Vibhav Vineet, Richard Hartley

    Abstract: Although 3D Convolutional Neural Networks (CNNs) are essential for most learning based applications involving dense 3D data, their applicability is limited due to excessive memory and computational requirements. Compressing such networks by pruning therefore becomes highly desirable. However, pruning 3D CNNs is largely unexplored possibly because of the complex nature of typical pruning algorithms…

    Submitted 25 October, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: International Conference on 3D Vision (3DV), 2020 (Oral)

  35. arXiv:2008.08424

    cs.CV cs.GR cs.LG stat.ML

    AutoSimulate: (Quickly) Learning Synthetic Data Generation

    Authors: Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H. S. Torr, Vibhav Vineet

    Abstract: Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive as they treat the entire data generation, model training, and valida…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

    Journal ref: European Conference on Computer Vision (ECCV) 2020

  36. arXiv:2001.07791

    cs.CV

    Depth Completion Using a View-constrained Deep Prior

    Authors: Pallabi Ghosh, Vibhav Vineet, Larry S. Davis, Abhinav Shrivastava, Sudipta Sinha, Neel Joshi

    Abstract: Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images. This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting. We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we optimize a random…

    Submitted 1 December, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

  37. arXiv:1911.01802

    eess.AS cs.LG cs.SD eess.IV eess.SP

    Fast acoustic scattering using convolutional neural networks

    Authors: Ziqi Fan, Vibhav Vineet, Hannes Gamper, Nikunj Raghuvanshi

    Abstract: Diffracted scattering and occlusion are important acoustic effects in interactive auralization and noise control applications, typically requiring expensive numerical simulation. We propose training a convolutional neural network to map from a convex scatterer's cross-section to a 2D slice of the resulting spatial loudness distribution. We show that employing a full-resolution residual network for…

    Submitted 15 February, 2020; v1 submitted 30 October, 2019; originally announced November 2019.

    Comments: Accepted by ICASSP 2020

  38. arXiv:1909.06993

    cs.CV cs.RO

    Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations

    Authors: Rogerio Bonatti, Ratnesh Madaan, Vibhav Vineet, Sebastian Scherer, Ashish Kapoor

    Abstract: Machines are a long way from robustly solving open-world perception-control tasks, such as first-person view (FPV) aerial navigation. While recent advances in end-to-end Machine Learning, especially Imitation and Reinforcement Learning, appear promising, they are constrained by the need for large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to g…

    Submitted 8 March, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

  39. arXiv:1903.06708

    cs.CV

    Live Reconstruction of Large-Scale Dynamic Outdoor Worlds

    Authors: Ondrej Miksik, Vibhav Vineet

    Abstract: Standard 3D reconstruction pipelines assume a stationary world and therefore suffer from 'ghost artifacts' whenever dynamic objects are present in the scene. Recent approaches have started tackling this issue; however, they typically either only discard dynamic information, represent it using bounding boxes or per-frame depth or rely on approaches that are inherently slow and not suitable to online sett…

    Submitted 15 April, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: CVPR 2019 workshop on dynamic scene reconstruction

  40. arXiv:1902.09085

    cs.CV

    Privacy-Preserving Action Recognition using Coded Aperture Videos

    Authors: Zihao W. Wang, Vibhav Vineet, Francesco Pittaluga, Sudipta Sinha, Oliver Cossairt, Sing Bing Kang

    Abstract: The risk of unauthorized remote access of streaming video from networked cameras underlines the need for stronger privacy safeguards. We propose a lens-free coded aperture camera system for human action recognition that is privacy-preserving. While coded aperture systems exist, we believe ours is the first system designed for action recognition without the need for image restoration as an intermed…

    Submitted 16 April, 2019; v1 submitted 24 February, 2019; originally announced February 2019.

    Comments: CVCOPS2019

  41. arXiv:1902.03334

    cs.CV cs.AI cs.RO

    Photorealistic Image Synthesis for Object Instance Detection

    Authors: Tomas Hodan, Vibhav Vineet, Ran Gal, Emanuel Shalev, Jon Hanzelka, Treb Connell, Pedro Urbina, Sudipta N. Sinha, Brian Guenter

    Abstract: We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D object models are rendered in 3D models of complete scenes with realistic materials and lighting, (2) plausible geometric configuration of objects and cameras in…

    Submitted 8 February, 2019; originally announced February 2019.

  42. arXiv:1608.02192  [pdf, other]

    cs.CV

    Playing for Data: Ground Truth from Computer Games

    Authors: Stephan R. Richter, Vibhav Vineet, Stefan Roth, Vladlen Koltun

    Abstract: Recent progress in computer vision has been driven by high-capacity models trained on large datasets. Unfortunately, creating large datasets with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach for rapidly creating pixel-accurate semantic label maps for images extracted from modern computer games. Although the source cod…

    Submitted 7 August, 2016; originally announced August 2016.

    Comments: Accepted to the 14th European Conference on Computer Vision (ECCV 2016)

    ACM Class: I.4.8

  43. SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes

    Authors: Stuart Golodetz, Michael Sapienza, Julien P. C. Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor A. Prisacariu, Olaf Kähler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H. S. Torr

    Abstract: We present an open-source, real-time implementation of SemanticPaint, a system for geometric reconstruction, object-class segmentation and learning of 3D scenes. Using our system, a user can walk into a room wearing a depth camera and a virtual reality headset, and both densely reconstruct the 3D scene and interactively segment the environment into object classes such as 'chair', 'floor' and 'tabl…

    Submitted 13 October, 2015; originally announced October 2015.

    Comments: 33 pages, Project: http://www.semantic-paint.com, Code: https://github.com/torrvision/spaint

    ACM Class: I.2.10

  44. Conditional Random Fields as Recurrent Neural Networks

    Authors: Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr

    Abstract: Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we i…

    Submitted 13 April, 2016; v1 submitted 11 February, 2015; originally announced February 2015.

    Comments: This paper is published in IEEE ICCV 2015

  45. arXiv:1403.6275  [pdf, other]

    cs.CV

    A Tiered Move-making Algorithm for General Non-submodular Pairwise Energies

    Authors: Vibhav Vineet, Jonathan Warrell, Philip H. S. Torr

    Abstract: A large number of problems in computer vision can be modelled as energy minimization problems in a Markov Random Field (MRF) or Conditional Random Field (CRF) framework. Graph-cut-based $\alpha$-expansion is a standard move-making method to minimize energy functions with submodular pairwise terms. However, certain problems require more complex pairwise terms where the $\alpha$-expansion method is gene…

    Submitted 25 March, 2014; originally announced March 2014.

  46. arXiv:1310.4389  [pdf, other]

    cs.GR cs.CV

    ImageSpirit: Verbal Guided Image Parsing

    Authors: Ming-Ming Cheng, Shuai Zheng, Wen-Yan Lin, Jonathan Warrell, Vibhav Vineet, Paul Sturgess, Nigel Crook, Niloy Mitra, Philip Torr

    Abstract: Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this paper we propose treating nouns as object labels and adjectives as visual at…

    Submitted 21 May, 2014; v1 submitted 16 October, 2013; originally announced October 2013.

    Comments: http://mmcheng.net/imagespirit/

    ACM Class: I.3.6; I.4.8

    Journal ref: ACM Transactions on Graphics, 2014