Showing 1–24 of 24 results for author: Mallya, A

Searching in archive cs.
  1. arXiv:2211.09809

    cs.CV

    SPACE: Speech-driven Portrait Animation with Controllable Expression

    Authors: Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu

    Abstract: Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution and expressive videos with realistic h…

    Submitted 6 December, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

  2. arXiv:2210.01794

    cs.CV

    Implicit Warping for Animation with Image Sets

    Authors: Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

    Abstract: We present a new implicit warping framework for image animation using sets of source images through the transfer of the motion of a driving video. A single cross-modal attention layer is used to find correspondences between the source images and the driving image, choose the most appropriate features from different source images, and warp the selected features. This is in contrast to the existing…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: To be published at NeurIPS 2022

  3. arXiv:2112.07658

    cs.CV cs.LG

    AdaViT: Adaptive Tokens for Efficient Vision Transformer

    Authors: Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov

    Abstract: We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT) for this task, extending halting to discard redundant spatial tokens.…

    Submitted 5 October, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: CVPR'22 oral acceptance

  4. arXiv:2112.05130

    cs.CV

    Multimodal Conditional Image Synthesis with Product-of-Experts GANs

    Authors: Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

    Abstract: Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference. They are often unable to leverage multimodal user inputs when available, which reduces their practicality. To address this limitation, we propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can synth…

    Submitted 9 December, 2021; originally announced December 2021.

  5. arXiv:2104.07659

    cs.CV

    GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds

    Authors: Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu

    Abstract: We present GANcraft, an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those created in Minecraft. Our method takes a semantic block world as input, where each block is assigned a semantic label such as dirt, grass, or water. We represent the world as a continuous volumetric function and train our model to render view-consistent photor…

    Submitted 15 April, 2021; originally announced April 2021.

  6. arXiv:2104.07586

    cs.LG cs.CV

    See through Gradients: Image Batch Recovery via GradInversion

    Authors: Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

    Abstract: Training deep neural networks requires gradient estimation from data batches to update parameters. Gradients per parameter are averaged over a set of data and this has been presumed to be safe for privacy-preserving training in joint, collaborative, and federated learning applications. Prior work only showed the possibility of recovering input data given gradients under very restrictive conditions…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 accepted paper

  7. arXiv:2011.15126

    cs.CV

    One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

    Authors: Ting-Chun Wang, Arun Mallya, Ming-Yu Liu

    Abstract: We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing. Our model learns to synthesize a talking-head video using a source image containing the target person's appearance and a driving video that dictates the motion in the output. Our motion is encoded based on a novel keypoint representation, where the identity-specific and motion-related info…

    Submitted 2 April, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

    Comments: CVPR 2021 camera ready (oral). Our project page can be found at https://nvlabs.github.io/face-vid2vid

  8. arXiv:2008.02793

    cs.CV

    Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications

    Authors: Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya

    Abstract: The generative adversarial network (GAN) framework has emerged as a powerful tool for various image and video synthesis tasks, allowing the synthesis of visual content in an unconditional or input-conditional manner. It has enabled the generation of high-resolution photorealistic images and videos, a task that was challenging or impossible with prior methods. It has also led to the creation of man…

    Submitted 30 November, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

  9. arXiv:2007.08509

    cs.CV

    World-Consistent Video-to-Video Synthesis

    Authors: Arun Mallya, Ting-Chun Wang, Karan Sapra, Ming-Yu Liu

    Abstract: Video-to-video synthesis (vid2vid) aims to convert high-level semantic inputs to photorealistic videos. While existing vid2vid methods can achieve short-term temporal consistency, they fail to ensure long-term consistency. This is because they lack knowledge of the 3D world being rendered and generate each frame only based on the past few frames. To address the limitation, we introduce a novel vid…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at the European Conference on Computer Vision, 2020

  10. arXiv:1912.08795

    cs.LG cs.CV stat.ML

    Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

    Authors: Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz

    Abstract: We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network. We 'invert' a trained network (teacher) to synthesize class-conditional input images starting from random noise, without using any additional information about the training dataset. Keeping the teacher fixed, our method optimizes the input while regularizing the distrib…

    Submitted 15 June, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

  11. arXiv:1912.07651

    cs.LG cs.CV stat.ML

    UNAS: Differentiable Architecture Search Meets Reinforcement Learning

    Authors: Arash Vahdat, Arun Mallya, Ming-Yu Liu, Jan Kautz

    Abstract: Neural architecture search (NAS) aims to discover network architectures with desired properties such as high accuracy or low latency. Recently, differentiable NAS (DNAS) has demonstrated promising results while maintaining a search cost orders of magnitude lower than reinforcement learning (RL) based NAS. However, DNAS models can only optimize differentiable loss functions in search, and they requ…

    Submitted 27 August, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

    Comments: Accepted to CVPR 2020 (Oral)

  12. arXiv:1906.10771

    cs.LG cs.CV stat.ML

    Importance Estimation for Neural Network Pruning

    Authors: Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz

    Abstract: Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution.…

    Submitted 25 June, 2019; originally announced June 2019.

  13. Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

    Authors: Zih-Siou Hung, Arun Mallya, Svetlana Lazebnik

    Abstract: Relations amongst entities play a central role in image understanding. Due to the complexity of modeling (subject, predicate, object) relation triplets, it is crucial to develop a method that can not only recognize seen relations, but also generalize to unseen cases. Inspired by a previously proposed visual translation embedding model, or VTransE, we propose a context-augmented translation embeddi…

    Submitted 6 February, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

  14. arXiv:1905.01723

    cs.CV cs.AI cs.GR cs.MM stat.ML

    Few-Shot Unsupervised Image-to-Image Translation

    Authors: Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz

    Abstract: Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human cap…

    Submitted 9 September, 2019; v1 submitted 5 May, 2019; originally announced May 2019.

    Comments: The paper will be presented at the International Conference on Computer Vision (ICCV) 2019

    Journal ref: ICCV 2019

  15. arXiv:1801.06519

    cs.CV

    Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

    Authors: Arun Mallya, Dillon Davis, Svetlana Lazebnik

    Abstract: This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. By building upon ideas from network quantization and pruning, we learn binary masks that piggyback on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task. These masks are learned in an…

    Submitted 16 March, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

  16. arXiv:1711.05769

    cs.CV

    PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

    Authors: Arun Mallya, Svetlana Lazebnik

    Abstract: This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a sing…

    Submitted 13 May, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

  17. arXiv:1703.06233

    cs.CV

    Recurrent Models for Situation Recognition

    Authors: Arun Mallya, Svetlana Lazebnik

    Abstract: This work proposes Recurrent Neural Network (RNN) models to predict structured 'image situations' -- actions and noun entities fulfilling semantic roles related to the action. In contrast to prior work relying on Conditional Random Fields (CRFs), we use a specialized action prediction network followed by an RNN for noun prediction. Our system obtains state-of-the-art accuracy on the challenging re…

    Submitted 4 August, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

    Comments: To appear at ICCV 2017

  18. arXiv:1611.06641

    cs.CV

    Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

    Authors: Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik

    Abstract: This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between peop…

    Submitted 8 August, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

    Comments: IEEE ICCV 2017 accepted paper

  19. arXiv:1611.00393

    cs.CV

    Combining Multiple Cues for Visual Madlibs Question Answering

    Authors: Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

    Abstract: This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach employs a combination of networks trained for specialized tasks such as scene recognition, person activity classification, and attribute prediction. We also present a…

    Submitted 7 February, 2018; v1 submitted 1 November, 2016; originally announced November 2016.

    Comments: submitted to IJCV -- under review

  20. arXiv:1608.03410

    cs.CV

    Solving Visual Madlibs with Multiple Cues

    Authors: Tatiana Tommasi, Arun Mallya, Bryan Plummer, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg

    Abstract: This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the ImageNet dataset, despite the wide scope of questions. In contrast, our approach employs features derived from networks trained for specialized tasks of scene cl…

    Submitted 11 August, 2016; originally announced August 2016.

    Comments: accepted at BMVC 2016

  21. arXiv:1604.04808

    cs.CV

    Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

    Authors: Arun Mallya, Svetlana Lazebnik

    Abstract: This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels each. We use multiple instance learning to handle the lack of supervision on the level of individual person instances, and weighted loss to handle unbalanced traini…

    Submitted 28 July, 2016; v1 submitted 16 April, 2016; originally announced April 2016.

  22. arXiv:1507.06332

    cs.CV

    Part Localization using Multi-Proposal Consensus for Fine-Grained Categorization

    Authors: Kevin J. Shih, Arun Mallya, Saurabh Singh, Derek Hoiem

    Abstract: We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification. We show that by conditioning the predictions on object proposals with sufficient image support, our method can do well without complicated spatial reasoning. Instead, inference methods with…

    Submitted 22 July, 2015; originally announced July 2015.

    Comments: BMVC 2015

  23. Unsupervised Network Pretraining via Encoding Human Design

    Authors: Ming-Yu Liu, Arun Mallya, Oncel C. Tuzel, Xi Chen

    Abstract: Over the years, computer vision researchers have spent an immense amount of effort on designing image features for the visual object recognition task. We propose to incorporate this valuable experience to guide the task of training deep neural networks. Our idea is to pretrain the network through the task of replicating the process of hand-designed feature extraction. By learning to replicate the…

    Submitted 22 January, 2016; v1 submitted 19 February, 2015; originally announced February 2015.

    Comments: 9 pages, 11 figures, WACV 2016: IEEE Winter Conference on Applications of Computer Vision

  24. arXiv:1411.7482

    cs.NI

    SmartConnect: A System for the Design and Deployment of Wireless Sensor Networks

    Authors: Abhijit Bhattacharya, Sanjay Motilal Ladwa, Rachit Srivastava, Aniruddha Mallya, Akhila Rao, Easwar Vivek. M, Deeksha G. Rao Sahib, S. V. R. Anand, Anurag Kumar

    Abstract: We have developed SmartConnect, a tool that addresses the growing need for the design and deployment of multihop wireless relay networks for connecting sensors to a control center. Given the locations of the sensors, the traffic that each sensor generates, the quality of service (QoS) requirements, and the potential locations at which relays can be placed, SmartConnect helps design and deploy a lo…

    Submitted 27 November, 2014; originally announced November 2014.