Showing 1–32 of 32 results for author: Singh, K K

Searching in archive cs.
  1. arXiv:2401.10822  [pdf, other]

    cs.CV

    ActAnywhere: Subject-Aware Video Background Generation

    Authors: Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang

    Abstract: Generating video background that tailors to foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing background that aligns with the motion and appearance of the foreground subject, while also complies with the artist's creative intention. We introduce ActAnywhere, a generative model that automates this process which tra…

    Submitted 19 January, 2024; originally announced January 2024.

  2. arXiv:2312.14985  [pdf, other]

    cs.CV

    UniHuman: A Unified Model for Editing Human Images in the Wild

    Authors: Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin

    Abstract: Human image editing includes tasks like changing a person's pose, their clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning them jointly. In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings. To en…

    Submitted 31 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  3. arXiv:2312.06712  [pdf, other]

    cs.CV cs.AI

    Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

    Authors: Zhipeng Bao, Yijun Li, Krishna Kumar Singh, Yu-Xiong Wang, Martial Hebert

    Abstract: Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for the multi-object generation. This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. Whi…

    Submitted 31 January, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  4. arXiv:2307.01425  [pdf, other]

    cs.CV

    Consistent Multimodal Generation via A Unified GAN Framework

    Authors: Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

    Abstract: We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: In review

  5. arXiv:2304.14406  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

    Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh

    Abstract: We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. W…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page with code: https://sumith1896.github.io/affordance-insertion/

  6. arXiv:2302.14368  [pdf, other]

    cs.CV cs.AI cs.GR

    Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods

    Authors: Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale

    Abstract: As Diffusion Models have shown promising performance, a lot of efforts have been made to improve the controllability of Diffusion Models. However, how to train Diffusion Models to have the disentangled latent spaces and how to naturally incorporate the disentangled conditions during the sampling process have been underexplored. In this paper, we present a training framework for feature disentangle…

    Submitted 23 July, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: ECCV 2024; Code will be opened after a patent application is granted

  7. arXiv:2302.12764  [pdf, other]

    cs.CV

    Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

    Authors: Cusuh Ham, James Hays, Jingwan Lu, Krishna Kumar Singh, Zhifei Zhang, Tobias Hinz

    Abstract: We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but does not require any updat…

    Submitted 18 May, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: SIGGRAPH Conference Proceedings 2023. Project page at https://mcm-diffusion.github.io

  8. arXiv:2302.03027  [pdf, other]

    cs.CV cs.GR cs.LG

    Zero-shot Image-to-Image Translation

    Authors: Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu

    Abstract: Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can in…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: website: https://pix2pixzero.github.io/

  9. arXiv:2211.10157  [pdf, other]

    cs.CV cs.AI

    UMFuse: Unified Multi View Fusion for Human Editing applications

    Authors: Rishabh Jain, Mayur Hemani, Duygu Ceylan, Krishna Kumar Singh, Jingwan Lu, Mausoom Sarkar, Balaji Krishnamurthy

    Abstract: Numerous pose-guided human editing methods have been explored by the vision community due to their extensive practical applications. However, most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. This objective becomes ill-defined in cases when the target pose differs significantly from the input pose. Existing…

    Submitted 28 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 8 pages, 6 figures

    ACM Class: I.4; I.5

  10. arXiv:2211.08540  [pdf, other]

    cs.CV cs.AI

    VGFlow: Visibility guided Flow Network for Human Reposing

    Authors: Rishabh Jain, Krishna Kumar Singh, Mayur Hemani, Jingwan Lu, Mausoom Sarkar, Duygu Ceylan, Balaji Krishnamurthy

    Abstract: The task of human reposing involves generating a realistic image of a person standing in an arbitrary conceivable pose. There are multiple difficulties in generating perceptually accurate images, and existing methods suffer from limitations in preserving texture, maintaining pattern coherence, respecting cloth boundaries, handling occlusions, manipulating skin generation, etc. These difficulties a…

    Submitted 28 March, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Selected for publication in CVPR2023

    ACM Class: I.4; I.5

  11. arXiv:2211.02707  [pdf, other]

    cs.CV

    Contrastive Learning for Diverse Disentangled Foreground Generation

    Authors: Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

    Abstract: We introduce a new method for diverse foreground generation with explicit control over various factors. Existing image inpainting based foreground generation methods often struggle to generate diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression for face inpainting results). We leverage contrastive learning with…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: ECCV 2022

  12. arXiv:2208.01350  [pdf, ps, other]

    cs.CY

    Application of Blockchain Smart Contracts in E-Commerce and Government

    Authors: Kamal Kishor Singh

    Abstract: With technological advances and the establishment of e-commerce models, business challenges have shifted to online platforms. The promise of embedding self-executing and autonomous programs into blockchain technologies has attracted increased interest and its use in niche solutions. Using qualitative interviews, this paper sought the opinions of the eleven industry leaders regarding smart contract…

    Submitted 2 August, 2022; originally announced August 2022.

  13. arXiv:2206.08357  [pdf, other]

    cs.CV cs.GR cs.LG

    Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing

    Authors: Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang, Jun-Yan Zhu, Krishna Kumar Singh

    Abstract: Existing GAN inversion and editing methods work well for aligned objects with a clean background, such as portraits and animal faces, but often struggle for more difficult categories with complex scene layouts and object occlusions, such as cars, animals, and outdoor images. We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2. Our key idea…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: CVPR 2022. Github: https://github.com/adobe-research/sam_inversion Website: https://www.cs.cmu.edu/~SAMInversion

  14. arXiv:2206.07331  [pdf]

    cs.MM

    ETMA: Efficient Transformer Based Multilevel Attention framework for Multimodal Fake News Detection

    Authors: Ashima Yadav, Shivani Gaba, Haneef Khan, Ishan Budhiraja, Akansha Singh, Krishan Kant Singh

    Abstract: In this new digital era, social media has created a severe impact on the lives of people. In recent times, fake news content on social media has become one of the major challenging problems for society. The dissemination of fabricated and false news articles includes multimodal data in the form of text and images. The previous methods have mainly focused on unimodal analysis. Moreover, for multimo…

    Submitted 13 March, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in IEEE Transactions on Computational Social Systems

  15. arXiv:2203.14954  [pdf, other]

    cs.CV cs.AI cs.LG

    GIRAFFE HD: A High-Resolution 3D-aware Generative Model

    Authors: Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee

    Abstract: 3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation. In particular, the current state-of-the-art model GIRAFFE can control each object's rotation, translation, scale, and scene camera pose without corresponding supervision. However, GIRAFFE only operates well when the image resolution is low. We propose GIRAFFE HD, a high-res…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  16. arXiv:2203.07293  [pdf, other]

    cs.CV cs.GR cs.LG

    InsetGAN for Full-Body Image Generation

    Authors: Anna Frühstück, Krishna Kumar Singh, Eli Shechtman, Niloy J. Mitra, Peter Wonka, Jingwan Lu

    Abstract: While GANs can produce photo-realistic images in ideal conditions for certain domains, the generation of full-body human images remains difficult due to the diversity of identities, hairstyles, clothing, and the variance in pose. Instead of modeling this complex domain with a single GAN, we propose a novel method to combine multiple pretrained GANs, where one GAN generates a global canvas (e.g., h…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: Project webpage and video available at http://afruehstueck.github.io/insetgan

  17. arXiv:2111.05916  [pdf, other]

    cs.CV

    Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis

    Authors: Tuanfeng Y. Wang, Duygu Ceylan, Krishna Kumar Singh, Niloy J. Mitra

    Abstract: Synthesizing dynamic appearances of humans in motion plays a central role in applications such as AR/VR and video editing. While many recent methods have been proposed to tackle this problem, handling loose garments with complex textures and high dynamic motion still remains challenging. In this paper, we propose a video based appearance synthesis method that tackles such challenges and demonstrat…

    Submitted 10 November, 2021; originally announced November 2021.

  18. arXiv:2110.04281  [pdf, other]

    cs.CV cs.LG

    Collaging Class-specific GANs for Semantic Image Synthesis

    Authors: Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

    Abstract: We propose a new approach for high resolution semantic image synthesis. It consists of one base image generator and multiple class-specific generators. The base generator generates high quality images based on a segmentation map. To further improve the quality of different objects, we create a bank of Generative Adversarial Networks (GANs) by separately training class-specific models. This has sev…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: ICCV 2021

  19. arXiv:2104.05895  [pdf, other]

    cs.CV

    IMAGINE: Image Synthesis by Image-Guided Model Inversion

    Authors: Pei Wang, Yijun Li, Krishna Kumar Singh, Jingwan Lu, Nuno Vasconcelos

    Abstract: We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images from only a single training sample. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations via matching multi-level feature representations in the classifier, associated with adversarial training with an external…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Published in CVPR2021

  20. arXiv:2104.02052  [pdf, other]

    cs.CV cs.GR cs.LG

    Generating Furry Cars: Disentangling Object Shape & Appearance across Multiple Domains

    Authors: Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee

    Abstract: We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars). The goal is to learn a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any domain exclusively. This challenging problem requires an…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: Camera ready version for ICLR 2021

  21. arXiv:2001.03152  [pdf, other]

    cs.CV cs.LG

    Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

    Authors: Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram

    Abstract: Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on context risks a model's generalizability, especially when typical co-occurrence patterns are absent. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a…

    Submitted 5 May, 2020; v1 submitted 9 January, 2020; originally announced January 2020.

    Comments: CVPR 2020

  22. arXiv:1911.11758  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV

    MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

    Authors: Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

    Abstract: We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to lea…

    Submitted 13 April, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: CVPR 2020 camera ready

    Journal ref: CVPR 2020

  23. arXiv:1910.01112  [pdf, other]

    cs.LG cs.CV stat.ML

    Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data

    Authors: Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, Yong Jae Lee

    Abstract: We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data. We first investigate the issues surrounding the assumptions about uniformity made by InfoGAN, and demonstrate its ineffectiveness to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent fa…

    Submitted 30 October, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Camera ready version for NeurIPS 2020

  24. arXiv:1811.11155  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

    Authors: Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

    Abstract: We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without supervision, our key idea is to use information theory to associate each factor to a latent code, and to condition the relationships between the codes in a specific way…

    Submitted 9 April, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Journal ref: CVPR 2019 (Oral Presentation)

  25. arXiv:1811.02545  [pdf, other]

    cs.CV

    Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond

    Authors: Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, Yong Jae Lee

    Abstract: We propose 'Hide-and-Seek' a general purpose data augmentation technique, which is complementary to existing data augmentation techniques and is beneficial for various visual recognition tasks. The key idea is to hide patches in a training image randomly, in order to force the network to seek other relevant content when the most discriminative content is hidden. Our approach only needs to modify t…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: TPAMI submission. This is a journal extension of our ICCV 2017 paper arXiv:1704.04232

  26. arXiv:1804.01077  [pdf, other]

    cs.CV cs.AI

    DOCK: Detecting Objects by transferring Common-sense Knowledge

    Authors: Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee

    Abstract: We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories. In our setting, the training data for the source categories have bounding box annotations, while those for the target categories only have image-level annotations. Current state-of-the-art approaches focus on image-level visual or semantic similarity to adapt a detec…

    Submitted 31 July, 2018; v1 submitted 3 April, 2018; originally announced April 2018.

    Journal ref: ECCV, 2018

  27. arXiv:1705.09275  [pdf, other]

    cs.CV cs.SI

    Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks

    Authors: Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee

    Abstract: Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly focuses on modeling popularity using a single metric such as the total number of likes or shares. In this work, we propose Diffusion-LSTM, a memory-based deep recurrent network th…

    Submitted 29 November, 2017; v1 submitted 25 May, 2017; originally announced May 2017.

    Comments: 9 pages, 6 figures

  28. arXiv:1704.06340  [pdf, other]

    cs.CV

    Identifying First-person Camera Wearers in Third-person Videos

    Authors: Chenyou Fan, Jangwon Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo

    Abstract: We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the cam…

    Submitted 20 April, 2017; originally announced April 2017.

  29. arXiv:1704.04232  [pdf, other]

    cs.CV

    Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization

    Authors: Krishna Kumar Singh, Yong Jae Lee

    Abstract: We propose `Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to see…

    Submitted 23 December, 2017; v1 submitted 13 April, 2017; originally announced April 2017.

    Comments: Camera-Ready Version (ICCV 2017)

  30. arXiv:1608.02676  [pdf, other]

    cs.CV

    End-to-End Localization and Ranking for Relative Attributes

    Authors: Krishna Kumar Singh, Yong Jae Lee

    Abstract: We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons. Unlike previous methods, our network jointly learns the attribute's features, localization, and ranker. The localization module of our network discovers the most informative image region for the attribute, which is then used by…

    Submitted 8 August, 2016; originally announced August 2016.

    Comments: Appears in European Conference on Computer Vision (ECCV), 2016

  31. arXiv:1604.05766  [pdf, other]

    cs.CV

    Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection

    Authors: Krishna Kumar Singh, Fanyi Xiao, Yong Jae Lee

    Abstract: The status quo approach to training object detectors requires expensive bounding box annotations. Our framework takes a markedly different direction: we transfer tracked object boxes from weakly-labeled videos to weakly-labeled images to automatically generate pseudo ground-truth boxes, which replace manually annotated bounding boxes. We first mine discriminative regions in the weakly-labeled imag…

    Submitted 19 April, 2016; originally announced April 2016.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  32. arXiv:1303.2171  [pdf, ps, other]

    cs.DC

    CPU and/or GPU: Revisiting the GPU Vs. CPU Myth

    Authors: Kishore Kothapalli, Dip Sankar Banerjee, P. J. Narayanan, Surinder Sood, Aman Kumar Bahl, Shashank Sharma, Shrenik Lad, Krishna Kumar Singh, Kiran Matam, Sivaramakrishna Bharadwaj, Rohit Nigam, Parikshit Sakurikar, Aditya Deshpande, Ishan Misra, Siddharth Choudhary, Shubham Gupta

    Abstract: Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator based computing has significantly relegated the role of CPUs in computation. As CPUs evolve and also offer matching c…

    Submitted 9 March, 2013; originally announced March 2013.

    Comments: 20 pages