
Showing 1–50 of 222 results for author: Theobalt, C

Searching in archive cs.
  1. Lite2Relight: 3D-aware Single Image Portrait Relighting

    Authors: Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt

    Abstract: Achieving photorealistic 3D view synthesis and relighting of human portraits is pivotal for advancing AR/VR applications. Existing methodologies in portrait relighting demonstrate substantial limitations in terms of generalization and 3D consistency, coupled with inaccuracies in physically realistic lighting and identity preservation. Furthermore, personalization from a single view is difficult to…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

  2. arXiv:2407.08701

    cs.CV

    Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

    Authors: Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen

    Abstract: Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio, thanks to their temporally uni-directional attention mechanism, which models correlations between the current token and previous tokens. However, video streaming remains much less explored, despite a growing need for live video processing. State-of-the-art video diffusion models leverage bi-di…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: https://live2diff.github.io/

  3. arXiv:2406.17988

    cs.CV

    DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

    Authors: Qingxuan Wu, Zhiyang Dou, Sirui Xu, Soshi Shimada, Chen Wang, Zhengming Yu, Yuan Liu, Cheng Lin, Zeyu Cao, Taku Komura, Vladislav Golyanik, Christian Theobalt, Wenping Wang, Lingjie Liu

    Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures, 3 tables

  4. arXiv:2406.10078

    cs.CV cs.GR cs.LG

    D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

    Authors: Moritz Kappel, Florian Hahlbohm, Timon Scholz, Susana Castillo, Christian Theobalt, Martin Eisemann, Vladislav Golyanik, Marcus Magnor

    Abstract: Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures, 10 tables. Project page: https://moritzkappel.github.io/projects/dnpc

  5. arXiv:2406.08924

    cs.GR cs.CV cs.LG

    Learning Images Across Scales Using Adversarial Training

    Authors: Krzysztof Wolski, Adarsh Djeacoumar, Alireza Javanmardi, Hans-Peter Seidel, Christian Theobalt, Guillaume Cordonnier, Karol Myszkowski, George Drettakis, Xingang Pan, Thomas Leimkühler

    Abstract: The real world exhibits rich structure and detail across many scales of observation. It is difficult, however, to capture and represent a broad spectrum of scales using ordinary images. We devise a novel paradigm for learning a representation that captures an orders-of-magnitude variety of scales from an unstructured collection of ordinary images. We treat this collection as a distribution of scal…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024; project page: https://scalespacegan.mpi-inf.mpg.de/

  6. arXiv:2406.07163

    cs.CV

    FaceGPT: Self-supervised Learning to Chat about 3D Human Faces

    Authors: Haoran Wang, Mohit Mendiratta, Christian Theobalt, Adam Kortylewski

    Abstract: We introduce FaceGPT, a self-supervised learning framework for Large Vision-Language Models (VLMs) to reason about 3D human faces from images and text. Typical 3D face reconstruction methods are specialized algorithms that lack semantic reasoning capabilities. FaceGPT overcomes this limitation by embedding the parameters of a 3D morphable face model (3DMM) into the token space of a VLM, enabling t…

    Submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2405.20980

    cs.CV cs.GR cs.LG

    Neural Gaussian Scale-Space Fields

    Authors: Felix Mujkanovic, Ntumba Elie Nsampi, Christian Theobalt, Hans-Peter Seidel, Thomas Leimkühler

    Abstract: Gaussian scale spaces are a cornerstone of signal representation and processing, with applications in filtering, multiscale analysis, anti-aliasing, and many more. However, obtaining such a scale space is costly and cumbersome, in particular for continuous representations such as neural fields. We present an efficient and lightweight method to learn the fully continuous, anisotropic Gaussian scale…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 15 pages; SIGGRAPH 2024; project page at https://neural-gaussian-scale-space-fields.mpi-inf.mpg.de

  8. arXiv:2405.17531

    cs.CV

    Evolutive Rendering Models

    Authors: Fangneng Zhan, Hanxue Liang, Yifan Wang, Michael Niemeyer, Michael Oechsle, Adam Kortylewski, Cengiz Oztireli, Gordon Wetzstein, Christian Theobalt

    Abstract: The landscape of computer graphics has undergone significant transformations with the recent advances of differentiable rendering models. These rendering models often rely on heuristic designs that may not fully align with the final rendering objectives. We address this gap by pioneering evolutive rendering models, a methodology where rendering models possess the ability to evolve and ada…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://fnzhan.com/Evolutive-Rendering-Models/

  9. arXiv:2405.16947

    cs.CV

    Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

    Authors: Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta, Fangneng Zhan, Adam Kortylewski, Christian Theobalt, Peter Wonka

    Abstract: We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction attempts to employ diffusion models to perform downstream vision tasks by exploiting their deep understanding of image semantics. Yet, the majority of these approaches have focused on image-related tasks like semantic correspondence and segmentation, w…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project webpage: https://qianwangx.github.io/VidSeg_diffusion/

  10. arXiv:2404.08640

    cs.CV

    EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

    Authors: Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik

    Abstract: Monocular egocentric 3D human motion capture is a challenging and actively researched problem. Existing methods use synchronously operating visual sensors (e.g. RGB cameras) and often fail under low lighting and fast motions, which can be restricting in many applications involving head-mounted devices. In response to the existing limitations, this paper 1) introduces a new problem, i.e., 3D human…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 14 pages, 11 figures and 6 tables; project page: https://4dqv.mpi-inf.mpg.de/EventEgo3D/; Computer Vision and Pattern Recognition (CVPR) 2024

  11. arXiv:2403.18820

    cs.CV

    MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering

    Authors: Guoxing Sun, Rishabh Dabral, Pascal Fua, Christian Theobalt, Marc Habermann

    Abstract: Faithful human performance capture and free-view rendering from sparse RGB observations is a long-standing problem in Vision and Graphics. The main challenges are the lack of observations and the inherent ambiguities of the setting, e.g. occlusions and depth ambiguity. As a result, radiance fields, which have shown great promise in capturing high-frequency appearance and geometry details in dense…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Project page: https://vcai.mpi-inf.mpg.de/projects/MetaCap/

  12. arXiv:2403.17936

    cs.CV

    ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

    Authors: Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt

    Abstract: Gestures play a key role in human communication. Recent methods for co-speech gesture generation, while managing to generate beat-aligned motions, struggle to generate gestures that are semantically aligned with the utterance. Compared to beat gestures that align naturally to the audio signal, semantically coherent gestures require modeling the complex interactions between the language and human mo…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024. Project Page: https://vcai.mpi-inf.mpg.de/projects/ConvoFusion/

  13. arXiv:2403.15064

    cs.CV cs.GR

    Recent Trends in 3D Reconstruction of General Non-Rigid Scenes

    Authors: Raza Yunus, Jan Eric Lenssen, Michael Niemeyer, Yiyi Liao, Christian Rupprecht, Christian Theobalt, Gerard Pons-Moll, Jia-Bin Huang, Vladislav Golyanik, Eddy Ilg

    Abstract: Reconstructing models of the real world, including 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesizing of photorealistic novel views, useful for the movie industry and AR/VR applications. It also facilitates the content creation necessary in computer games and AR/VR by avoiding laborious manual design processes. Fu…

    Submitted 6 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: 42 pages, 18 figures, 5 tables; State-of-the-Art Report at EUROGRAPHICS 2024. Project page: https://razayunus.github.io/non-rigid-star

  14. arXiv:2403.07807

    cs.CV

    StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting

    Authors: Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu

    Abstract: We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and dec…

    Submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2402.04930

    cs.CV cs.GR cs.LG

    Blue noise for diffusion models

    Authors: Xingchang Huang, Corentin Salaün, Cristina Vasconcelos, Christian Theobalt, Cengiz Öztireli, Gurprit Singh

    Abstract: Most of the existing diffusion models use Gaussian noise for training and sampling across all time steps, which may not optimally account for the frequency contents reconstructed by the denoising network. Despite the diverse applications of correlated noise in computer graphics, its potential for improving the training process has been underexplored. In this paper, we introduce a novel and general…

    Submitted 2 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: SIGGRAPH 2024 Conference Proceedings; Project page: https://xchhuang.github.io/bndm

  16. arXiv:2401.00889

    cs.CV

    3D Human Pose Perception from Egocentric Stereo Videos

    Authors: Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt

    Abstract: While head-mounted devices are becoming more compact, they provide egocentric views with significant self-occlusions of the device user. Hence, existing methods often fail to accurately estimate complex 3D poses from egocentric views. In this work, we propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation, which leverages the scene information and temporal…

    Submitted 15 May, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

  17. arXiv:2312.14929

    cs.CV cs.GR

    MACS: Mass Conditioned 3D Hand and Object Motion Synthesis

    Authors: Soshi Shimada, Franziska Mueller, Jan Bednarik, Bardia Doosti, Bernd Bickel, Danhang Tang, Vladislav Golyanik, Jonathan Taylor, Christian Theobalt, Thabo Beeler

    Abstract: The physical properties of an object, such as mass, significantly affect how we manipulate it with our hands. Surprisingly, this aspect has so far been neglected in prior work on 3D motion synthesis. To improve the naturalness of the synthesized 3D hand-object motions, this work proposes MACS, the first MAss Conditioned 3D hand and object motion Synthesis approach. Our approach is based on cascaded…

    Submitted 22 December, 2023; originally announced December 2023.

  18. arXiv:2312.14157

    cs.CV

    3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera

    Authors: Christen Millerdurai, Diogo Luvizon, Viktor Rudnev, André Jonas, Jiayi Wang, Christian Theobalt, Vladislav Golyanik

    Abstract: 3D hand tracking from a monocular video is a very challenging problem due to hand interactions, occlusions, left-right hand ambiguity, and fast motion. Most existing methods rely on RGB inputs, which have severe limitations under low-light conditions and suffer from motion blur. In contrast, event cameras capture local brightness changes instead of full image frames and do not suffer from the desc…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 17 pages, 12 figures, 7 tables; project page: https://4dqv.mpi-inf.mpg.de/Ev2Hands/

    Journal ref: International Conference on 3D Vision (3DV) 2024

  19. arXiv:2312.11587

    cs.CV

    Relightable Neural Actor with Intrinsic Decomposition and Pose Control

    Authors: Diogo Luvizon, Vladislav Golyanik, Adam Kortylewski, Marc Habermann, Christian Theobalt

    Abstract: Creating a digital human avatar that is relightable, drivable, and photorealistic is a challenging and important problem in Vision and Graphics. Humans are highly articulated, creating pose-dependent appearance effects like self-shadows and wrinkles, and skin as well as clothing require complex and space-varying BRDF models. While recent human relighting approaches can recover plausible material-li…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Project page: https://people.mpi-inf.mpg.de/~dluvizon/relightable-neural-actor/

  20. arXiv:2312.07423

    cs.CV

    Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras

    Authors: Ashwath Shetty, Marc Habermann, Guoxing Sun, Diogo Luvizon, Vladislav Golyanik, Christian Theobalt

    Abstract: We present the first approach to render highly realistic free-viewpoint videos of a human actor in general apparel, from sparse multi-view recording to display, in real-time at an unprecedented 4K resolution. At inference, our method only requires four camera views of the moving actor and the respective 3D skeletal pose. It handles actors in wide clothing, and reproduces even fine-scale dynamic de…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project page: https://vcai.mpi-inf.mpg.de/projects/holochar/

  21. arXiv:2312.05941

    cs.CV

    ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

    Authors: Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann

    Abstract: Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photore…

    Submitted 15 April, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: For project page, see https://vcai.mpi-inf.mpg.de/projects/ash/

  22. arXiv:2312.05161

    cs.CV

    TriHuman : A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis

    Authors: Heming Zhu, Fangneng Zhan, Christian Theobalt, Marc Habermann

    Abstract: Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model or a skeleton, to map points into a pose canonical space while conditioning…

    Submitted 8 December, 2023; originally announced December 2023.

  23. arXiv:2311.17057

    cs.CV

    ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

    Authors: Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek

    Abstract: Current approaches for 3D human motion synthesis generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap exists in addressing the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising diffusion-based model that synthesizes full-body reactive motion of a person…

    Submitted 26 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 17 pages, 7 figures, 5 tables

  24. arXiv:2311.17050

    cs.CV cs.GR

    Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

    Authors: Zhengming Yu, Zhiyang Dou, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wenping Wang

    Abstract: We present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Previous methods explored shape generation with different representations and they suffer from limited topologies and poor geometry details. To generate high-quality surfaces of arbitrary topologies, we use the Unsigned Distance Field (UDF) as our surface representa…

    Submitted 22 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Page: https://yzmblog.github.io/projects/SurfD/

  25. arXiv:2311.16495

    cs.CV

    Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement

    Authors: Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt

    Abstract: In this work, we explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion. This task presents significant challenges due to three factors: the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion. To address these challenges, we propose a novel approach that leverages FisheyeViT to extra…

    Submitted 2 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  26. arXiv:2311.12063

    cs.CV

    DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields

    Authors: Yu Chi, Fangneng Zhan, Sibo Wu, Christian Theobalt, Adam Kortylewski

    Abstract: Progress in 3D computer vision tasks demands a huge amount of data, yet annotating multi-view images with 3D-consistent annotations, or point clouds with part segmentation is both time-consuming and challenging. This paper introduces DatasetNeRF, a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations, while utilizing minima…

    Submitted 18 November, 2023; originally announced November 2023.

  27. arXiv:2311.05604

    cs.CV

    3D-QAE: Fully Quantum Auto-Encoding of 3D Point Clouds

    Authors: Lakshika Rathi, Edith Tretschk, Christian Theobalt, Rishabh Dabral, Vladislav Golyanik

    Abstract: Existing methods for learning 3D representations are deep neural networks trained and tested on classical hardware. Quantum machine learning architectures, despite their theoretically predicted advantages in terms of speed and the representational capacity, have so far not been considered for this problem nor for tasks involving 3D data in general. This paper thus introduces the first quantum auto…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 20 pages, 11 figures, 5 tables

    Journal ref: British Machine Vision Conference (BMVC) 2023

  28. arXiv:2310.15008

    cs.CV

    Wonder3D: Single Image to 3D using Cross-Domain Diffusion

    Authors: Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, Wenping Wang

    Abstract: In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly pro…

    Submitted 8 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Project page: https://www.xxlong.site/Wonder3D/

  29. arXiv:2310.11449

    cs.CV cs.GR cs.LG

    DELIFFAS: Deformable Light Fields for Fast Avatar Synthesis

    Authors: Youngjoong Kwon, Lingjie Liu, Henry Fuchs, Marc Habermann, Christian Theobalt

    Abstract: Generating controllable and photorealistic digital human avatars is a long-standing and important problem in Vision and Graphics. Recent methods have shown great progress in terms of either photorealism or inference speed while the combination of the two desired properties still remains unsolved. To this end, we propose a novel method, called DELIFFAS, which parameterizes the appearance of the hum…

    Submitted 17 October, 2023; originally announced October 2023.

  30. Discovering Fatigued Movements for Virtual Character Animation

    Authors: Noshaba Cheema, Rui Xu, Nam Hee Kim, Perttu Hämäläinen, Vladislav Golyanik, Marc Habermann, Christian Theobalt, Philipp Slusallek

    Abstract: Virtual character animation and movement synthesis have advanced rapidly during recent years, especially through a combination of extensive motion capture datasets and machine learning. A remaining challenge is interactively simulating characters that fatigue when performing extended motions, which is indispensable for the realism of generated animations. However, capturing such movements is probl…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 16 pages, 22 figures. To be published in ACM SIGGRAPH Asia Conference Papers 2023. ACM ISBN 979-8-4007-0315-7/23/12

    ACM Class: I.3.7

    Journal ref: ACM SIGGRAPH Asia Conference Papers 2023

  31. arXiv:2310.07204

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  32. Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering

    Authors: Linjie Lyu, Ayush Tewari, Marc Habermann, Shunsuke Saito, Michael Zollhöfer, Thomas Leimkühler, Christian Theobalt

    Abstract: Inverse rendering, the process of inferring scene properties from images, is a challenging inverse problem. The task is ill-posed, as many different scene configurations can give rise to the same image. Most existing solutions incorporate priors into the inverse-rendering pipeline to encourage plausible solutions, but they do not consider the inherent ambiguities and the multi-modal distribution o…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: SIGGRAPH Asia 2023

  33. arXiv:2309.16670

    cs.CV cs.GR cs.HC

    Decaf: Monocular Deformation Capture for Face and Hand Interactions

    Authors: Soshi Shimada, Vladislav Golyanik, Patrick Pérez, Christian Theobalt

    Abstract: Existing methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects. Modelling dense non-rigid object deformations in this setting remained largely unaddressed so far, although such effects can improve the realism of the downstream applications such as AR/VR and avatar communications. This is due to the severe ill-posedness of the monocular view setting…

    Submitted 13 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

  34. arXiv:2308.12970

    cs.GR cs.LG

    NeuralClothSim: Neural Deformation Fields Meet the Thin Shell Theory

    Authors: Navami Kairanda, Marc Habermann, Christian Theobalt, Vladislav Golyanik

    Abstract: Despite existing 3D cloth simulators producing realistic results, they predominantly operate on discrete surface representations (e.g. points and meshes) with a fixed spatial resolution, which often leads to large memory consumption and resolution-dependent simulations. Moreover, back-propagating gradients through the existing solvers is difficult, and they cannot be easily integrated into modern…

    Submitted 14 June, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 33 pages, 21 figures and 3 tables; project page: https://4dqv.mpi-inf.mpg.de/NeuralClothSim/

  35. arXiv:2308.12969

    cs.CV

    ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors

    Authors: Wanyue Zhang, Rishabh Dabral, Thomas Leimkühler, Vladislav Golyanik, Marc Habermann, Christian Theobalt

    Abstract: Existing automatic approaches for 3D virtual character motion synthesis supporting scene interactions do not generalise well to new objects outside training distributions, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. This paper addresses this limitation and shows that robustness and generalisation to novel scene objects in 3D object-aware…

    Submitted 15 February, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 14 pages, 11 figures; project page: https://vcai.mpi-inf.mpg.de/projects/ROAM/

    Journal ref: International Conference on 3D Vision 2024

  36. arXiv:2308.08258

    cs.CV cs.GR

    SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes

    Authors: Edith Tretschk, Vladislav Golyanik, Michael Zollhoefer, Aljaz Bozic, Christoph Lassner, Christian Theobalt

    Abstract: Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks like 3D editing, motion analysis, or virtual-asset creation. We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes multi-view R…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Project page: https://vcai.mpi-inf.mpg.de/projects/scenerflow/

  37. arXiv:2308.04826

    cs.CV

    WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields

    Authors: Muyu Xu, Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Xiaoqin Zhang, Christian Theobalt, Ling Shao, Shijian Lu

    Abstract: Neural Radiance Field (NeRF) has shown impressive performance in novel view synthesis via implicit scene representation. However, it usually suffers from poor scalability, as it requires densely sampled images for each new scene. Several studies have attempted to mitigate this problem by integrating Multi-View Stereo (MVS) techniques into NeRF, but they still entail a cumbersome fine-tuning process f…

    Submitted 26 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023. Project website: https://mxuai.github.io/WaveNeRF/

  38. arXiv:2307.00842

    cs.CV

    VINECS: Video-based Neural Character Skinning

    Authors: Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann, Christian Theobalt

    Abstract: Rigging and skinning clothed human avatars is a challenging task and traditionally requires a lot of manual work and expertise. Recent methods addressing it either generalize across different characters or focus on capturing the dynamics of a single character observed under different pose configurations. However, the former methods typically predict solely static skinning weights, which perform po…

    Submitted 3 July, 2023; originally announced July 2023.

  39. arXiv:2306.00547

    cs.CV cs.GR

    AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

    Authors: Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

    Abstract: Capturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years have witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs and others. While these data modalities provide…

    Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 17 pages, 17 figures. Project page: https://vcai.mpi-inf.mpg.de/projects/AvatarStudio/

  40. arXiv:2305.14093  [pdf, other]

    cs.CV

    Weakly Supervised 3D Open-vocabulary Segmentation

    Authors: Kunhao Liu, Fangneng Zhan, Jiahui Zhang, Muyu Xu, Yingchen Yu, Abdulmotaleb El Saddik, Christian Theobalt, Eric Xing, Shijian Lu

    Abstract: Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it…

    Submitted 9 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2023

  41. Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

    Authors: Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt

    Abstract: Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powe…

    Submitted 17 July, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to SIGGRAPH 2023. Project page: https://vcai.mpi-inf.mpg.de/projects/DragGAN/

  42. arXiv:2305.03462  [pdf, other]

    cs.CV cs.GR

    General Neural Gauge Fields

    Authors: Fangneng Zhan, Lingjie Liu, Adam Kortylewski, Christian Theobalt

    Abstract: Recent advances in neural fields, such as neural radiance fields, have significantly pushed the boundary of scene representation learning. Aiming to boost the computational efficiency and rendering quality of 3D scenes, a popular line of research maps the 3D coordinate system to another measuring system, e.g., 2D manifolds and hash tables, for modeling neural fields. The conversion of coordinate s…

    Submitted 7 February, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: ICLR 2023

  43. arXiv:2305.01599  [pdf, other]

    cs.CV cs.GR

    EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors

    Authors: Xinyu Yi, Yuxiao Zhou, Marc Habermann, Vladislav Golyanik, Shaohua Pan, Christian Theobalt, Feng Xu

    Abstract: Human and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured by inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors, inc…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted by SIGGRAPH 2023. Project page: https://xinyu-yi.github.io/EgoLocate/

  44. arXiv:2304.10266  [pdf, other]

    cs.CV

    OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

    Authors: Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski

    Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and th…

    Submitted 26 July, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2111.14341

  45. arXiv:2303.18193  [pdf, other]

    cs.CV cs.GR cs.LG

    GVP: Generative Volumetric Primitives

    Authors: Mallikarjun B R, Xingang Pan, Mohamed Elgharib, Christian Theobalt

    Abstract: Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution i…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: https://vcai.mpi-inf.mpg.de/projects/GVP/index.html

  46. arXiv:2303.16202  [pdf, other]

    cs.CV

    CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes

    Authors: Harshil Bhatia, Edith Tretschk, Zorah Lähner, Marcel Seelbach Benkner, Michael Moeller, Christian Theobalt, Vladislav Golyanik

    Abstract: Jointly matching multiple, non-rigidly deformed 3D shapes is a challenging, $\mathcal{NP}$-hard problem. A perfect matching is necessarily cycle-consistent: following the pairwise point correspondences along several shapes must end up at the starting vertex of the original shape. Unfortunately, existing quantum shape-matching methods do not support multiple shapes, let alone cycle consistency.…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Computer Vision and Pattern Recognition (CVPR) 2023; 22 pages, 24 figures and 5 tables; Project page: https://4dqv.mpi-inf.mpg.de/CCuantuMM/

  47. arXiv:2303.15951  [pdf, other]

    cs.CV cs.GR

    F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories

    Authors: Peng Wang, Yuan Liu, Zhaoxi Chen, Lingjie Liu, Ziwei Liu, Taku Komura, Christian Theobalt, Wenping Wang

    Abstract: This paper presents a novel grid-based NeRF called F2-NeRF (Fast-Free-NeRF) for novel view synthesis, which supports arbitrary input camera trajectories and requires only a few minutes of training. Existing fast grid-based NeRF training frameworks, like Instant-NGP, Plenoxels, DVGO, or TensoRF, are mainly designed for bounded scenes and rely on space warping to handle unbounded scenes. Existing two w…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page: https://totoro97.github.io/projects/f2-nerf

  48. arXiv:2303.14471  [pdf, other]

    cs.CV cs.GR

    HQ3DAvatar: High Quality Controllable 3D Head Avatar

    Authors: Kartik Teotia, Mallikarjun B R, Xingang Pan, Hyeongwoo Kim, Pablo Garrido, Mohamed Elgharib, Christian Theobalt

    Abstract: Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capturing full-head dynamic performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometri…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 16 Pages, 15 Figures. Project page: https://vcai.mpi-inf.mpg.de/projects/HQ3DAvatar/

  49. arXiv:2303.14001  [pdf, other]

    cs.CV

    Grid-guided Neural Radiance Fields for Large Urban Scenes

    Authors: Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, Dahua Lin

    Abstract: Purely MLP-based neural radiance fields (NeRF-based methods) often suffer from underfitting with blurred renderings on large-scale scenes due to limited model capacity. Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually, leading to linear scale-up in training costs and the number of sub-NeRFs as the scene expands. An alternat…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page: https://city-super.github.io/gridnerf/

  50. arXiv:2303.06424  [pdf, other]

    cs.CV

    Regularized Vector Quantization for Tokenized Image Synthesis

    Authors: Jiahui Zhang, Fangneng Zhan, Christian Theobalt, Shijian Lu

    Abstract: Quantizing images into discrete representations has been a fundamental problem in unified generative modeling. Predominant approaches learn the discrete representation either in a deterministic manner by selecting the best-matching token or in a stochastic manner by sampling from a predicted distribution. However, deterministic quantization suffers from severe codebook collapse and misalignment wi…

    Submitted 14 October, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023