
Showing 1–50 of 108 results for author: Wonka, P

  1. arXiv:2408.14819 [pdf, other]

    cs.CV

    Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

    Authors: Abdelrahman Eldesokey, Peter Wonka

    Abstract: We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. Layout control has been widely studied to alleviate the shortcomings of T2I diffusion models in understanding objects' placement and relationships from text descriptions. Nevertheless, existing approaches for layout control are limited to 2D layouts, require the user to provide a static lay…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Project Page: https://abdo-eldesokey.github.io/build-a-scene/

  2. arXiv:2406.15020 [pdf, other]

    cs.CV

    A3D: Does Diffusion Dream about 3D Alignment?

    Authors: Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych, Nikolay Patakin, Oleg Voynov, Dmitry Senushkin, Alexander Filippov, Anton Konushin, Peter Wonka, Evgeny Burnaev

    Abstract: We tackle the problem of text-driven 3D generation from a geometry alignment perspective. We aim at the generation of multiple objects which are consistent in terms of semantics and geometry. Recent methods based on Score Distillation have succeeded in distilling the knowledge from 2D diffusion models to high-quality objects represented by 3D neural radiance fields. These methods handle multiple t…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.12831 [pdf, other]

    cs.CV cs.AI cs.MM

    VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

    Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

    Abstract: Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemp…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  4. arXiv:2406.08659 [pdf, other]

    cs.CV

    Vivid-ZOO: Multi-View Video Generation with Diffusion Model

    Authors: Bing Li, Cheng Zheng, Wenxuan Zhu, Jinjie Mai, Biao Zhang, Peter Wonka, Bernard Ghanem

    Abstract: While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The new challenges posed by T2MVid generation lie in the lack of massive captioned multi-view videos and the complexity of modeling such multi-dimensional distribution. To this end, we propose a novel diffusion-based pipeline tha…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our project page is at https://hi-zhengcheng.github.io/vividzoo/

  5. arXiv:2406.06679 [pdf, other]

    cs.CV

    PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation

    Authors: Zhenyu Li, Shariq Farooq Bhat, Peter Wonka

    Abstract: This paper introduces PatchRefiner, an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs. While depth estimation is crucial for applications such as autonomous driving, 3D generative modeling, and 3D reconstruction, achieving accurate high-resolution depth in real-world scenarios is challenging due to the constraints of existing architectures a…

    Submitted 10 June, 2024; originally announced June 2024.

  6. arXiv:2406.00347 [pdf, other]

    cs.CV

    E³-Net: Efficient E(3)-Equivariant Normal Estimation Network

    Authors: Hanxiao Wang, Mingyang Zhao, Weize Quan, Zhen Chen, Dong-ming Yan, Peter Wonka

    Abstract: Point cloud normal estimation is a fundamental task in 3D geometry processing. While recent learning-based methods achieve notable advancements in normal prediction, they often overlook the critical aspect of equivariance. This results in inefficient learning of symmetric patterns. To address this issue, we propose E3-Net to achieve equivariance for normal estimation. We introduce an efficient ran…

    Submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2405.16947 [pdf, other]

    cs.CV

    Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

    Authors: Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta, Fangneng Zhan, Adam Kortylewski, Christian Theobalt, Peter Wonka

    Abstract: We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction attempts to employ diffusion models to perform downstream vision tasks by exploiting their deep understanding of image semantics. Yet, the majority of these approaches have focused on image-related tasks like semantic correspondence and segmentation, w…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project webpage: https://qianwangx.github.io/VidSeg_diffusion/

  8. arXiv:2405.15188 [pdf, other]

    cs.CV

    PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction

    Authors: Bingchen Yang, Haiyong Jiang, Hao Pan, Peter Wonka, Jun Xiao, Guosheng Lin

    Abstract: Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To improve upon this problem, we introduce geometric guidance into the reconstruction network. Our proposed model, PS-CAD, reconstructs the CAD modeling sequence one ste…

    Submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2403.12585 [pdf, other]

    cs.CV

    LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing

    Authors: Yazeed Alharbi, Peter Wonka

    Abstract: We present a novel, training-free approach for textual editing of real images using diffusion models. Unlike prior methods that rely on computationally expensive finetuning, our approach leverages LAtent SPatial Alignment (LASPA) to efficiently preserve image details. We demonstrate how the diffusion process is amenable to spatial guidance using a reference image, leading to semantically coherent…

    Submitted 19 March, 2024; originally announced March 2024.

  10. arXiv:2402.05803 [pdf, other]

    cs.CV cs.GR

    AvatarMMC: 3D Head Avatar Generation and Editing with Multi-Modal Conditioning

    Authors: Wamiq Reyaz Para, Abdelrahman Eldesokey, Zhenyu Li, Pradyumna Reddy, Jiankang Deng, Peter Wonka

    Abstract: We introduce an approach for 3D head avatar generation and editing with multi-modal conditioning based on a 3D Generative Adversarial Network (GAN) and a Latent Diffusion Model (LDM). 3D GANs can generate high-quality head avatars given a single or no condition. However, it is challenging to generate samples that adhere to multiple conditions of different modalities. On the other hand, LDMs excel…

    Submitted 8 February, 2024; originally announced February 2024.

  11. arXiv:2401.03395 [pdf, other]

    cs.CV

    Deep Learning-based Image and Video Inpainting: A Survey

    Authors: Weize Quan, Jiaxi Chen, Yanli Liu, Dong-Ming Yan, Peter Wonka

    Abstract: Image and video inpainting is a classic problem in computer vision and computer graphics, aiming to fill in the plausible and realistic content in the missing areas of images and videos. With the advance of deep learning, this problem has achieved significant progress recently. The goal of this paper is to comprehensively review the deep learning-based methods for image and video inpainting. Speci…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: accepted to IJCV

  12. arXiv:2312.08871 [pdf, other]

    cs.CV

    VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data

    Authors: Jian Shi, Peter Wonka

    Abstract: We present VoxelKP, a novel fully sparse network architecture tailored for human keypoint estimation in LiDAR data. The key challenge is that objects are distributed sparsely in 3D space, while human keypoint detection requires detailed local information wherever humans are present. We propose four novel ideas in this paper. First, we propose sparse selective kernels to capture multi-scal…

    Submitted 11 December, 2023; originally announced December 2023.

  13. arXiv:2312.08548 [pdf, other]

    cs.CV

    EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

    Authors: Mykola Lavreniuk, Shariq Farooq Bhat, Matthias Müller, Peter Wonka

    Abstract: This work presents the network architecture EVP (Enhanced Visual Perception). EVP builds on the previous work VPD which paved the way to use the Stable Diffusion network for computer vision tasks. We propose two major enhancements. First, we develop the Inverse Multi-Attentive Feature Refinement (IMAFR) module which enhances feature learning capabilities by aggregating spatial information from hig…

    Submitted 13 December, 2023; originally announced December 2023.

  14. arXiv:2312.07133 [pdf, other]

    cs.CV cs.LG

    LatentMan: Generating Consistent Animated Characters using Image Diffusion Models

    Authors: Abdelrahman Eldesokey, Peter Wonka

    Abstract: We propose a zero-shot approach for generating consistent videos of animated characters based on Text-to-Image (T2I) diffusion models. Existing Text-to-Video (T2V) methods are expensive to train and require large-scale video datasets to produce diverse characters and motions. At the same time, their zero-shot alternatives fail to produce temporally consistent videos with continuous motion. We stri…

    Submitted 2 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: CVPRW 2024. Project page: https://abdo-eldesokey.github.io/latentman/

  15. arXiv:2312.04654 [pdf, other]

    cs.CV cs.AI cs.GR

    NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion

    Authors: Savva Ignatyev, Daniil Selikhanovych, Oleg Voynov, Yiqun Wang, Peter Wonka, Stamatios Lefkimmiatis, Evgeny Burnaev

    Abstract: We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to comple…

    Submitted 7 December, 2023; originally announced December 2023.
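Score Distillation Sampling, which this abstract (and the A3D abstract above) builds on, was introduced in DreamFusion. For reference, the generic SDS gradient is written below, where g(θ) renders an image x from the 3D parameters θ, x_t is the noised render, ε̂_φ is the frozen 2D diffusion model's noise prediction for prompt y at timestep t, and w(t) is a timestep weighting. This is the standard formulation, not NeuSD's specific loss.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, \mathbf{x} = g(\theta)\big)
  = \mathbb{E}_{t,\boldsymbol{\epsilon}}\!\left[ w(t)\,\big(\hat{\boldsymbol{\epsilon}}_\phi(\mathbf{x}_t; y, t) - \boldsymbol{\epsilon}\big)\,\frac{\partial \mathbf{x}}{\partial \theta} \right],
\qquad
\mathbf{x}_t = \alpha_t \mathbf{x} + \sigma_t \boldsymbol{\epsilon},
\quad
\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

The Jacobian of the diffusion U-Net with respect to its input is dropped, which is what makes SDS cheap enough to use as a per-iteration loss on the 3D representation.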

  16. arXiv:2312.03079 [pdf, other]

    cs.CV cs.GR

    LooseControl: Lifting ControlNet for Generalized Depth Conditioning

    Authors: Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka

    Abstract: We present LooseControl to allow generalized depth conditioning for diffusion-based image generation. ControlNet, the SOTA for depth-conditioned image generation, produces remarkable results but relies on having access to detailed depth maps for guidance. Creating such exact depth maps, in many scenarios, is challenging. This paper introduces a generalized version of depth conditioning that enable…

    Submitted 5 December, 2023; originally announced December 2023.

  17. arXiv:2312.02284 [pdf, other]

    cs.CV

    PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

    Authors: Zhenyu Li, Shariq Farooq Bhat, Peter Wonka

    Abstract: Single image depth estimation is a foundational task in computer vision and generative modeling. However, prevailing depth estimation models grapple with accommodating the increasing resolutions commonplace in today's consumer cameras and devices. Existing high-resolution strategies show promise, but they often face limitations, ranging from error propagation to the loss of high-frequency details.…

    Submitted 4 December, 2023; originally announced December 2023.

  18. arXiv:2311.18113 [pdf, other]

    cs.CV cs.GR

    Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features

    Authors: Thomas Wimmer, Peter Wonka, Maks Ovsjanikov

    Abstract: With the immense growth of dataset sizes and computing resources in recent years, so-called foundation models have become popular in NLP and vision tasks. In this work, we propose to explore foundation models for the task of keypoint detection on 3D shapes. A unique characteristic of keypoint detection is that it requires semantic and geometric awareness while demanding high localization accuracy.…

    Submitted 27 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024, Project page: https://wimmerth.github.io/back-to-3d.html

  19. arXiv:2311.17984 [pdf, other]

    cs.CV

    4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s…

    Submitted 26 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024; Project page: https://sherwinbahmani.github.io/4dfy

  20. arXiv:2311.15435 [pdf, other]

    cs.CV cs.GR cs.LG

    Functional Diffusion

    Authors: Biao Zhang, Peter Wonka

    Abstract: We propose a new class of generative diffusion models, called functional diffusion. In contrast to previous work, functional diffusion works on samples that are represented by functions with a continuous domain. Functional diffusion can be seen as an extension of classical diffusion models to an infinite-dimensional domain. Functional diffusion is very versatile as images, videos, audio, 3D shapes…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: For the project site, see https://1zb.github.io/functional-diffusion/
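The abstract describes functional diffusion as an extension of classical diffusion models to an infinite-dimensional domain. As a point of comparison, here is a minimal sketch of the classical DDPM forward (noising) step, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, applied to a function observed at a finite set of query points. The linear β schedule (1e-4 to 0.02 over T = 1000 steps), the helper names, and the use of independent Gaussian noise at each query point are assumptions for illustration; this is not the paper's formulation.

```python
import math
import random


def alpha_bar(t: int, T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_function_values(values: list[float], t: int, rng: random.Random, T: int = 1000) -> list[float]:
    """Sample x_t ~ q(x_t | x_0) for function values x_0 observed at query points."""
    ab = alpha_bar(t, T)
    # Independent Gaussian noise per query point (an assumption of this sketch).
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in values]


# Hypothetical example signal: f(x) = sin(2*pi*x) observed at 8 query points in [0, 1).
queries = [i / 8 for i in range(8)]
x0 = [math.sin(2 * math.pi * q) for q in queries]
x_mid = noise_function_values(x0, t=500, rng=random.Random(0))
```

At t = 0 the values are returned unchanged, and as t approaches T the signal term vanishes so x_t is close to pure noise.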

  21. ART-Owen Scrambling

    Authors: Abdalla G. M. Ahmed, Matt Pharr, Peter Wonka

    Abstract: We present a novel algorithm for implementing Owen-scrambling, combining the generation and distribution of the scrambling bits in a single self-contained compact process. We employ a context-free grammar to build a binary tree of symbols, and equip each symbol with a scrambling code that affects all descendant nodes. We nominate the grammar of adaptive regular tiles (ART) derived from the repetit…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: To appear at SIGGRAPH Asia 2023
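For context, the classic way to Owen-scramble a base-2 coordinate is to walk its bits from most to least significant and flip each bit with a random decision attached to the binary-tree node identified by the higher-order bits. The sketch below uses a SHA-256 hash of (seed, depth, prefix) as that random decision; the hash choice and the function names are assumptions, and this is the plain per-node formulation, not the ART grammar-based algorithm the abstract proposes.

```python
import hashlib


def _node_flip(prefix: int, depth: int, seed: int) -> int:
    """Pseudo-random bit for the tree node at `depth` reached via the bit string `prefix`."""
    key = f"{seed}:{depth}:{prefix}".encode()
    return hashlib.sha256(key).digest()[0] & 1


def owen_scramble(x: int, bits: int, seed: int) -> int:
    """Owen-scramble a `bits`-bit fixed-point coordinate x in [0, 2**bits)."""
    out = 0
    prefix = 0  # bits of x seen so far, most significant first
    for depth in range(bits):
        b = (x >> (bits - 1 - depth)) & 1
        # The flip depends only on the higher-order bits, i.e. on the tree node.
        out = (out << 1) | (b ^ _node_flip(prefix, depth, seed))
        prefix = (prefix << 1) | b
    return out


def van_der_corput(i: int, bits: int) -> int:
    """Base-2 radical inverse of i as a `bits`-bit integer (bit reversal)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


BITS = 16
points = [owen_scramble(van_der_corput(i, BITS), BITS, seed=7) / 2**BITS for i in range(8)]
```

Because each bit's flip depends only on the bits above it, the map is a bijection on every dyadic level, so the 8 scrambled points still fall one per interval of length 1/8.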

  22. arXiv:2310.18511 [pdf, other]

    cs.CV cs.AI

    3DCoMPaT++: An improved Large-scale 3D Vision Dataset for Compositional Recognition

    Authors: Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny

    Abstract: In this work, we present 3DCoMPaT++, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT++ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes…

    Submitted 12 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: https://3dcompat-dataset.org/v2/

  23. arXiv:2310.10640 [pdf, other]

    cs.CV

    LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

    Authors: Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

    Abstract: Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling in generating images from short, single-object descriptions, these models often struggle to faithfully capture all the nuanced details within longer and more elaborate text…

    Submitted 25 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  24. arXiv:2310.08471 [pdf, other]

    cs.CV cs.GR

    WinSyn: A High Resolution Testbed for Synthetic Data

    Authors: Tom Kelly, John Femiani, Peter Wonka

    Abstract: We present WinSyn, a unique dataset and testbed for creating high-quality synthetic data with procedural modeling techniques. The dataset contains high-resolution photographs of windows, selected from locations around the world, with 89,318 individual window crops showcasing diverse geometric and material characteristics. We evaluate a procedural model by training semantic segmentation networks on…

    Submitted 28 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: CVPR version

  25. arXiv:2310.07204 [pdf, other]

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  26. arXiv:2306.17843 [pdf, other]

    cs.CV

    Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

    Authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

    Abstract: We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing…

    Submitted 23 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: webpage: https://guochengqian.github.io/project/magic123/

  27. arXiv:2306.06925 [pdf, other]

    cs.GR math.CO

    Analysis and Synthesis of Digital Dyadic Sequences

    Authors: Abdalla G. M. Ahmed, Mikhail Skopenkov, Markus Hadwiger, Peter Wonka

    Abstract: We explore the space of matrix-generated (0, m, 2)-nets and (0, 2)-sequences in base 2, also known as digital dyadic nets and sequences. In computer graphics, they are arguably leading the competition for use in rendering. We provide a complete characterization of the design space and count the possible number of constructions with and without considering possible reorderings of the point set. Bas…

    Submitted 23 September, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: 17 pages, 11 figures. Minor improvement of exposition; references to earlier proofs of Theorems 3.1 and 3.3 added
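As a small illustration of the matrix-generated constructions the abstract refers to, the sketch below builds a digital (0, 2)-sequence in base 2 from two m × m generator matrices over GF(2): the identity (which yields the van der Corput sequence) and the upper-triangular Pascal matrix mod 2 (the generator matrix of the second Sobol' dimension). The checker tests the (0, m, 2)-net property by verifying that every elementary interval of shape 2^-k × 2^-(m-k) holds exactly one of the first 2^m points. The function names are hypothetical; this shows one classic construction, not the paper's characterization of the design space.

```python
from math import comb


def generator_matrices(m: int) -> tuple[list[list[int]], list[list[int]]]:
    """Identity and upper-triangular Pascal matrix (mod 2), both m x m."""
    identity = [[int(r == c) for c in range(m)] for r in range(m)]
    pascal = [[comb(c, r) & 1 for c in range(m)] for r in range(m)]  # comb = 0 when r > c
    return identity, pascal


def digital_point(i: int, C: list[list[int]], m: int) -> float:
    """Coordinate of point i: digits y = C a over GF(2), read as 0.y_0 y_1 ... in binary."""
    a = [(i >> j) & 1 for j in range(m)]  # digits of i, least significant first
    x = 0.0
    for r in range(m):
        y_r = 0
        for c in range(m):
            y_r ^= C[r][c] & a[c]
        x += y_r / 2 ** (r + 1)
    return x


def dyadic_points(m: int) -> list[tuple[float, float]]:
    """First 2**m points of the 2D digital sequence."""
    c1, c2 = generator_matrices(m)
    return [(digital_point(i, c1, m), digital_point(i, c2, m)) for i in range(2**m)]


def is_0m2_net(points: list[tuple[float, float]], m: int) -> bool:
    """True if every 2^-k x 2^-(m-k) elementary interval holds exactly one point."""
    for k in range(m + 1):
        cells = {(int(x * 2**k), int(y * 2 ** (m - k))) for x, y in points}
        if len(cells) != 2**m:
            return False
    return True


print(dyadic_points(2))  # [(0.0, 0.0), (0.5, 0.5), (0.25, 0.75), (0.75, 0.25)]
```

Swapping the Pascal matrix for a second identity matrix places all points on the diagonal, which breaks the net property already at m = 2.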

  28. arXiv:2306.03253 [pdf, other]

    cs.CV

    Zero-Shot 3D Shape Correspondence

    Authors: Ahmed Abdelreheem, Abdelrahman Eldesokey, Maks Ovsjanikov, Peter Wonka

    Abstract: We propose a novel zero-shot approach to computing correspondences between 3D shapes. Existing approaches mainly focus on isometric and near-isometric shape pairs (e.g., human vs. human), but less attention has been given to strongly non-isometric and inter-class shape matching (e.g., human vs. cow). To this end, we introduce a fully automatic method that exploits the exceptional reasoning capabil…

    Submitted 27 September, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Project webpage: https://samir55.github.io/3dshapematch/

  29. arXiv:2305.18047 [pdf, other]

    cs.CV

    InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions

    Authors: Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

    Abstract: Recent works have explored text-guided image editing using diffusion models and generated edited images based on text prompts. However, the models struggle to accurately locate the regions to be edited and faithfully perform precise edits. In this work, we propose a framework termed InstructEdit that can do fine-grained editing based on user instructions. Our proposed framework has three component…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Project page: https://qianwangx.github.io/InstructEdit/

  30. arXiv:2305.17929 [pdf, other]

    cs.CV cs.AI cs.GR

    Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

    Authors: Yue Fan, Ivan Skorokhodov, Oleg Voynov, Savva Ignatyev, Evgeny Burnaev, Peter Wonka, Yiqun Wang

    Abstract: We develop a method that recovers the surface, materials, and illumination of a scene from its posed multi-view images. In contrast to prior work, it does not require any additional data and can handle glossy objects or bright lighting. It is a progressive inverse rendering approach, which consists of three stages. First, we reconstruct the scene radiance and signed distance function (SDF) with ou…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 12 pages, 10 figures. Project page: https://authors-hub.github.io/Factored-NeuS

  31. arXiv:2305.05594 [pdf, other]

    cs.CV cs.AI cs.GR

    PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces

    Authors: Yiqun Wang, Ivan Skorokhodov, Peter Wonka

    Abstract: A signed distance function (SDF) parametrized by an MLP is a common ingredient of neural surface reconstruction. We build on the successful recent method NeuS to extend it by three new components. The first component is to borrow the tri-plane representation from EG3D and represent signed distance fields as a mixture of tri-planes and MLPs instead of representing it with MLPs only. Using tri-plane…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: CVPR 2023; 20 pages; Project page: https://github.com/yiqun-wang/PET-NeuS
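The abstract mentions borrowing the tri-plane representation from EG3D. A minimal sketch of how a tri-plane is commonly queried: project a 3D point onto the xy, xz, and yz planes, bilinearly sample a feature vector from each 2D grid, and aggregate the three samples before a small MLP decodes them. Summation as the aggregation, the [-1, 1]^3 coordinate convention, and all names here are assumptions; the MLP decoder is omitted.

```python
Grid = list[list[list[float]]]  # indexed as grid[row][col][channel]; needs rows, cols >= 2


def bilinear(grid: Grid, u: float, v: float) -> list[float]:
    """Bilinearly sample grid at normalized coordinates u (columns), v (rows) in [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    x, y = u * (cols - 1), v * (rows - 1)
    # Clamp the top-left corner so the 2x2 neighbourhood stays inside the grid.
    x0, y0 = min(int(x), cols - 2), min(int(y), rows - 2)
    wx, wy = x - x0, y - y0
    c00, c01 = grid[y0][x0], grid[y0][x0 + 1]
    c10, c11 = grid[y0 + 1][x0], grid[y0 + 1][x0 + 1]
    return [
        (1 - wx) * (1 - wy) * a + wx * (1 - wy) * b + (1 - wx) * wy * c + wx * wy * d
        for a, b, c, d in zip(c00, c01, c10, c11)
    ]


def triplane_features(planes: dict[str, Grid], p: tuple[float, float, float]) -> list[float]:
    """Sum the features sampled from the xy, xz and yz planes for a point p in [-1, 1]^3."""
    x, y, z = ((c + 1.0) / 2.0 for c in p)  # map [-1, 1] -> [0, 1]
    samples = [
        bilinear(planes["xy"], x, y),
        bilinear(planes["xz"], x, z),
        bilinear(planes["yz"], y, z),
    ]
    return [sum(vals) for vals in zip(*samples)]


# Hypothetical 3x3 single-channel planes: a ramp along x, all zeros, all ones.
ramp = [[[c / 2] for c in range(3)] for _ in range(3)]
zeros = [[[0.0] for _ in range(3)] for _ in range(3)]
ones = [[[1.0] for _ in range(3)] for _ in range(3)]
feat = triplane_features({"xy": ramp, "xz": zeros, "yz": ones}, (0.2, -0.5, 0.9))
```

Here the ramp plane contributes 0.6 at x = 0.2, so `feat` is approximately [1.6]. Concatenating the three samples instead of summing them is an equally common design choice.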

  32. arXiv:2304.04909 [pdf, other]

    cs.CV

    SATR: Zero-Shot Semantic Segmentation of 3D Shapes

    Authors: Ahmed Abdelreheem, Ivan Skorokhodov, Maks Ovsjanikov, Peter Wonka

    Abstract: We explore the task of zero-shot semantic segmentation of 3D shapes by using large-scale off-the-shelf 2D image recognition models. Surprisingly, we find that modern zero-shot 2D object detectors are better suited for this task than contemporary text/image similarity predictors or even zero-shot 2D segmentation networks. Our key finding is that it is possible to extract accurate 3D segmentation ma…

    Submitted 20 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Project webpage: https://samir55.github.io/SATR/

  33. arXiv:2303.16765 [pdf, other]

    cs.CV

    MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

    Authors: Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

    Abstract: Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify 5 different manipulations, including intermediate latent, conditional embedding, cross attention maps, guidance, and predicted noi…

    Submitted 30 March, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Project page: https://github.com/QianWangX/MDP-Diffusion
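One of the manipulations the abstract lists is guidance. For reference, a minimal sketch of standard classifier-free guidance, ε̃ = ε_u + w (ε_c − ε_u), which combines the unconditional and conditional noise predictions at each denoising step; the guidance scale w is the knob a guidance manipulation would act on. Plain Python lists stand in for noise tensors and the function name is hypothetical; this is not MDP's implementation.

```python
def classifier_free_guidance(eps_uncond: list[float], eps_cond: list[float], w: float) -> list[float]:
    """Return eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same shape")
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


print(classifier_free_guidance([0.0, 1.0], [1.0, 3.0], 7.5))  # [7.5, 16.0]
```

With w = 0 the result is the unconditional prediction, w = 1 gives the conditional one, and w > 1 extrapolates further toward the prompt.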

  34. arXiv:2303.15893 [pdf, other]

    cs.CV cs.GR cs.LG

    VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs

    Authors: Anna Frühstück, Nikolaos Sarafianos, Yuanlu Xu, Peter Wonka, Tony Tung

    Abstract: We introduce VIVE3D, a novel approach that extends the capabilities of image-based 3D GANs to video editing and is able to represent the input video in an identity-preserving and temporally consistent way. We propose two new building blocks. First, we introduce a novel GAN inversion technique specifically tailored to 3D GANs by jointly embedding multiple frames and optimizing for the camera parame…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project webpage and video available at http://afruehstueck.github.io/vive3D

  35. arXiv:2303.14706 [pdf, other]

    cs.CV

    BlobGAN-3D: A Spatially-Disentangled 3D-Aware Generative Model for Indoor Scenes

    Authors: Qian Wang, Yiqun Wang, Michael Birsak, Peter Wonka

    Abstract: 3D-aware image synthesis has attracted increasing interest as it models the 3D nature of our real world. However, performing realistic object-level editing of the generated images in the multi-object scenario still remains a challenge. Recently, a 2D GAN termed BlobGAN has demonstrated great multi-object editing capabilities on real-world indoor scene datasets. In this work, we propose BlobGAN-3D,…

    Submitted 26 March, 2023; originally announced March 2023.

  36. arXiv:2303.01416 [pdf, other]

    cs.CV cs.AI cs.GR

    3D generation on ImageNet

    Authors: Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

    Abstract: Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene. This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses. In this work, we develop a 3D gen…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 (Oral)

    Journal ref: ICLR 2023

  37. arXiv:2302.14696 [pdf, other]

    cs.CV

    Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection

    Authors: Jian Shi, Pengyi Zhang, Ni Zhang, Hakim Ghazzai, Peter Wonka

    Abstract: Medical imaging often contains critical fine-grained features, such as tumors or hemorrhages, crucial for diagnosis yet potentially too subtle for detection with conventional methods. In this paper, we introduce DIA, dissolving is amplifying. DIA is a fine-grained anomaly detection framework for medical images. First, we introduce dissolving transformations. We employ diffusion w…

    Submitted 6 July, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

  38. arXiv:2302.12288 [pdf, other]

    cs.CV

    ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

    Authors: Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller

    Abstract: This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while mainta…

    Submitted 23 February, 2023; originally announced February 2023.
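The abstract contrasts relative depth (correct only up to an unknown scale, and often a shift) with metric depth. A common baseline for bridging the two, used for example when evaluating relative-depth models against metric ground truth, is a closed-form least-squares fit of a scale s and shift t. The sketch below shows that fit; it is a generic alignment step, not ZoeDepth's method (which the truncated abstract does not detail), and the function name is hypothetical. In practice such fits are often done in inverse-depth (disparity) space.

```python
def fit_scale_shift(relative: list[float], metric: list[float]) -> tuple[float, float]:
    """Least-squares (s, t) minimising sum((s * r + t - m)^2) over paired samples."""
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0:
        raise ValueError("relative values are constant; scale is undetermined")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    s = cov / var_r
    t = mean_m - s * mean_r
    return s, t


# Hypothetical data: metric values that are an exact affine map of the relative ones.
rel = [0.1, 0.4, 0.7, 1.0]
met = [2.5 * r + 0.3 for r in rel]
s, t = fit_scale_shift(rel, met)  # approximately (2.5, 0.3)
```

The aligned prediction is then s * r + t per pixel; a model that predicts metric depth directly makes this per-image fit unnecessary.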

  39. arXiv:2301.11445 [pdf, other]

    cs.CV cs.GR

    3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models

    Authors: Biao Zhang, Jiapeng Tang, Matthias Niessner, Peter Wonka

    Abstract: We introduce 3DShape2VecSet, a novel shape representation for neural fields designed for generative diffusion models. Our shape representation can encode 3D shapes given as surface models or point clouds, and represents them as neural fields. The concept of neural fields has previously been combined with a global latent vector, a regular grid of latent vectors, or an irregular grid of latent vecto…

    Submitted 1 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by SIGGRAPH 2023 (Journal Track), Project website: https://1zb.github.io/3DShape2VecSet/, Project demo: https://youtu.be/KKQsQccpBFk

  40. arXiv:2301.02700 [pdf, other]

    cs.CV cs.GR

    3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

    Authors: Rameen Abdal, Hsin-Ying Lee, Peihao Zhu, Menglei Chai, Aliaksandr Siarohin, Peter Wonka, Sergey Tulyakov

    Abstract: Modern 3D-GANs synthesize geometry and texture by training on large-scale datasets with a consistent structure. Training such models on stylized, artistic data, with often unknown, highly variable geometry, and camera information has not yet been shown possible. Can we train a 3D GAN on such artistic data, while maintaining multi-view consistency and texture quality? To this end, we propose an ada…

    Submitted 26 March, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

    Comments: Project Page: https://rameenabdal.github.io/3DAvatarGAN/

  41. arXiv:2212.06250 [pdf, other]

    cs.CV

    ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes

    Authors: Ahmed Abdelreheem, Kyle Olszewski, Hsin-Ying Lee, Peter Wonka, Panos Achlioptas

    Abstract: The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale and complementary dataset extending both the aforementioned ones by associating all objects mentioned in a referential sentence to their underlying instances inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit c…

    Submitted 1 April, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: The project's webpage is https://scanents3d.github.io/

  42. arXiv:2210.11419 [pdf, other]

    cs.CV

    GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network

    Authors: Jheng-Wei Su, Chi-Han Peng, Peter Wonka, Hung-Kuo Chu

    Abstract: Reconstructing 3D layouts from multiple 360° panoramas has received increasing attention recently as estimating a complete layout of a large-scale and complex room from a single panorama is very difficult. The state-of-the-art method, called PSMNet, introduces the first learning-based framework that jointly estimates the room layout and registration given a pair of panoramas. However, PSM…

    Submitted 21 October, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

  43. arXiv:2209.00281  [pdf, other]

    cs.LG

    Large-Scale Auto-Regressive Modeling Of Street Networks

    Authors: Michael Birsak, Tom Kelly, Wamiq Para, Peter Wonka

    Abstract: We present a novel generative method for the creation of city-scale road layouts. While the output of recent methods is limited in both size of the covered area and diversity, our framework produces large traversable graphs of high quality consisting of vertices and edges representing complete street networks covering 400 square kilometers or more. While our framework can process general 2D embedd…

    Submitted 1 September, 2022; originally announced September 2022.

  44. arXiv:2208.12064  [pdf, other]

    cs.LG eess.SP

    Assesment of material layers in building walls using GeoRadar

    Authors: Ildar Gilmutdinov, Ingrid Schloegel, Alois Hinterleitner, Peter Wonka, Michael Wimmer

    Abstract: Assessing the structure of a building with non-invasive methods is an important problem. One of the possible approaches is to use GeoRadar to examine wall structures by analyzing the data obtained from the scans. We propose a data-driven approach to evaluate the material composition of a wall from its GPR radargrams. In order to generate training data, we use gprMax to model the scanning process.…

    Submitted 25 August, 2022; originally announced August 2022.

  45. arXiv:2208.11948  [pdf, other]

    cs.CV

    Learning to Construct 3D Building Wireframes from 3D Line Clouds

    Authors: Yicheng Luo, Jing Ren, Xuefei Zhe, Di Kang, Yajing Xu, Peter Wonka, Linchao Bao

    Abstract: Line clouds, though under-investigated in the previous work, potentially encode more compact structural information of buildings than point clouds extracted from multi-view images. In this work, we propose the first network to process line clouds for building wireframe abstraction. The network takes a line cloud as input, i.e., a nonstructural and unordered set of 3D line segments extracted from…

    Submitted 4 November, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: 10 pages, 6 figures

  46. arXiv:2206.10535  [pdf, other]

    cs.CV cs.AI cs.LG

    EpiGRAF: Rethinking training of 3D GANs

    Authors: Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka

    Abstract: A very recent trend in generative modeling is building 3D-aware generators from 2D image collections. To induce the 3D bias, such models typically rely on volumetric rendering, which is expensive to employ at high resolutions. During the past months, there appeared more than 10 works that address this scaling issue by training a separate 2D decoder to upsample a low-resolution image (or a feature…

    Submitted 15 December, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  47. arXiv:2206.07850  [pdf, other]

    cs.CV cs.GR

    HF-NeuS: Improved Surface Reconstruction Using High-Frequency Details

    Authors: Yiqun Wang, Ivan Skorokhodov, Peter Wonka

    Abstract: Neural rendering can be used to reconstruct implicit representations of shapes without 3D supervision. However, current neural surface reconstruction methods have difficulty learning high-frequency geometry details, so the reconstructed shapes are often over-smoothed. We develop HF-NeuS, a novel method to improve the quality of surface reconstruction in neural rendering. We follow recent work to m…

    Submitted 22 September, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: To appear in NeurIPS 2022. Project page: https://github.com/yiqun-wang/HFS

  48. arXiv:2206.07798  [pdf, other]

    cs.GR cs.LG stat.AP

    Gaussian Blue Noise

    Authors: Abdalla G. M. Ahmed, Jing Ren, Peter Wonka

    Abstract: Among the various approaches for producing point distributions with blue noise spectrum, we argue for an optimization framework using Gaussian kernels. We show that with a wise selection of optimization parameters, this approach attains unprecedented quality, provably surpassing the current state of the art attained by the optimal transport (BNOT) approach. Further, we show that our algorithm scal…

    Submitted 15 June, 2022; originally announced June 2022.

  49. RLSS: A Deep Reinforcement Learning Algorithm for Sequential Scene Generation

    Authors: Azimkhon Ostonov, Peter Wonka, Dominik L. Michels

    Abstract: We present RLSS: a reinforcement learning algorithm for sequential scene generation. This is based on employing the proximal policy optimization (PPO) algorithm for generative problems. In particular, we consider how to effectively reduce the action space by including a greedy search algorithm in the learning process. Our experiments demonstrate that our method converges for a relatively large num…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Accepted at the IEEE Winter Conference on Applications of Computer Vision, WACV 2022

    Journal ref: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 2723-2732

  50. arXiv:2205.14657  [pdf, other]

    cs.CV cs.GR cs.LG

    COFS: Controllable Furniture layout Synthesis

    Authors: Wamiq Reyaz Para, Paul Guerrero, Niloy Mitra, Peter Wonka

    Abstract: Scalable generation of furniture layouts is essential for many applications in virtual reality, augmented reality, game development and synthetic data generation. Many existing methods tackle this problem as a sequence generation problem which imposes a specific ordering on the elements of the layout making such methods impractical for interactive editing or scene completion. Additionally, most me…

    Submitted 29 May, 2022; originally announced May 2022.

    Comments: Initial Version