Search | arXiv e-print repository

A3D: Does Diffusion Dream about 3D Alignment?

Authors: Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych, Nikolay Patakin, Oleg Voynov, Dmitry Senushkin, Alexander Filippov, Anton Konushin, Peter Wonka, Evgeny Burnaev

Abstract: We tackle the problem of text-driven 3D generation from a geometry alignment perspective. We aim at the generation of multiple objects which are consistent in terms of semantics and geometry. Recent methods based on Score Distillation have succeeded in distilling the knowledge from 2D diffusion models to high-quality objects represented by 3D neural radiance fields. These methods handle multiple t… ▽ More We tackle the problem of text-driven 3D generation from a geometry alignment perspective. We aim at the generation of multiple objects which are consistent in terms of semantics and geometry. Recent methods based on Score Distillation have succeeded in distilling the knowledge from 2D diffusion models to high-quality objects represented by 3D neural radiance fields. These methods handle multiple text queries separately, and therefore, the resulting objects have a high variability in object pose and structure. However, in some applications such as geometry editing, it is desirable to obtain aligned objects. In order to achieve alignment, we propose to optimize the continuous trajectories between the aligned objects, by modeling a space of linear pairwise interpolations of the textual embeddings with a single NeRF representation. We demonstrate that similar objects, consisting of semantically corresponding parts, can be well aligned in 3D space without costly modifications to the generation process. We provide several practical scenarios including mesh editing and object hybridization that benefit from geometry alignment and experimentally demonstrate the efficiency of our method. https://voyleg.github.io/a3d/ △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.14449 [pdf, other]

Adversarial Schrödinger Bridge Matching

Authors: Nikita Gushchin, Daniil Selikhanovych, Sergei Kholkin, Evgeny Burnaev, Alexander Korotin

Abstract: The Schrödinger Bridge (SB) problem offers a powerful framework for combining optimal transport and diffusion models. A promising recent approach to solve the SB problem is the Iterative Markovian Fitting (IMF) procedure, which alternates between Markovian and reciprocal projections of continuous-time stochastic processes. However, the model built by the IMF procedure has a long inference time due… ▽ More The Schrödinger Bridge (SB) problem offers a powerful framework for combining optimal transport and diffusion models. A promising recent approach to solve the SB problem is the Iterative Markovian Fitting (IMF) procedure, which alternates between Markovian and reciprocal projections of continuous-time stochastic processes. However, the model built by the IMF procedure has a long inference time due to using many steps of numerical solvers for stochastic differential equations. To address this limitation, we propose a novel Discrete-time IMF (D-IMF) procedure in which learning of stochastic processes is replaced by learning just a few transition probabilities in discrete time. Its great advantage is that in practice it can be naturally implemented using the Denoising Diffusion GAN (DD-GAN), an already well-established adversarial generative modeling technique. We show that our D-IMF procedure can provide the same quality of unpaired domain translation as the IMF, using only several generation steps instead of hundreds. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2312.04654 [pdf, other]

NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion

Authors: Savva Ignatyev, Daniil Selikhanovych, Oleg Voynov, Yiqun Wang, Peter Wonka, Stamatios Lefkimmiatis, Evgeny Burnaev

Abstract: We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to comple… ▽ More We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured. Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner. We introduce three components. First, we suggest employing normal maps as a pure geometric representation for SDS instead of color renderings which are entangled with the appearance information. Second, we introduce the freezing of the SDS noise during training which results in more coherent gradients and better convergence. Third, we propose Multi-View SDS as a way to condition the generation of the non-observable part of the surface without fine-tuning or making changes to the underlying 2D Stable Diffusion model. We evaluate our approach on the BlendedMVS dataset demonstrating significant qualitative and quantitative improvements over competing methods. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2301.12874 [pdf, other]

Extremal Domain Translation with Neural Optimal Transport

Authors: Milena Gazdieva, Alexander Korotin, Daniil Selikhanovych, Evgeny Burnaev

Abstract: In many unpaired image domain translation problems, e.g., style transfer or super-resolution, it is important to keep the translated image similar to its respective input image. We propose the extremal transport (ET) which is a mathematical formalization of the theoretically best possible unpaired translation between a pair of domains w.r.t. the given similarity function. Inspired by the recent ad… ▽ More In many unpaired image domain translation problems, e.g., style transfer or super-resolution, it is important to keep the translated image similar to its respective input image. We propose the extremal transport (ET) which is a mathematical formalization of the theoretically best possible unpaired translation between a pair of domains w.r.t. the given similarity function. Inspired by the recent advances in neural optimal transport (OT), we propose a scalable algorithm to approximate ET maps as a limit of partial OT maps. We test our algorithm on toy examples and on the unpaired image-to-image translation task. The code is publicly available at https://github.com/milenagazdieva/ExtremalNeuralOptimalTransport △ Less

Submitted 2 November, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2205.15269 [pdf, other]

Kernel Neural Optimal Transport

Authors: Alexander Korotin, Daniil Selikhanovych, Evgeny Burnaev

Abstract: We study the Neural Optimal Transport (NOT) algorithm which uses the general optimal transport formulation and learns stochastic transport plans. We show that NOT with the weak quadratic cost might learn fake plans which are not optimal. To resolve this issue, we introduce kernel weak quadratic costs. We show that they provide improved theoretical guarantees and practical performance. We test NOT… ▽ More We study the Neural Optimal Transport (NOT) algorithm which uses the general optimal transport formulation and learns stochastic transport plans. We show that NOT with the weak quadratic cost might learn fake plans which are not optimal. To resolve this issue, we introduce kernel weak quadratic costs. We show that they provide improved theoretical guarantees and practical performance. We test NOT with kernel costs on the unpaired image-to-image translation task. △ Less

Submitted 1 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2201.12220 [pdf, other]

Neural Optimal Transport

Authors: Alexander Korotin, Daniil Selikhanovych, Evgeny Burnaev

Abstract: We present a novel neural-networks-based algorithm to compute optimal transport maps and plans for strong and weak transport costs. To justify the usage of neural networks, we prove that they are universal approximators of transport plans between probability distributions. We evaluate the performance of our optimal transport algorithm on toy examples and on the unpaired image-to-image translation. We present a novel neural-networks-based algorithm to compute optimal transport maps and plans for strong and weak transport costs. To justify the usage of neural networks, we prove that they are universal approximators of transport plans between probability distributions. We evaluate the performance of our optimal transport algorithm on toy examples and on the unpaired image-to-image translation. △ Less

Submitted 1 March, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

Journal ref: The 11th International Conference on Learning Representations (ICLR 2023)

arXiv:2112.05505 [pdf, other]

DeepRLS: A Recurrent Network Architecture with Least Squares Implicit Layers for Non-blind Image Deconvolution

Authors: Iaroslav Koshelev, Daniil Selikhanovych, Stamatios Lefkimmiatis

Abstract: In this work, we study the problem of non-blind image deconvolution and propose a novel recurrent network architecture that leads to very competitive restoration results of high image quality. Motivated by the computational efficiency and robustness of existing large scale linear solvers, we manage to express the solution to this problem as the solution of a series of adaptive non-negative least-s… ▽ More In this work, we study the problem of non-blind image deconvolution and propose a novel recurrent network architecture that leads to very competitive restoration results of high image quality. Motivated by the computational efficiency and robustness of existing large scale linear solvers, we manage to express the solution to this problem as the solution of a series of adaptive non-negative least-squares problems. This gives rise to our proposed Recurrent Least Squares Deconvolution Network (RLSDN) architecture, which consists of an implicit layer that imposes a linear constraint between its input and output. By design, our network manages to serve two important purposes simultaneously. The first is that it implicitly models an effective image prior that can adequately characterize the set of natural images, while the second is that it recovers the corresponding maximum a posteriori (MAP) estimate. Experiments on publicly available datasets, comparing recent state-of-the-art methods, show that our proposed RLSDN approach achieves the best reported performance both for grayscale and color images for all tested scenarios. Furthermore, we introduce a novel training strategy that can be adopted by any network architecture that involves the solution of linear systems as part of its pipeline. Our strategy eliminates completely the need to unroll the iterations required by the linear solver and, thus, it reduces significantly the memory footprint during training. Consequently, this enables the training of deeper network architectures which can further improve the reconstruction results. △ Less

Submitted 10 December, 2021; originally announced December 2021.

ACM Class: I.4.4

arXiv:1809.00382 [pdf, other]

The global rate of convergence for optimal tensor methods in smooth convex optimization

Authors: Alexander Gasnikov, Pavel Dvurechensky, Eduard Gorbunov, Evgeniya Vorontsova, Daniil Selikhanovych, César A. Uribe

Abstract: We consider convex optimization problems with the objective function having Lipshitz-continuous $p$-th order derivative, where $p\geq 1$. We propose a new tensor method, which closes the gap between the lower $O\left(\varepsilon^{-\frac{2}{3p+1}} \right)$ and upper $O\left(\varepsilon^{-\frac{1}{p+1}} \right)$ iteration complexity bounds for this class of optimization problems. We also consider… ▽ More We consider convex optimization problems with the objective function having Lipshitz-continuous $p$-th order derivative, where $p\geq 1$. We propose a new tensor method, which closes the gap between the lower $O\left(\varepsilon^{-\frac{2}{3p+1}} \right)$ and upper $O\left(\varepsilon^{-\frac{1}{p+1}} \right)$ iteration complexity bounds for this class of optimization problems. We also consider uniformly convex functions, and show how the proposed method can be accelerated under this additional assumption. Moreover, we introduce a $p$-th order condition number which naturally arises in the complexity analysis of tensor methods under this assumption. Finally, we make a numerical study of the proposed optimal method and show that in practice it is faster than the best known accelerated tensor method. We also compare the performance of tensor methods for $p=2$ and $p=3$ and show that the 3rd-order method is superior to the 2nd-order method in practice. △ Less

Submitted 3 February, 2019; v1 submitted 2 September, 2018; originally announced September 2018.

Comments: In the current version we present a translation into English of the main derivations, which first appeared on September 2, 2018 in Russian, extend the analysis from the case of strongly convex objective to the case of uniformly convex objectives and add the numerical analysis of our results

MSC Class: 90C25; 90C30; 68Q25; 65K05; 65Y20; 68W40 ACM Class: G.1.6

Journal ref: Proc.MachineLearningResearch 99 (2019) 1374-1391

Showing 1–8 of 8 results for author: Selikhanovych, D