-
DreamWalk: Style Space Exploration using Diffusion Guidance
Authors:
Michelle Shu,
Charles Herrmann,
Richard Strong Bowen,
Forrester Cole,
Ramin Zabih
Abstract:
Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text conditioned models require the artist to perform "prompt engineering," constructing special text sentences to control the style or amount of a particular subject present in the output image. Our goal is to provide fine-grained cont…
▽ More
Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text conditioned models require the artist to perform "prompt engineering," constructing special text sentences to control the style or amount of a particular subject present in the output image. Our goal is to provide fine-grained control over the style and substance specified by the prompt, for example to adjust the intensity of styles in different regions of the image (Figure 1). Our approach is to decompose the text prompt into conceptual elements, and apply a separate guidance term for each element in a single diffusion process. We introduce guidance scale functions to control when in the diffusion process and \emph{where} in the image to intervene. Since the method is based solely on adjusting diffusion guidance, it does not require fine-tuning or manipulating the internal layers of the diffusion model's neural network, and can be used in conjunction with LoRA- or DreamBooth-trained models (Figure2). Project page: https://mshu1.github.io/dreamwalk.github.io/
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Dimensions of Motion: Monocular Prediction through Flow Subspaces
Authors:
Richard Strong Bowen,
Richard Tucker,
Ramin Zabih,
Noah Snavely
Abstract:
We introduce a way to learn to estimate a scene representation from a single image by predicting a low-dimensional subspace of optical flow for each training example, which encompasses the variety of possible camera and object movement. Supervision is provided by a novel loss which measures the distance between this predicted flow subspace and an observed optical flow. This provides a new approach…
▽ More
We introduce a way to learn to estimate a scene representation from a single image by predicting a low-dimensional subspace of optical flow for each training example, which encompasses the variety of possible camera and object movement. Supervision is provided by a novel loss which measures the distance between this predicted flow subspace and an observed optical flow. This provides a new approach to learning scene representation tasks, such as monocular depth prediction or instance segmentation, in an unsupervised fashion using in-the-wild input videos without requiring camera poses, intrinsics, or an explicit multi-view stereo step. We evaluate our method in multiple settings, including an indoor depth prediction task where it achieves comparable performance to recent methods trained with more supervision.
△ Less
Submitted 26 October, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Deep survival analysis with longitudinal X-rays for COVID-19
Authors:
Michelle Shu,
Richard Strong Bowen,
Charles Herrmann,
Gengmo Qi,
Michele Santacatterina,
Ramin Zabih
Abstract:
Time-to-event analysis is an important statistical tool for allocating clinical resources such as ICU beds. However, classical techniques like the Cox model cannot directly incorporate images due to their high dimensionality. We propose a deep learning approach that naturally incorporates multiple, time-dependent imaging studies as well as non-imaging data into time-to-event analysis. Our techniqu…
▽ More
Time-to-event analysis is an important statistical tool for allocating clinical resources such as ICU beds. However, classical techniques like the Cox model cannot directly incorporate images due to their high dimensionality. We propose a deep learning approach that naturally incorporates multiple, time-dependent imaging studies as well as non-imaging data into time-to-event analysis. Our techniques are benchmarked on a clinical dataset of 1,894 COVID-19 patients, and show that image sequences significantly improve predictions. For example, classical time-to-event methods produce a concordance error of around 30-40% for predicting hospital admission, while our error is 25% without images and 20% with multiple X-rays included. Ablation studies suggest that our models are not learning spurious features such as scanner artifacts. While our focus and evaluation is on COVID-19, the methods we develop are broadly applicable.
△ Less
Submitted 22 August, 2021;
originally announced August 2021.
-
Object-centered image stitching
Authors:
Charles Herrmann,
Chen Wang,
Richard Strong Bowen,
Emil Keyder,
Ramin Zabih
Abstract:
Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams. As described in [1], the seam finding phase attempts to place seams between pixels where the transition between source im…
▽ More
Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams. As described in [1], the seam finding phase attempts to place seams between pixels where the transition between source images is not noticeable. Here, we observe that the most problematic failures of this approach occur when objects are cropped, omitted, or duplicated. We therefore take an object-centered approach to the problem, leveraging recent advances in object detection [2,3,4]. We penalize candidate solutions with this class of error by modifying the energy function used in the seam finding stage. This produces substantially more realistic stitching results on challenging imagery. In addition, these methods can be used to determine when there is non-recoverable occlusion in the input data, and also suggest a simple evaluation metric that can be used to evaluate the output of stitching algorithms.
△ Less
Submitted 23 November, 2020;
originally announced November 2020.
-
Robust image stitching with multiple registrations
Authors:
Charles Herrmann,
Chen Wang,
Richard Strong Bowen,
Emil Keyder,
Michael Krainin,
Ce Liu,
Ramin Zabih
Abstract:
Panorama creation is one of the most widely deployed techniques in computer vision. In addition to industry applications such as Google Street View, it is also used by millions of consumers in smartphones and other cameras. Traditionally, the problem is decomposed into three phases: registration, which picks a single transformation of each source image to align it to the other inputs, seam finding…
▽ More
Panorama creation is one of the most widely deployed techniques in computer vision. In addition to industry applications such as Google Street View, it is also used by millions of consumers in smartphones and other cameras. Traditionally, the problem is decomposed into three phases: registration, which picks a single transformation of each source image to align it to the other inputs, seam finding, which selects a source image for each pixel in the final result, and blending, which fixes minor visual artifacts. Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion. We propose instead the use of multiple registrations, permitting regions of the image at different depths to be captured with greater accuracy. MRF inference techniques naturally extend to seam finding over multiple registrations, and we show here that their energy functions can be readily modified with new terms that discourage duplication and tearing, common problems that are exacerbated by the use of multiple registrations. Our techniques are closely related to layer-based stereo, and move image stitching closer to explicit scene modeling. Experimental evidence demonstrates that our techniques often generate significantly better panoramas when there is substantial motion or parallax.
△ Less
Submitted 23 November, 2020;
originally announced November 2020.
-
Learning to Autofocus
Authors:
Charles Herrmann,
Richard Strong Bowen,
Neal Wadhwa,
Rahul Garg,
Qiurui He,
Jonathan T. Barron,
Ramin Zabih
Abstract:
Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance. We propose a learning-based approach to this problem, and provide a realistic dataset of sufficient size for effective learning. Our dataset is labeled with per-pixel depths obtained from multi-view stereo, following "Learning single camera depth estimation using dual-pixels". Using this data…
▽ More
Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance. We propose a learning-based approach to this problem, and provide a realistic dataset of sufficient size for effective learning. Our dataset is labeled with per-pixel depths obtained from multi-view stereo, following "Learning single camera depth estimation using dual-pixels". Using this dataset, we apply modern deep classification models and an ordinal regression loss to obtain an efficient learning-based autofocus technique. We demonstrate that our approach provides a significant improvement compared with previous learned and non-learned methods: our model reduces the mean absolute error by a factor of 3.6 over the best comparable baseline algorithm. Our dataset and code are publicly available.
△ Less
Submitted 2 May, 2020; v1 submitted 25 April, 2020;
originally announced April 2020.
-
Channel selection using Gumbel Softmax
Authors:
Charles Herrmann,
Richard Strong Bowen,
Ramin Zabih
Abstract:
Important applications such as mobile computing require reducing the computational costs of neural network inference. Ideally, applications would specify their preferred tradeoff between accuracy and speed, and the network would optimize this end-to-end, using classification error to remove parts of the network. Increasing speed can be done either during training - e.g., pruning filters - or durin…
▽ More
Important applications such as mobile computing require reducing the computational costs of neural network inference. Ideally, applications would specify their preferred tradeoff between accuracy and speed, and the network would optimize this end-to-end, using classification error to remove parts of the network. Increasing speed can be done either during training - e.g., pruning filters - or during inference - e.g., conditionally executing a subset of the layers. We propose a single end-to-end framework that can improve inference efficiency in both settings. We use a combination of batch activation loss and classification loss, and Gumbel reparameterization to learn network structure. We train end-to-end, and the same technique supports pruning as well as conditional computation. We obtain promising experimental results for ImageNet classification with ResNet (45-52% less computation).
△ Less
Submitted 23 November, 2020; v1 submitted 10 December, 2018;
originally announced December 2018.
-
Group-Theoretic Partial Matrix Multiplication
Authors:
Richard Strong Bowen,
Bo Chen,
Hendrik Orem,
Martijn van Schaardenburg
Abstract:
A generalization of recent group-theoretic matrix multiplication algorithms to an analogue of the theory of partial matrix multiplication is presented. We demonstrate that the added flexibility of this approach can in some cases improve upper bounds on the exponent of matrix multiplication yielded by group-theoretic full matrix multiplication. The group theory behind our partial matrix multiplic…
▽ More
A generalization of recent group-theoretic matrix multiplication algorithms to an analogue of the theory of partial matrix multiplication is presented. We demonstrate that the added flexibility of this approach can in some cases improve upper bounds on the exponent of matrix multiplication yielded by group-theoretic full matrix multiplication. The group theory behind our partial matrix multiplication algorithms leads to the problem of maximizing a quantity representing the "fullness" of a given partial matrix pattern. This problem is shown to be NP-hard, and two algorithms, one optimal and another non-optimal but polynomial-time, are given for solving it.
△ Less
Submitted 13 February, 2009;
originally announced February 2009.