-
WonderWorld: Interactive 3D Scene Generation from a Single Image
Authors:
Hong-Xing Yu,
Haoyi Duan,
Charles Herrmann,
William T. Freeman,
Jiajun Wu
Abstract:
We present WonderWorld, a novel framework for interactive 3D scene extrapolation that enables users to explore and shape virtual environments based on a single input image and user-specified text. While significant improvements have been made to the visual quality of scene generation, existing methods are run offline, taking tens of minutes to hours to generate a scene. By leveraging Fast Gaussian Surfels and a guided diffusion-based depth estimation method, WonderWorld generates geometrically consistent extrapolation while significantly reducing computational time. Our framework generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. We demonstrate the potential of WonderWorld for applications in virtual reality, gaming, and creative design, where users can quickly generate and navigate immersive, potentially infinite virtual worlds from a single image. Our approach represents a significant advancement in interactive 3D scene generation, opening up new possibilities for user-driven content creation and exploration in virtual environments. We will release full code and software for reproducibility. Project website: https://WonderWorld-2024.github.io/
Submitted 14 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
DreamWalk: Style Space Exploration using Diffusion Guidance
Authors:
Michelle Shu,
Charles Herrmann,
Richard Strong Bowen,
Forrester Cole,
Ramin Zabih
Abstract:
Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text-conditioned models require the artist to perform "prompt engineering," constructing special text sentences to control the style or amount of a particular subject present in the output image. Our goal is to provide fine-grained control over the style and substance specified by the prompt, for example to adjust the intensity of styles in different regions of the image (Figure 1). Our approach is to decompose the text prompt into conceptual elements, and apply a separate guidance term for each element in a single diffusion process. We introduce guidance scale functions to control when in the diffusion process and where in the image to intervene. Since the method is based solely on adjusting diffusion guidance, it does not require fine-tuning or manipulating the internal layers of the diffusion model's neural network, and can be used in conjunction with LoRA- or DreamBooth-trained models (Figure 2). Project page: https://mshu1.github.io/dreamwalk.github.io/
Submitted 3 April, 2024;
originally announced April 2024.
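The per-element guidance described in the DreamWalk abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the standard classifier-free guidance form, in which each conceptual element contributes its own scaled offset from the unconditional noise prediction. The names `compose_guidance`, `eps_uncond`, `eps_conds`, `scale_fns`, and `left_column_late` are hypothetical, and the noise predictions are tiny single-channel grids rather than real model outputs.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]  # H x W noise prediction (one channel, for brevity)
ScaleFn = Callable[[int, int, int], float]  # (timestep, row, col) -> guidance scale


def compose_guidance(eps_uncond: Grid, eps_conds: Sequence[Grid],
                     scale_fns: Sequence[ScaleFn], t: int) -> Grid:
    """Assumed form: eps_uncond + sum_i s_i(t, row, col) * (eps_i - eps_uncond)."""
    out = [row[:] for row in eps_uncond]  # copy so the input is not mutated
    for eps_c, scale in zip(eps_conds, scale_fns):
        for r, row in enumerate(eps_c):
            for c, value in enumerate(row):
                out[r][c] += scale(t, r, c) * (value - eps_uncond[r][c])
    return out


def left_column_late(t: int, r: int, c: int) -> float:
    # Hypothetical scale function: apply the element only in the left column
    # ("where") and only at timesteps below 500 ("when").
    return 2.0 if (c == 0 and t < 500) else 0.0


eps_uncond = [[0.0, 0.0], [0.0, 0.0]]
eps_style = [[1.0, 1.0], [1.0, 1.0]]
guided = compose_guidance(eps_uncond, [eps_style], [left_column_late], t=100)
print(guided)
```

Because each scale function takes both the timestep and the pixel position, one function can express the "when" and the "where" of the intervention; a real sampler would call such a routine once per denoising step.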
-
Lumiere: A Space-Time Diffusion Model for Video Generation
Authors:
Omer Bar-Tal,
Hila Chefer,
Omer Tov,
Charles Herrmann,
Roni Paiss,
Shiran Zada,
Ariel Ephrat,
Junhwa Hur,
Guanghui Liu,
Amit Raj,
Yuanzhen Li,
Michael Rubinstein,
Tomer Michaeli,
Oliver Wang,
Deqing Sun,
Tali Dekel,
Inbar Mosseri
Abstract:
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
Submitted 5 February, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Efficient Hybrid Zoom using Camera Fusion on Mobile Phones
Authors:
Xiaotong Wu,
Wei-Sheng Lai,
YiChang Shih,
Charles Herrmann,
Michael Krainin,
Deqing Sun,
Chia-Kai Liang
Abstract:
DSLR cameras can achieve multiple zoom levels via shifting lens distances or swapping lens types. However, these techniques are not possible on smartphone devices due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems crop and digitally upsample images from W, leading to significant detail loss. In this paper, we propose an efficient system for hybrid zoom super-resolution on mobile devices, which captures a synchronous pair of W and T shots and leverages machine learning models to align and transfer details from T to W. We further develop an adaptive blending method that accounts for depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment errors. To minimize the domain gap, we design a dual-phone camera rig to capture real-world inputs and ground-truths for supervised training. Our method generates a 12-megapixel image in 500ms on a mobile platform and compares favorably against state-of-the-art methods under extensive evaluation on real-world scenarios.
Submitted 2 January, 2024;
originally announced January 2024.
-
Boundary Attention: Learning to Localize Boundaries under High Noise
Authors:
Mia Gaia Polansky,
Charles Herrmann,
Junhwa Hur,
Deqing Sun,
Dor Verbin,
Todd Zickler
Abstract:
We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention. Boundary attention is a boundary-aware local attention operation that, when applied densely and repeatedly, progressively refines a field of variables that specify an unrasterized description of the local boundary structure in every overlapping patch within an image. It operates in a bottom-up fashion, similar to classical methods for sub-pixel edge localization and edge-linking, but with a higher-dimensional description of local boundary structure, a notion of spatial consistency that is learned instead of designed, and a sequence of operations that is end-to-end differentiable. We train our model using simple synthetic data and then evaluate it using photographs that were captured under low-light conditions with variable amounts of noise. We find that our method generalizes to natural images corrupted by real sensor noise, and predicts consistent boundaries under increasingly noisy conditions where other state-of-the-art methods fail.
Submitted 18 March, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
Authors:
Saurabh Saxena,
Junhwa Hur,
Charles Herrmann,
Deqing Sun,
David J. Fleet
Abstract:
While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common, and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor and 33% reduction on zero-shot outdoor datasets over the current SOTA using only a small number of denoising steps. For an overview see https://diffusion-vision.github.io/dmd
Submitted 20 December, 2023;
originally announced December 2023.
-
WonderJourney: Going from Anywhere to Everywhere
Authors:
Hong-Xing Yu,
Haoyi Duan,
Junhwa Hur,
Kyle Sargent,
Michael Rubinstein,
William T. Freeman,
Forrester Cole,
Deqing Sun,
Noah Snavely,
Jiajun Wu,
Charles Herrmann
Abstract:
We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys". Project website: https://kovenyu.com/WonderJourney/
Submitted 12 April, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
Authors:
Jiao Sun,
Deqing Fu,
Yushi Hu,
Su Wang,
Royi Rassin,
Da-Cheng Juan,
Dana Alon,
Charles Herrmann,
Sjoerd van Steenkiste,
Ranjay Krishna,
Cyrus Rashtchian
Abstract:
Despite their widespread success, Text-to-Image models (T2I) still struggle to produce images that are both aesthetically pleasing and faithful to the user's input text. We introduce DreamSync, a model-agnostic training algorithm by design that improves T2I models to be faithful to the text input. DreamSync builds off a recent insight from TIFA's evaluation framework -- that large vision-language models (VLMs) can effectively identify the fine-grained discrepancies between generated images and the text inputs. DreamSync uses this insight to train T2I models without any labeled data; it improves T2I models using their own generations. First, it prompts the model to generate several candidate images for a given input text. Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality. After selection, we use LoRA to iteratively finetune the T2I model to guide its generation towards the selected best generations. DreamSync does not need any additional human annotation, model architecture changes, or reinforcement learning. Despite its simplicity, DreamSync improves both the semantic alignment and aesthetic appeal of two diffusion-based T2I models, evidenced by multiple benchmarks (+1.7% on TIFA, +2.9% on DSG1K, +3.4% on VILA aesthetic) and human evaluation.
Submitted 28 November, 2023;
originally announced November 2023.
-
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Authors:
Junyi Zhang,
Charles Herrmann,
Junhwa Hur,
Eric Chen,
Varun Jampani,
Deqing Sun,
Ming-Hsuan Yang
Abstract:
While pre-trained large-scale vision models have shown significant promise for semantic correspondence, their features often struggle to grasp the geometry and orientation of instances. This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing. We show that incorporating this information can markedly enhance semantic correspondence performance with simple but effective solutions in both zero-shot and supervised settings. We also construct a new challenging benchmark for semantic correspondence built from an existing animal pose estimation dataset, for both pre-training and validating models. Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset, outperforming the state of the art by 5.5p and 11.0p absolute gains, respectively. Our code and datasets are publicly available at: https://telling-left-from-right.github.io/.
Submitted 24 March, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Authors:
Kyle Sargent,
Zizhang Li,
Tanmay Shah,
Charles Herrmann,
Hong-Xing Yu,
Yunzhi Zhang,
Eric Ryan Chan,
Dmitry Lagun,
Li Fei-Fei,
Deqing Sun,
Jiajun Wu
Abstract:
We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art result in LPIPS on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at http://kylesargent.github.io/zeronvs/
Submitted 23 April, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Substance or Style: What Does Your Image Embedding Know?
Authors:
Cyrus Rashtchian,
Charles Herrmann,
Chun-Sung Ferng,
Ayan Chakrabarti,
Dilip Krishnan,
Deqing Sun,
Da-Cheng Juan,
Andrew Tomkins
Abstract:
Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
Submitted 10 July, 2023;
originally announced July 2023.
-
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
Authors:
Saurabh Saxena,
Charles Herrmann,
Junhwa Hur,
Abhishek Kar,
Mohammad Norouzi,
Deqing Sun,
David J. Fleet
Abstract:
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions that are predominant for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. With self-supervised pre-training, the combined use of synthetic and real data for supervised training, and technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy-incomplete training data, and a simple form of coarse-to-fine refinement, one can train state-of-the-art diffusion models for depth and optical flow estimation. Extensive experiments focus on quantitative performance against benchmarks, ablations, and the model's ability to capture uncertainty and multimodality, and impute missing values. Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26% on the KITTI optical flow benchmark, about 25% better than the best published method. For an overview see https://diffusion-vision.github.io.
Submitted 5 December, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Authors:
Junyi Zhang,
Charles Herrmann,
Junhwa Hur,
Luisa Polania Cabrera,
Varun Jampani,
Deqing Sun,
Ming-Hsuan Yang
Abstract:
Text-to-image diffusion models have made significant advances in generating and editing high-quality images. As a result, numerous approaches have explored the ability of diffusion model features to understand and process single images for downstream tasks, e.g., classification, semantic segmentation, and stylization. However, significantly less is known about what these features reveal across multiple, different images and objects. In this work, we exploit Stable Diffusion (SD) features for semantic and dense correspondence and discover that with simple post-processing, SD features can perform quantitatively similarly to SOTA representations. Interestingly, the qualitative analysis reveals that SD features have very different properties compared to existing representation learning features, such as the recently released DINOv2: while DINOv2 provides sparse but accurate matches, SD features provide high-quality spatial information but sometimes inaccurate semantic matches. We demonstrate that a simple fusion of these two features works surprisingly well, and a zero-shot evaluation using nearest neighbors on these fused features provides a significant performance gain over state-of-the-art methods on benchmark datasets, e.g., SPair-71k, PF-Pascal, and TSS. We also show that these correspondences can enable interesting applications such as instance swapping in two images.
Submitted 28 November, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
Authors:
Kyle Sargent,
Jing Yu Koh,
Han Zhang,
Huiwen Chang,
Charles Herrmann,
Pratul Srinivasan,
Jiajun Wu,
Deqing Sun
Abstract:
Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars. However, these models struggle on larger, more complex datasets. To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder into a two-stage vector-quantized autoencoder. Our Stage 1 allows for the reconstruction of an input image and the ability to change the camera position around the image, and our Stage 2 allows for the generation of new 3D scenes. VQ3D is capable of generating and reconstructing 3D-aware images from the 1000-class ImageNet dataset of 1.2 million training images. We achieve an ImageNet generation FID score of 16.8, compared to 69.8 for the next best baseline method.
Submitted 14 February, 2023;
originally announced February 2023.
-
Accidental Light Probes
Authors:
Hong-Xing Yu,
Samir Agarwala,
Charles Herrmann,
Richard Szeliski,
Noah Snavely,
Jiajun Wu,
Deqing Sun
Abstract:
Recovering lighting in a scene from a single image is a fundamental problem in computer vision. While a mirror ball light probe can capture omnidirectional lighting, light probes are generally unavailable in everyday images. In this work, we study recovering lighting from accidental light probes (ALPs) -- common, shiny objects like Coke cans, which often accidentally appear in daily scenes. We propose a physically-based approach to model ALPs and estimate lighting from their appearances in single images. The main idea is to model the appearance of ALPs by photogrammetrically principled shading and to invert this process via differentiable rendering to recover incidental illumination. We demonstrate that we can put an ALP into a scene to allow high-fidelity lighting estimation. Our model can also recover lighting for existing images that happen to contain an ALP.
Submitted 10 June, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Self-supervised AutoFlow
Authors:
Hsin-Ping Huang,
Charles Herrmann,
Junhwa Hur,
Erika Lu,
Kyle Sargent,
Austin Stone,
Ming-Hsuan Yang,
Deqing Sun
Abstract:
Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric. Observing a strong correlation between the ground truth search metric and self-supervised losses, we introduce self-supervised AutoFlow to handle real-world videos without ground truth labels. Using self-supervised loss as the search metric, our self-supervised AutoFlow performs on par with AutoFlow on Sintel and KITTI where ground truth is available, and performs better on the real-world DAVIS dataset. We further explore using self-supervised AutoFlow in the (semi-)supervised setting and obtain competitive results against the state of the art.
Submitted 22 May, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
-
Disentangling Architecture and Training for Optical Flow
Authors:
Deqing Sun,
Charles Herrmann,
Fitsum Reda,
Michael Rubinstein,
David Fleet,
William T. Freeman
Abstract:
How important are training details and datasets to recent optical flow models like RAFT? And do they generalize? To explore these questions, rather than develop a new model, we revisit three prominent models, PWC-Net, IRR-PWC and RAFT, with a common set of modern training techniques and datasets, and observe significant performance gains, demonstrating the importance and generality of these training details. Our newly trained PWC-Net and IRR-PWC models show surprisingly large improvements, up to 30% versus original published results on Sintel and KITTI 2015 benchmarks. They outperform the more recent Flow1D on KITTI 2015 while being 3x faster during inference. Our newly trained RAFT achieves an Fl-all score of 4.31% on KITTI 2015, more accurate than all published optical flow methods at the time of writing. Our results demonstrate the benefits of separating the contributions of models, training techniques and datasets when analyzing performance gains of optical flow methods. Our source code will be publicly available.
Submitted 19 September, 2022; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Kubric: A scalable dataset generator
Authors:
Klaus Greff,
Francois Belletti,
Lucas Beyer,
Carl Doersch,
Yilun Du,
Daniel Duckworth,
David J. Fleet,
Dan Gnanapragasam,
Florian Golemo,
Charles Herrmann,
Thomas Kipf,
Abhijit Kundu,
Dmitry Lagun,
Issam Laradji,
Hsueh-Ti Liu,
Henning Meyer,
Yishu Miao,
Derek Nowrouzezahrai,
Cengiz Oztireli,
Etienne Pot,
Noha Radwan,
Daniel Rebain,
Sara Sabour,
Mehdi S. M. Sajjadi
, et al. (10 additional authors not shown)
Abstract:
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential…
▽ More
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control over data and 4) can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, and generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the used assets, all of the generation code, as well as the rendered datasets for reuse and modification.
Submitted 7 March, 2022;
originally announced March 2022.
-
Pyramid Adversarial Training Improves ViT Performance
Authors:
Charles Herrmann,
Kyle Sargent,
Lu Jiang,
Ramin Zabih,
Huiwen Chang,
Ce Liu,
Dilip Krishnan,
Deqing Sun
Abstract:
Aggressive data augmentation is a key component of the strong generalization capabilities of Vision Transformer (ViT). One such data augmentation technique is adversarial training (AT); however, many prior works have shown that this often results in poor clean accuracy. In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance. We pair it with a "matched" Dropout and stochastic depth regularization, which adopts the same Dropout and stochastic depth configuration for the clean and adversarial samples. Similar to the improvements on CNNs by AdvProp (not directly applicable to ViT), our pyramid adversarial training breaks the trade-off between in-distribution accuracy and out-of-distribution robustness for ViT and related architectures. It leads to 1.82% absolute improvement on ImageNet clean accuracy for the ViT-B model when trained only on ImageNet-1K data, while simultaneously boosting performance on 7 ImageNet robustness metrics, by absolute numbers ranging from 1.76% to 15.68%. We set a new state-of-the-art for ImageNet-C (41.42 mCE), ImageNet-R (53.92%), and ImageNet-Sketch (41.04%) without extra data, using only the ViT-B/16 backbone and our pyramid adversarial training. Our code is publicly available at pyramidat.github.io.
Submitted 2 September, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Deep survival analysis with longitudinal X-rays for COVID-19
Authors:
Michelle Shu,
Richard Strong Bowen,
Charles Herrmann,
Gengmo Qi,
Michele Santacatterina,
Ramin Zabih
Abstract:
Time-to-event analysis is an important statistical tool for allocating clinical resources such as ICU beds. However, classical techniques like the Cox model cannot directly incorporate images due to their high dimensionality. We propose a deep learning approach that naturally incorporates multiple, time-dependent imaging studies as well as non-imaging data into time-to-event analysis. Our techniques are benchmarked on a clinical dataset of 1,894 COVID-19 patients, and show that image sequences significantly improve predictions. For example, classical time-to-event methods produce a concordance error of around 30-40% for predicting hospital admission, while our error is 25% without images and 20% with multiple X-rays included. Ablation studies suggest that our models are not learning spurious features such as scanner artifacts. While our focus and evaluation is on COVID-19, the methods we develop are broadly applicable.
Submitted 22 August, 2021;
originally announced August 2021.
-
AutoFlow: Learning a Better Training Set for Optical Flow
Authors:
Deqing Sun,
Daniel Vlasic,
Charles Herrmann,
Varun Jampani,
Michael Krainin,
Huiwen Chang,
Ramin Zabih,
William T. Freeman,
Ce Liu
Abstract:
Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications. To automate the process, we present AutoFlow, a simple and effective method to render training data for optical flow that optimizes the performance of a model on a target dataset. AutoFlow takes a layered approach to render synthetic data, where the motion, shape, and appearance of each layer are controlled by learnable hyperparameters. Experimental results show that AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT. Our code and data are available at https://autoflow-google.github.io .
Submitted 29 April, 2021;
originally announced April 2021.
-
Scaling Beyond Bandwidth Limitations: Wireless Control With Stability Guarantees Under Overload
Authors:
Fabian Mager,
Dominik Baumann,
Carsten Herrmann,
Sebastian Trimpe,
Marco Zimmerling
Abstract:
An important class of cyber-physical systems relies on multiple agents that jointly perform a task by coordinating their actions over a wireless network. Examples include self-driving cars in intelligent transportation and production robots in smart manufacturing. However, the scalability of existing control-over-wireless solutions is limited as they cannot resolve overload situations in which the communication demand exceeds the available bandwidth. This paper presents a novel co-design of distributed control and wireless communication that overcomes this limitation by dynamically allocating the available bandwidth to agents with the greatest need to communicate. Experiments on a real cyber-physical testbed with 20 agents, each consisting of a low-power wireless embedded device and a cart-pole system, demonstrate that our solution achieves significantly better control performance under overload than the state of the art. We further prove that our co-design guarantees closed-loop stability for physical systems with stochastic linear time-invariant dynamics.
Submitted 25 January, 2022; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Object-centered image stitching
Authors:
Charles Herrmann,
Chen Wang,
Richard Strong Bowen,
Emil Keyder,
Ramin Zabih
Abstract:
Image stitching is typically decomposed into three phases: registration, which aligns the source images with a common target image; seam finding, which determines for each target pixel the source image it should come from; and blending, which smooths transitions over the seams. As described in [1], the seam finding phase attempts to place seams between pixels where the transition between source images is not noticeable. Here, we observe that the most problematic failures of this approach occur when objects are cropped, omitted, or duplicated. We therefore take an object-centered approach to the problem, leveraging recent advances in object detection [2,3,4]. We penalize candidate solutions with this class of error by modifying the energy function used in the seam finding stage. This produces substantially more realistic stitching results on challenging imagery. In addition, these methods can be used to determine when there is non-recoverable occlusion in the input data, and also suggest a simple evaluation metric that can be used to evaluate the output of stitching algorithms.
Submitted 23 November, 2020;
originally announced November 2020.
-
Robust image stitching with multiple registrations
Authors:
Charles Herrmann,
Chen Wang,
Richard Strong Bowen,
Emil Keyder,
Michael Krainin,
Ce Liu,
Ramin Zabih
Abstract:
Panorama creation is one of the most widely deployed techniques in computer vision. In addition to industry applications such as Google Street View, it is also used by millions of consumers in smartphones and other cameras. Traditionally, the problem is decomposed into three phases: registration, which picks a single transformation of each source image to align it to the other inputs, seam finding, which selects a source image for each pixel in the final result, and blending, which fixes minor visual artifacts. Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion. We propose instead the use of multiple registrations, permitting regions of the image at different depths to be captured with greater accuracy. MRF inference techniques naturally extend to seam finding over multiple registrations, and we show here that their energy functions can be readily modified with new terms that discourage duplication and tearing, common problems that are exacerbated by the use of multiple registrations. Our techniques are closely related to layer-based stereo, and move image stitching closer to explicit scene modeling. Experimental evidence demonstrates that our techniques often generate significantly better panoramas when there is substantial motion or parallax.
Submitted 23 November, 2020;
originally announced November 2020.
-
Learning to Autofocus
Authors:
Charles Herrmann,
Richard Strong Bowen,
Neal Wadhwa,
Rahul Garg,
Qiurui He,
Jonathan T. Barron,
Ramin Zabih
Abstract:
Autofocus is an important task for digital cameras, yet current approaches often exhibit poor performance. We propose a learning-based approach to this problem, and provide a realistic dataset of sufficient size for effective learning. Our dataset is labeled with per-pixel depths obtained from multi-view stereo, following "Learning single camera depth estimation using dual-pixels". Using this dataset, we apply modern deep classification models and an ordinal regression loss to obtain an efficient learning-based autofocus technique. We demonstrate that our approach provides a significant improvement compared with previous learned and non-learned methods: our model reduces the mean absolute error by a factor of 3.6 over the best comparable baseline algorithm. Our dataset and code are publicly available.
Submitted 2 May, 2020; v1 submitted 25 April, 2020;
originally announced April 2020.
-
Channel selection using Gumbel Softmax
Authors:
Charles Herrmann,
Richard Strong Bowen,
Ramin Zabih
Abstract:
Important applications such as mobile computing require reducing the computational costs of neural network inference. Ideally, applications would specify their preferred tradeoff between accuracy and speed, and the network would optimize this end-to-end, using classification error to remove parts of the network. Increasing speed can be done either during training - e.g., pruning filters - or during inference - e.g., conditionally executing a subset of the layers. We propose a single end-to-end framework that can improve inference efficiency in both settings. We use a combination of batch activation loss and classification loss, and Gumbel reparameterization to learn network structure. We train end-to-end, and the same technique supports pruning as well as conditional computation. We obtain promising experimental results for ImageNet classification with ResNet (45-52% less computation).
Submitted 23 November, 2020; v1 submitted 10 December, 2018;
originally announced December 2018.
-
Open Set Logo Detection and Retrieval
Authors:
Andras Tüzkö,
Christian Herrmann,
Daniel Manger,
Jürgen Beyerer
Abstract:
Current logo retrieval research focuses on closed set scenarios. We argue that the logo domain is too large for this strategy and requires an open set approach. To foster research in this direction, a large-scale logo dataset, called Logos in the Wild, is collected and released to the public. A typical open set logo retrieval application is, for example, assessing the effectiveness of advertisement in sports event broadcasts. Given a query sample in shape of a logo image, the task is to find all further occurrences of this logo in a set of images or videos. Currently, common logo retrieval approaches are unsuitable for this task because of their closed world assumption. Thus, an open set logo retrieval method is proposed in this work which allows searching for previously unseen logos by a single query sample. A two stage concept with separate logo detection and comparison is proposed where both modules are based on task specific CNNs. If trained with the Logos in the Wild data, significant performance improvements are observed, especially compared with state-of-the-art closed set approaches.
Submitted 30 October, 2017;
originally announced October 2017.
-
A discriminative view of MRF pre-processing algorithms
Authors:
Chen Wang,
Charles Herrmann,
Ramin Zabih
Abstract:
While Markov Random Fields (MRFs) are widely used in computer vision, they present a quite challenging inference problem. MRF inference can be accelerated by pre-processing techniques like Dead End Elimination (DEE) or QPBO-based approaches which compute the optimal labeling of a subset of variables. These techniques are guaranteed to never wrongly label a variable but they often leave a large number of variables unlabeled. We address this shortcoming by interpreting pre-processing as a classification problem, which allows us to trade off false positives (i.e., giving a variable an incorrect label) versus false negatives (i.e., failing to label a variable). We describe an efficient discriminative rule that finds optimal solutions for a subset of variables. Our technique provides both per-instance and worst-case guarantees concerning the quality of the solution. Empirical studies were conducted over several benchmark datasets. We obtain a speedup factor of 2 to 12 over expansion moves without preprocessing, and on difficult non-submodular energy functions produce slightly lower energy.
Submitted 8 August, 2017;
originally announced August 2017.
-
Unconstrained Face Detection and Open-Set Face Recognition Challenge
Authors:
Manuel Günther,
Peiyun Hu,
Christian Herrmann,
Chi Ho Chan,
Min Jiang,
Shufan Yang,
Akshay Raj Dhamija,
Deva Ramanan,
Jürgen Beyerer,
Josef Kittler,
Mohamad Al Jazaery,
Mohammad Iqbal Nouyed,
Guodong Guo,
Cezary Stankiewicz,
Terrance E. Boult
Abstract:
Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success in images collected from the web, surveillance cameras include more diverse occlusions, poses, weather conditions and image blur. Although face verification or closed-set face identification have surpassed human capabilities on some datasets, open-set identification is much more complex as it needs to reject both unknown identities and false accepts from the face detector. We show that unconstrained face detection can approach high detection rates albeit with moderate false accept rates. By contrast, open-set face recognition is currently weak and requires much more attention.
Submitted 25 September, 2018; v1 submitted 7 August, 2017;
originally announced August 2017.
-
An Algebraic View on the Semantics of Model Composition
Authors:
Christoph Herrmann,
Holger Krahn,
Bernhard Rumpe,
Martin Schindler,
Steven Völkel
Abstract:
Due to the increased complexity of software development projects, more and more systems are described by models. The sheer size makes it impractical to describe these systems by a single model. Instead, many models are developed that provide several complementary views on the system to be developed. This, however, leads to a need for compositional models. This paper describes a foundational theory of model composition in the form of an algebra to explicitly clarify different variants and uses of composition, their interplay with the semantics of the involved models and their composition operators.
Submitted 22 September, 2014;
originally announced September 2014.
-
Orchestration of Global Software Engineering Projects
Authors:
Christian Bartelt,
Manfred Broy,
Christoph Herrmann,
Eric Knauss,
Marco Kuhrmann,
Andreas Rausch,
Bernhard Rumpe,
Kurt Schneider
Abstract:
Global software engineering has become a fact in many companies due to real necessity in practice. In contrast to co-located projects global projects face a number of additional software engineering challenges. Among them quality management has become much more difficult and schedule and budget overruns can be observed more often. Compared to co-located projects global software engineering is even more challenging due to the need for integration of different cultures, different languages, and different time zones - across companies, and across countries. The diversity of development locations on several levels seriously endangers an effective and goal-oriented progress of projects. In this position paper we discuss reasons for global development, sketch settings for distribution and views of orchestration of dislocated companies in a global project that can be seen as a "virtual project environment". We also present a collection of questions, which we consider relevant for global software engineering. The questions motivate further discussion to derive a research agenda in global software engineering.
Submitted 22 September, 2014;
originally announced September 2014.
-
Scaling-Up Model-Based-Development for Large Heterogeneous Systems with Compositional Modeling
Authors:
Christoph Herrmann,
Holger Krahn,
Bernhard Rumpe,
Martin Schindler,
Steven Völkel
Abstract:
Model-based development and in particular MDA [1], [2] have promised to be especially suited for the development of complex, heterogeneous, and large software systems. However, so far MDA has failed to fulfill this promise to a larger extent because of tool support being inadequate and clumsy and methodologies not being appropriate for an effective development. This article discusses what went wrong in current MDA approaches and what needs to be done to make MDA suited for ultra-large, distributed systems.
Submitted 22 September, 2014;
originally announced September 2014.
-
SSELab: A Plug-In-Based Framework for Web-Based Project Portals
Authors:
Christoph Herrmann,
Thomas Kurpick,
Bernhard Rumpe
Abstract:
Tools are an essential part of every software engineering project. But the number of tools that are used in all phases of the software development life-cycle and their complexity is growing continually. Consequently, the setup and maintenance of current tool chains and development environments requires much effort and consumes a lot of time. One approach to counter this is to employ web-based systems for development tasks, because centralized systems simplify the administration and the deployment of new features. But desktop IDEs play an important role in software development projects today, and will not be replaced entirely by web-based environments in the near future. Therefore, supporting a mixture of hosted tools and tools integrated into desktop IDEs is a sensible approach. In this paper, we present the SSELab, a framework for web-based project portals that attempts to migrate more software development tools from desktop to server environments, but still allows their integration into modern desktop IDEs. It supports the deployment of tools as hosted services using plug-in systems on the server-side. Additionally, it provides access to these tools by a set of clients that can be used in different contexts, either from the command line, from within IDEs such as Eclipse, or from web pages. In the paper, we discuss the architecture and the extensibility of the SSELab framework. Furthermore, we share our experiences with creating an instance of the framework and integrating various tools for our own software development projects.
Submitted 1 September, 2014;
originally announced September 2014.
-
Supporting acceptance testing in distributed software projects with integrated feedback systems: Experiences and requirements
Authors:
Olga Liskin,
Christoph Herrmann,
Eric Knauss,
Thomas Kurpick,
Bernhard Rumpe,
Kurt Schneider
Abstract:
During acceptance testing customers assess whether a system meets their expectations and often identify issues that should be improved. These findings have to be communicated to the developers, a task we observed to be error prone, especially in distributed teams. Here, it is normally not possible to have developer representatives from every site attend the test. Developers who were not present might misunderstand insufficiently documented findings. This hinders fixing the issues and endangers customer satisfaction. Integrated feedback systems promise to mitigate this problem. They make it easy to capture findings and their context. Correctly applied, this technique could improve feedback, while reducing customer effort. This paper collects our experiences from comparing acceptance testing with and without feedback systems in a distributed project. Our results indicate that this technique can improve acceptance testing if certain requirements are met. We identify key requirements feedback systems should meet to support acceptance testing.
Submitted 1 September, 2014;
originally announced September 2014.
-
Satisfiability of cross product terms is complete for real nondeterministic polytime Blum-Shub-Smale machines
Authors:
Christian Herrmann,
Johanna Sokoli,
Martin Ziegler
Abstract:
Nondeterministic polynomial-time Blum-Shub-Smale Machines over the reals give rise to a discrete complexity class between NP and PSPACE. Several problems, mostly from real algebraic geometry / polynomial systems, have been shown complete (under many-one reduction by polynomial-time Turing machines) for this class. We exhibit a new one based on questions about expressions built from cross products only.
Submitted 5 September, 2013;
originally announced September 2013.
-
Transmodal Analysis of Neural Signals
Authors:
Yaroslav O. Halchenko,
Michael Hanke,
James V. Haxby,
Stephen Jose Hanson,
Christoph S. Herrmann
Abstract:
Localizing neuronal activity in the brain, both in time and in space, is a central challenge to advance the understanding of brain function. Because of the inability of any single neuroimaging technique to cover all aspects at once, there is a growing interest in combining signals from multiple modalities in order to benefit from the advantages of each acquisition method. Due to the complexity and unknown parameterization of any suggested complete model of BOLD response in functional magnetic resonance imaging (fMRI), the development of a reliable ultimate fusion approach remains difficult. But besides the primary goal of superior temporal and spatial resolution, conjoint analysis of data from multiple imaging modalities can alternatively be used to segregate neural information from physiological and acquisition noise. In this paper we suggest a novel methodology which relies on constructing a quantifiable mapping of data from one modality (electroencephalography; EEG) into another (fMRI), called transmodal analysis of neural signals (TRANSfusion). TRANSfusion attempts to map neural data embedded within the EEG signal into its reflection in fMRI data. Assessing the mapping performance on unseen data makes it possible to localize brain areas where a significant portion of the signal could be reliably reconstructed, hence the areas whose neural activity is reflected in both EEG and fMRI data. Subsequent analysis of the learnt model makes it possible to localize areas associated with specific frequency bands of EEG, or areas functionally related (connected or coherent) to any given EEG sensor. We demonstrate the performance of TRANSfusion on artificial and real data from an auditory experiment. We further speculate on possible alternative uses: cross-modal data filtering and EEG-driven interpolation of fMRI signals to obtain arbitrarily high temporal sampling of BOLD.
Submitted 8 July, 2013;
originally announced July 2013.
-
Computational Complexity of Quantum Satisfiability
Authors:
Christian Herrmann,
Martin Ziegler
Abstract:
Quantum logic was introduced in 1936 by Garrett Birkhoff and John von Neumann as a framework for capturing the logical peculiarities of quantum observables. It generalizes, and on 1-dimensional Hilbert space coincides with, Boolean propositional logic.
We introduce the weak and strong satisfiability problem for quantum logic terms. It turns out that in dimension two both are also NP-complete.
For higher-dimensional spaces R^d and C^d with d>2 fixed, on the other hand, we show both problems to be complete for the nondeterministic Blum-Shub-Smale model of real computation. This provides a unified view on both Turing and real BSS complexity theory; and extends the still relatively scarce family of NP_R-complete problems with one perhaps closest in spirit to the classical Cook-Levin Theorem.
Our investigations on the dimensions a term is weakly/strongly satisfiable in lead to satisfiability problems in indefinite finite and finally in infinite dimension. Here, strong satisfiability turns out to be polynomial-time equivalent to the feasibility of noncommutative integer polynomial equations.
Submitted 12 November, 2012; v1 submitted 10 April, 2010;
originally announced April 2010.