Search | arXiv e-print repository

This&That: Language-Gesture Controlled Video Generation for Robot Planning

Authors: Boyang Wang, Nikhil Sridhar, Chao Feng, Mark Van der Merwe, Adam Fishman, Nima Fazeli, Jeong Joon Park

Abstract: We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communicat… ▽ More We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intents, and 3) translating visual planning into robot actions. We propose language-gesture conditioning to generate videos, which is both simpler and clearer than existing language-only methods, especially in complex and uncertain environments. We then suggest a behavioral cloning design that seamlessly incorporates the video plans. This&That demonstrates state-of-the-art effectiveness in addressing the above three challenges, and justifies the use of video generation as an intermediate representation for generalizable task planning and execution. Project website: https://cfeng16.github.io/this-and-that/. △ Less

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.17763 [pdf, other]

DiffusionPDE: Generative PDE-Solving Under Partial Observation

Authors: Jiahe Huang, Guandao Yang, Zichen Wang, Jeong Joon Park

Abstract: We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which… ▽ More We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which is a common assumption for real-world measurements. In this work, we propose DiffusionPDE that can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming the state-of-the-art methods for both forward and inverse directions. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Project page: https://jhhuangchloe.github.io/Diffusion-PDE/

arXiv:2403.17920 [pdf, other]

TC4D: Trajectory-Conditioned Text-to-4D Generation

Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

Abstract: Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate-they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The la… ▽ More Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate-they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The lack of a more flexible motion model contributes to the gap in realism between 4D generation methods and recent, near-photorealistic video generation models. Here, we propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components. We represent the global motion of a scene's bounding box using rigid transformation along a trajectory parameterized by a spline. We learn local deformations that conform to the global trajectory using supervision from a text-to-video model. Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion, which we evaluate qualitatively and through a user study. Video results can be viewed on our website: https://sherwinbahmani.github.io/tc4d. △ Less

Submitted 10 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: Project Page: https://sherwinbahmani.github.io/tc4d

arXiv:2403.03221 [pdf, other]

FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

Authors: Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey

Abstract: Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how t… ▽ More Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how to combine the best of both methods; our approach yields results that are both precise and robust, while also accurately inferring translation scales. At the heart of our model lies a Transformer that (1) learns to balance between solved and learned pose estimations, and (2) provides a prior to guide a solver. A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators, showing state-of-the-art performance in 6DoF pose estimation on Matterport3D, InteriorNet, StreetLearn, and Map-free Relocalization. △ Less

Submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024. Project Page: https://crockwell.github.io/far/

arXiv:2311.17984 [pdf, other]

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s… ▽ More Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce scenes with realistic appearance and 3D structure -- but no motion. Text-to-video models are trained on relatively smaller video datasets and can produce scenes with motion, but poorer appearance and 3D structure. While these models have complementary strengths, they also have opposing weaknesses, making it difficult to combine them in a way that alleviates this three-way tradeoff. Here, we introduce hybrid score distillation sampling, an alternating optimization procedure that blends supervision signals from multiple pre-trained diffusion models and incorporates benefits of each for high-fidelity text-to-4D generation. Using hybrid SDS, we demonstrate synthesis of 4D scenes with compelling appearance, 3D structure, and motion. △ Less

Submitted 26 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: CVPR 2024; Project page: https://sherwinbahmani.github.io/4dfy

arXiv:2306.10463 [pdf, other]

Lighthouses and Global Graph Stabilization: Active SLAM for Low-compute, Narrow-FoV Robots

Authors: Mohit Deshpande, Richard Kim, Dhruva Kumar, Jong Jin Park, Jim Zamiska

Abstract: Autonomous exploration to build a map of an unknown environment is a fundamental robotics problem. However, the quality of the map directly influences the quality of subsequent robot operation. Instability in a simultaneous localization and mapping (SLAM) system can lead to poorquality maps and subsequent navigation failures during or after exploration. This becomes particularly noticeable in cons… ▽ More Autonomous exploration to build a map of an unknown environment is a fundamental robotics problem. However, the quality of the map directly influences the quality of subsequent robot operation. Instability in a simultaneous localization and mapping (SLAM) system can lead to poorquality maps and subsequent navigation failures during or after exploration. This becomes particularly noticeable in consumer robotics, where compute budget and limited field-of-view are very common. In this work, we propose (i) the concept of lighthouses: panoramic views with high visual information content that can be used to maintain the stability of the map locally in their neighborhoods and (ii) the final stabilization strategy for global pose graph stabilization. We call our novel exploration strategy SLAM-aware exploration (SAE) and evaluate its performance on real-world home environments. △ Less

Submitted 17 June, 2023; originally announced June 2023.

Comments: 7 pages, 7 figures

Journal ref: International Conference on Robotics and Automation (ICRA) 2023

arXiv:2304.02602 [pdf, other]

Generative Novel View Synthesis with 3D-Aware Diffusion Models

Authors: Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein

Abstract: We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorp… ▽ More We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume. This latent feature field captures the distribution over possible scene representations and improves our method's ability to generate view-consistent novel renderings. In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences. We demonstrate state-of-the-art results on synthetic renderings and room-scale scenes; we also show compelling results for challenging, real-world objects. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: Project page: https://nvlabs.github.io/genvs

arXiv:2303.12074 [pdf, other]

CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

Abstract: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D l… ▽ More In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach for 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we have created a 3D GAN that is both efficient and of high quality, while allowing for a more controllable generation process. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality in comparison to previous works. △ Less

Submitted 8 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: ICCV 2023; Webpage: https://sherwinbahmani.github.io/cc3d/

arXiv:2303.12050 [pdf, other]

CurveCloudNet: Processing Point Clouds with 1D Structure

Authors: Colton Stearns, Davis Rempe, Jiateng Liu, Alex Fu, Sebastien Mascha, Jeong Joon Park, Despoina Paschalidou, Leonidas J. Guibas

Abstract: Modern depth sensors such as LiDAR operate by sweeping laser-beams across the scene, resulting in a point cloud with notable 1D curve-like structures. In this work, we introduce a new point cloud processing scheme and backbone, called CurveCloudNet, which takes advantage of the curve-like structure inherent to these sensors. While existing backbones discard the rich 1D traversal patterns and rely… ▽ More Modern depth sensors such as LiDAR operate by sweeping laser-beams across the scene, resulting in a point cloud with notable 1D curve-like structures. In this work, we introduce a new point cloud processing scheme and backbone, called CurveCloudNet, which takes advantage of the curve-like structure inherent to these sensors. While existing backbones discard the rich 1D traversal patterns and rely on generic 3D operations, CurveCloudNet parameterizes the point cloud as a collection of polylines (dubbed a "curve cloud"), establishing a local surface-aware ordering on the points. By reasoning along curves, CurveCloudNet captures lightweight curve-aware priors to efficiently and accurately reason in several diverse 3D environments. We evaluate CurveCloudNet on multiple synthetic and real datasets that exhibit distinct 3D size and structure. We demonstrate that CurveCloudNet outperforms both point-based and sparse-voxel backbones in various segmentation settings, notably scaling to large scenes better than point-based alternatives while exhibiting improved single-object performance over sparse-voxel alternatives. In all, CurveCloudNet is an efficient and accurate backbone that can handle a larger variety of 3D environments than past works. △ Less

Submitted 1 February, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.10133 [pdf, other]

DS-MPEPC: Safe and Deadlock-Avoiding Robot Navigation in Cluttered Dynamic Scenes

Authors: Senthil Hariharan Arul, Jong Jin Park, Dinesh Manocha

Abstract: We present an algorithm for safe robot navigation in complex dynamic environments using a variant of model predictive equilibrium point control. We use an optimization formulation to navigate robots gracefully in dynamic environments by optimizing over a trajectory cost function at each timestep. We present a novel trajectory cost formulation that significantly reduces the conservative and deadloc… ▽ More We present an algorithm for safe robot navigation in complex dynamic environments using a variant of model predictive equilibrium point control. We use an optimization formulation to navigate robots gracefully in dynamic environments by optimizing over a trajectory cost function at each timestep. We present a novel trajectory cost formulation that significantly reduces the conservative and deadlock behaviors and generates smooth trajectories. In particular, we propose a new collision probability function that effectively captures the risk associated with a given configuration and the time to avoid collisions based on the velocity direction. Moreover, we propose a terminal state cost based on the expected time-to-goal and time-to-collision values that helps in avoiding trajectories that could result in deadlock. We evaluate our cost formulation in multiple simulated and real-world scenarios, including narrow corridors with dynamic obstacles, and observe significantly improved navigation behavior and reduced deadlocks as compared to prior methods. △ Less

Submitted 17 March, 2023; originally announced March 2023.

arXiv:2303.09554 [pdf, other]

PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision

Authors: Konstantinos Tertikas, Despoina Paschalidou, Boxiao Pan, Jeong Joon Park, Mikaela Angelina Uy, Ioannis Emiris, Yannis Avrithis, Leonidas Guibas

Abstract: Impressive progress in generative models and implicit representations gave rise to methods that can generate 3D shapes of high quality. However, being able to locally control and edit shapes is another essential property that can unlock several content creation applications. Local control can be achieved with part-aware models, but existing methods require 3D supervision and cannot produce texture… ▽ More Impressive progress in generative models and implicit representations gave rise to methods that can generate 3D shapes of high quality. However, being able to locally control and edit shapes is another essential property that can unlock several content creation applications. Local control can be achieved with part-aware models, but existing methods require 3D supervision and cannot produce textures. In this work, we devise PartNeRF, a novel part-aware generative model for editable 3D shape synthesis that does not require any explicit 3D supervision. Our model generates objects as a set of locally defined NeRFs, augmented with an affine transformation. This enables several editing operations such as applying transformations on parts, mixing parts from different objects etc. To ensure distinct, manipulable parts we enforce a hard assignment of rays to parts that makes sure that the color of each ray is only determined by a single NeRF. As a result, altering one part does not affect the appearance of the others. Evaluations on various ShapeNet categories demonstrate the ability of our model to generate editable 3D objects of improved fidelity, compared to previous part-based generative approaches that require 3D supervision or models relying on NeRFs. △ Less

Submitted 21 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: To appear in CVPR 2023, Project Page: https://ktertikas.github.io/part_nerf

arXiv:2301.09629 [pdf, other]

LEGO-Net: Learning Regular Rearrangements of Objects in Rooms

Authors: Qiuhong Anna Wei, Sijie Ding, Jeong Joon Park, Rahul Sajnani, Adrien Poulenard, Srinath Sridhar, Leonidas Guibas

Abstract: Humans universally dislike the task of cleaning up a messy room. If machines were to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches for this tas… ▽ More Humans universally dislike the task of cleaning up a messy room. If machines were to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches for this task relied on human input to explicitly specify goal state, or synthesized scenes from scratch -- but such methods do not address the rearrangement of existing messy scenes without providing a goal state. In this paper, we present LEGO-Net, a data-driven transformer-based iterative method for LEarning reGular rearrangement of Objects in messy rooms. LEGO-Net is partly inspired by diffusion models -- it starts with an initial messy state and iteratively ''de-noises'' the position and orientation of objects to a regular state while reducing distance traveled. Given randomly perturbed object positions and orientations in an existing dataset of professionally-arranged scenes, our method is trained to recover a regular re-arrangement. Results demonstrate that our method is able to reliably rearrange room scenes and outperform other methods. We additionally propose a metric for evaluating regularity in room arrangements using number-theoretic machinery. △ Less

Submitted 24 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: Project page: https://ivl.cs.brown.edu/projects/lego-net

arXiv:2212.04096 [pdf, other]

ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

Authors: Zhen Wang, Shijie Zhou, Jeong Joon Park, Despoina Paschalidou, Suya You, Gordon Wetzstein, Leonidas Guibas, Achuta Kadambi

Abstract: This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid… ▽ More This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid latents) which could be a voxel grid or triplane grid. Each school of thought has tradeoffs. Grid latents are coarse and lose high-frequency detail. In contrast, point latents preserve detail. However, point latents are more difficult to decode into a surface, and quality and runtime suffer. In this paper, we propose ALTO to sequentially alternate between geometric representations, before converging to an easy-to-decode latent. We find that this preserves spatial expressiveness and makes decoding lightweight. We validate ALTO on implicit 3D recovery and observe not only a performance improvement over the state-of-the-art, but a runtime improvement of 3-10$\times$. Project website at https://visual.ee.ucla.edu/alto.htm/. △ Less

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2211.17260 [pdf, other]

SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene

Authors: Minjung Son, Jeong Joon Park, Leonidas Guibas, Gordon Wetzstein

Abstract: Generative models have shown great promise in synthesizing photorealistic 3D objects, but they require large amounts of training data. We introduce SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene. Once trained, SinGRAF generates different realizations of this 3D scene that preserve the appearance of the input while varying scene layout. For this purpo… ▽ More Generative models have shown great promise in synthesizing photorealistic 3D objects, but they require large amounts of training data. We introduce SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene. Once trained, SinGRAF generates different realizations of this 3D scene that preserve the appearance of the input while varying scene layout. For this purpose, we build on recent progress in 3D GAN architectures and introduce a novel progressive-scale patch discrimination approach during training. With several experiments, we demonstrate that the results produced by SinGRAF outperform the closest related works in both quality and diversity by a large margin. △ Less

Submitted 2 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: CVPR 2023. Project page: https://www.computationalimaging.org/publications/singraf/

arXiv:2206.14797 [pdf, other]

3D-Aware Video Generation

Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc Van Gool, Radu Timofte

Abstract: Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled high-quality 3D or video content to be generated that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neu… ▽ More Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled high-quality 3D or video content to be generated that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects of spatio-temporal renderings while producing imagery with quality comparable to that of existing 3D or video GANs. △ Less

Submitted 9 August, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: TMLR 2023; Project page: https://sherwinbahmani.github.io/3dvidgen

arXiv:2112.11427 [pdf, other]

StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation

Authors: Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, Ira Kemelmacher-Shlizerman

Abstract: We introduce a high resolution, 3D-consistent image and shape generation technique which we call StyleSDF. Our method is trained on single-view RGB data only, and stands on the shoulders of StyleGAN2 for image generation, while solving two main challenges in 3D-aware GANs: 1) high-resolution, view-consistent generation of the RGB images, and 2) detailed 3D shape. We achieve this by merging a SDF-b… ▽ More We introduce a high resolution, 3D-consistent image and shape generation technique which we call StyleSDF. Our method is trained on single-view RGB data only, and stands on the shoulders of StyleGAN2 for image generation, while solving two main challenges in 3D-aware GANs: 1) high-resolution, view-consistent generation of the RGB images, and 2) detailed 3D shape. We achieve this by merging a SDF-based 3D representation with a style-based 2D generator. Our 3D implicit network renders low-resolution feature maps, from which the style-based network generates view-consistent, 1024x1024 images. Notably, our SDF-based 3D modeling defines detailed 3D surfaces, leading to consistent volume rendering. Our method shows higher quality results compared to state of the art in terms of visual and geometric quality. △ Less

Submitted 29 March, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

Comments: Camera-Ready version. Paper was accepted as oral to CVPR 2022. Added discussions and figures from the rebuttal to the supplementary material (sections C & F). Project Webpage: https://stylesdf.github.io/

arXiv:2112.04645 [pdf, other]

BACON: Band-limited Coordinate Networks for Multiscale Scene Representation

Authors: David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein

Abstract: Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these netw… ▽ More Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has constrained behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without per-scale supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality. △ Less

Submitted 28 March, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Project page: https://www.computationalimaging.org/publications/bacon/

arXiv:2106.13937 [pdf, ps, other]

Unified Simultaneous Wireless Information and Power Transfer for IoT: Signaling and Architecture with Deep Learning Adaptive Control

Authors: Jong Jin Park, Jong Ho Moon, Hyeon Ho Jang, Dong In Kim

Abstract: In this paper, we propose a unified SWIPT signal and its architecture design in order to take advantage of both single tone and multi-tone signaling by adjusting only the power allocation ratio of a unified signal. For this, we design a novel unified and integrated receiver architecture for the proposed unified SWIPT signaling, which consumes low power with an envelope detection. To relieve the co… ▽ More In this paper, we propose a unified SWIPT signal and its architecture design in order to take advantage of both single tone and multi-tone signaling by adjusting only the power allocation ratio of a unified signal. For this, we design a novel unified and integrated receiver architecture for the proposed unified SWIPT signaling, which consumes low power with an envelope detection. To relieve the computational complexity of the receiver, we propose an adaptive control algorithm by which the transmitter adjusts the communication mode through temporal convolutional network (TCN) based asymmetric processing. To this end, the transmitter optimizes the modulation index and power allocation ratio in short-term scale while updating the mode switching threshold in long-term scale. We demonstrate that the proposed unified SWIPT system improves the achievable rate under the self-powering condition of low-power IoT devices. Consequently it is foreseen to effectively deploy low-power IoT networks that concurrently supply both information and energy wirelessly to the devices by using the proposed unified SWIPT and adaptive control algorithm in place at the transmitter side. △ Less

Submitted 25 June, 2021; originally announced June 2021.

Comments: 15 pages, 15 figures

arXiv:2001.04642 [pdf, other]

Seeing the World in a Bag of Chips

Authors: Jeong Joon Park, Aleksander Holynski, Steve Seitz

Abstract: We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors. Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone. In cases where scene surface has a strong mirror-like material comp… ▽ More We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors. Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone. In cases where scene surface has a strong mirror-like material component, we generate highly detailed environment images, revealing room composition, objects, people, buildings, and trees visible through windows. Our approach yields state of the art view synthesis techniques, operates on low dynamic range imagery, and is robust to geometric and calibration errors. △ Less

Submitted 15 June, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: CVPR 2020

arXiv:1902.06035 [pdf, ps, other]

Heterogeneous Coexistence of Cognitive Radio Networks in TV White Space

Authors: Kaigui Bian, Lin Chen, Yuanxing Zhang, Jung-Min Jerr Park, Xiaojiang Du, Xiaoming Li

Abstract: Wireless standards (e.g., IEEE 802.11af and 802.22) have been developed for enabling opportunistic access in TV white space (TVWS) using cognitive radio (CR) technology. When heterogeneous CR networks that are based on different wireless standards operate in the same TVWS, coexistence issues can potentially cause major problems. Enabling collaborative coexistence via direct coordination between he… ▽ More Wireless standards (e.g., IEEE 802.11af and 802.22) have been developed for enabling opportunistic access in TV white space (TVWS) using cognitive radio (CR) technology. When heterogeneous CR networks that are based on different wireless standards operate in the same TVWS, coexistence issues can potentially cause major problems. Enabling collaborative coexistence via direct coordination between heterogeneous CR networks is very challenging, due to incompatible MAC/PHY designs of coexisting networks, requirement of an over-the-air common control channel for inter-network communications, and time synchronization across devices from different networks. Moreover, such a coexistence scheme would require competing networks or service providers to exchange sensitive control information that may raise conflict of interest issues and customer privacy concerns. In this paper, we present an architecture for enabling collaborative coexistence of heterogeneous CR networks over TVWS, called Symbiotic Heterogeneous coexistence ARchitecturE (SHARE). Define "indirect coordination" first before using it. Because coexistence cannot avoid coordination By mimicking the symbiotic relationships between heterogeneous organisms in a stable ecosystem, SHARE establishes an indirect coordination mechanism between heterogeneous CR networks via a mediator system, which avoids the drawbacks of direct coordination. SHARE includes two spectrum sharing algorithms whose designs were inspired by well-known models and theories from theoretical ecology, viz, the interspecific competition model and the ideal free distribution model. △ Less

Submitted 15 February, 2019; originally announced February 2019.

arXiv:1901.05103 [pdf, other]

DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

Authors: Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove

Abstract: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape re… ▽ More Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. DeepSDF, like its classical counterpart, represents a shape's surface by a continuous volumetric field: the magnitude of a point in the field represents the distance to the surface boundary and the sign indicates whether the region is inside (-) or outside (+) of the shape, hence our representation implicitly encodes a shape's boundary as the zero-level-set of the learned function while explicitly representing the classification of space as being part of the shapes interior or not. While classical SDF's both in analytical or discretized voxel form typically represent the surface of a single shape, DeepSDF can represent an entire class of shapes. Furthermore, we show state-of-the-art performance for learned 3D shape representation and completion while reducing the model size by an order of magnitude compared with previous work. △ Less

Submitted 15 January, 2019; originally announced January 2019.

arXiv:1809.02057 [pdf]

Surface Light Field Fusion

Authors: Jeong Joon Park, Richard Newcombe, Steve Seitz

Abstract: We present an approach for interactively scanning highly reflective objects with a commodity RGBD sensor. In addition to shape, our approach models the surface light field, encoding scene appearance from all directions. By factoring the surface light field into view-independent and wavelength-independent components, we arrive at a representation that can be robustly estimated with IR-equipped comm… ▽ More We present an approach for interactively scanning highly reflective objects with a commodity RGBD sensor. In addition to shape, our approach models the surface light field, encoding scene appearance from all directions. By factoring the surface light field into view-independent and wavelength-independent components, we arrive at a representation that can be robustly estimated with IR-equipped commodity depth sensors, and achieves high quality results. △ Less

Submitted 6 September, 2018; originally announced September 2018.

Comments: Project Website: http://grail.cs.washington.edu/projects/slfusion/

Journal ref: 3DV 2018

arXiv:1712.00010 [pdf, ps, other]

Highrisk Prediction from Electronic Medical Records via Deep Attention Networks

Authors: You Jin Kim, Yun-Geun Lee, Jeong Whun Kim, Jin Joo Park, Borim Ryu, Jung-Woo Ha

Abstract: Predicting highrisk vascular diseases is a significant issue in the medical domain. Most predicting methods predict the prognosis of patients from pathological and radiological measurements, which are expensive and require much time to be analyzed. Here we propose deep attention models that predict the onset of the high risky vascular disease from symbolic medical histories sequence of hypertensio… ▽ More Predicting highrisk vascular diseases is a significant issue in the medical domain. Most predicting methods predict the prognosis of patients from pathological and radiological measurements, which are expensive and require much time to be analyzed. Here we propose deep attention models that predict the onset of the high risky vascular disease from symbolic medical histories sequence of hypertension patients such as ICD-10 and pharmacy codes only, Medical History-based Prediction using Attention Network (MeHPAN). We demonstrate two types of attention models based on 1) bidirectional gated recurrent unit (R-MeHPAN) and 2) 1D convolutional multilayer model (C-MeHPAN). Two MeHPAN models are evaluated on approximately 50,000 hypertension patients with respect to precision, recall, f1-measure and area under the curve (AUC). Experimental results show that our MeHPAN methods outperform standard classification models. Comparing two MeHPANs, R-MeHPAN provides more better discriminative capability with respect to all metrics while C-MeHPAN presents much shorter training time with competitive accuracy. △ Less

Submitted 30 November, 2017; originally announced December 2017.

Comments: Accepted poster at NIPS 2017 Workshop on Machine Learning for Health (https://ml4health.github.io/2017/)

arXiv:1510.06342 [pdf, other]

Prevalence and recoverability of syntactic parameters in sparse distributed memories

Authors: Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli

Abstract: We propose a new method, based on Sparse Distributed Memory (Kanerva Networks), for studying dependency relations between different syntactic parameters in the Principles and Parameters model of Syntax. We store data of syntactic parameters of world languages in a Kanerva Network and we check the recoverability of corrupted parameter data from the network. We find that different syntactic paramete… ▽ More We propose a new method, based on Sparse Distributed Memory (Kanerva Networks), for studying dependency relations between different syntactic parameters in the Principles and Parameters model of Syntax. We store data of syntactic parameters of world languages in a Kanerva Network and we check the recoverability of corrupted parameter data from the network. We find that different syntactic parameters have different degrees of recoverability. We identify two different effects: an overall underlying relation between the prevalence of parameters across languages and their degree of recoverability, and a finer effect that makes some parameters more easily recoverable beyond what their prevalence would indicate. We interpret a higher recoverability for a syntactic parameter as an indication of the existence of a dependency relation, through which the given parameter can be determined using the remaining uncorrupted data. △ Less

Submitted 21 October, 2015; originally announced October 2015.

Comments: 13 pages, LaTeX, 4 jpeg figures

MSC Class: 91F20

Showing 1–24 of 24 results for author: Park, J J