Search | arXiv e-print repository

arXiv:2409.05484 [pdf, other]

CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement

Authors: Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang

Abstract: Predicting cellular responses to various perturbations is a critical focus in drug discovery and personalized therapeutics, with deep learning models playing a significant role in this endeavor. Single-cell datasets contain technical artifacts that may hinder the predictability of such models, which poses quality control issues highly regarded in this area. To address this, we propose CRADLE-VAE, a causal generative framework tailored for single-cell gene perturbation modeling, enhanced with counterfactual reasoning-based artifact disentanglement. Throughout training, CRADLE-VAE models the underlying latent distribution of technical artifacts and perturbation effects present in single-cell datasets. It employs counterfactual reasoning to effectively disentangle such artifacts by modulating the latent basal spaces and learns robust features for generating cellular response data with improved quality. Experimental results demonstrate that this approach improves not only treatment effect estimation performance but also generative quality. The CRADLE-VAE codebase is publicly available at https://github.com/dmis-lab/CRADLE-VAE.

Submitted 9 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

arXiv:2408.15199 [pdf, other]

Crossing Rays: Evaluation of Bimanual Mid-air Selection Techniques in an Immersive Environment

Authors: DongHoon Kim, Dongyun Han, Siyeon Bak, Isaac Cho

Abstract: Mid-air navigation offers a method of aerial travel that mitigates the constraints associated with continuous navigation. A mid-air selection technique is essential to enable such navigation. In this paper, we consider four variations of intersection-based bimanual mid-air selection techniques with visual aids and supporting features: Simple-Ray, Simple-Stripe, Precision-Stripe, and Cursor-Sync. We evaluate their performance and user experience compared to a unimanual mid-air selection technique using two tasks that require selecting a mid-air position with or without a reference object. Our findings indicate that the bimanual techniques generally demonstrate faster selection times compared to the unimanual technique. With a supporting feature, the bimanual techniques can provide a more accurate selection than the unimanual technique. Based on our results, we discuss the effect of the selection technique's visual aids and supporting features on performance and user experience for mid-air selection.

Submitted 27 August, 2024; originally announced August 2024.

Comments: This paper is accepted as a conference paper on ISMAR 2024

arXiv:2408.14034 [pdf, other]

Renyi reflected entropy and entanglement wedge cross section with cosmic branes in AdS/BCFT

Authors: Byoungjoon Ahn, Sang-Eon Bak, Keun-Young Kim, Mitsuhiro Nishida

Abstract: In this study, we calculate the $m-1$ correction to the reflected entropy for two adjacent intervals on a half-infinite line within the AdS$_3$/BCFT$_2$ framework, where $m$ is a Renyi index for a canonical purification. We utilize the doubling trick and compute the leading terms in the large central charge expansion of correlation functions in the holographic BCFT. In the corresponding AdS space with an end of the world brane, we analyze the entanglement wedge cross section, the dual counterpart of reflected entropy. This AdS/BCFT setup allows us to explore a richer set of phases in the entanglement wedge cross section. The $m-1$ correction in the holographic BCFT manifests as modifications in the entanglement wedge cross section induced by cosmic branes. For the adjacent intervals anchored to the boundary of BCFT, we show the duality between the entanglement wedge cross section with the backreaction from a cosmic brane and Renyi reflected entropy at all orders in $m-1$. Furthermore, by analyzing the entanglement wedge cross section for general adjacent intervals, we provide guidance for an $ε$-expansion of five-point functions in the holographic CFT, where $ε$ is the rescaled conformal dimension by the central charge.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 34 pages, 10 figures

arXiv:2408.02295 [pdf, other]

Generalized Gaussian Temporal Difference Error For Uncertainty-aware Reinforcement Learning

Authors: Seyeon Kim, Joonhun Lee, Namhoon Cho, Sungjun Han, Seungeon Baek

Abstract: Conventional uncertainty-aware temporal difference (TD) learning methods often rely on simplistic assumptions, typically including a zero-mean Gaussian distribution for TD errors. Such oversimplification can lead to inaccurate error representations and compromised uncertainty estimation. In this paper, we introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning, applicable to both discrete and continuous control settings. Our framework enhances the flexibility of error distribution modeling by incorporating higher-order moments, particularly kurtosis, thereby improving the estimation and mitigation of data-dependent noise, i.e., aleatoric uncertainty. We examine the influence of the shape parameter of the generalized Gaussian distribution (GGD) on aleatoric uncertainty and provide a closed-form expression that demonstrates an inverse relationship between uncertainty and the shape parameter. Additionally, we propose a theoretically grounded weighting scheme to fully leverage the GGD. To address epistemic uncertainty, we enhance the batch inverse variance weighting by incorporating bias reduction and kurtosis considerations, resulting in improved robustness. Extensive experimental evaluations using policy gradient algorithms demonstrate the consistent efficacy of our method, showcasing significant performance improvements.

Submitted 5 August, 2024; originally announced August 2024.
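
Note on arXiv:2408.02295: the abstract above mentions an inverse relationship between uncertainty and the GGD shape parameter. The sketch below is only an illustration using the standard textbook parameterization of the generalized Gaussian (scale `alpha`, shape `beta`), whose variance is alpha^2 * Gamma(3/beta) / Gamma(1/beta). It is not taken from the paper, and the paper's own closed-form expression may differ; the function names `ggd_pdf` and `ggd_variance` are hypothetical.

```python
import math


def ggd_pdf(x: float, alpha: float, beta: float, mu: float = 0.0) -> float:
    """Density of a generalized Gaussian with location mu, scale alpha, shape beta.

    Standard parameterization (an assumption, not the paper's notation):
    f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def ggd_variance(alpha: float, beta: float) -> float:
    """Variance of the same distribution: alpha^2 * Gamma(3/beta) / Gamma(1/beta)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha ** 2 * math.gamma(3.0 / beta) / math.gamma(1.0 / beta)


if __name__ == "__main__":
    # For a fixed scale, the variance shrinks as the shape parameter grows.
    for beta in (1.0, 2.0, 4.0):
        print(f"beta={beta}: variance={ggd_variance(1.0, beta):.4f}")
```

With `beta = 1` this reduces to a Laplace distribution and with `beta = 2` to a Gaussian, so for a fixed scale the heavier-tailed (smaller `beta`) cases carry more variance.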

arXiv:2408.01040 [pdf, other]

Privacy-Preserving Split Learning with Vision Transformers using Patch-Wise Random and Noisy CutMix

Authors: Seungeun Oh, Sihun Baek, Jihong Park, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim

Abstract: In computer vision, the vision transformer (ViT) has increasingly superseded the convolutional neural network (CNN) for improved accuracy and robustness. However, ViT's large model sizes and high sample complexity make it difficult to train on resource-constrained edge devices. Split learning (SL) emerges as a viable solution, leveraging server-side resources to train ViTs while utilizing private data from distributed devices. However, SL requires additional information exchange for weight updates between the device and the server, which can be exposed to various attacks on private training data. To mitigate the risk of data breaches in classification tasks, inspired by the CutMix regularization, we propose a novel privacy-preserving SL framework that injects Gaussian noise into smashed data and mixes randomly chosen patches of smashed data across clients, coined DP-CutMixSL. Our analysis demonstrates that DP-CutMixSL is a differentially private (DP) mechanism that strengthens privacy protection against membership inference attacks during forward propagation. Through simulations, we show that DP-CutMixSL improves privacy protection against membership inference attacks, reconstruction attacks, and label inference attacks, while also improving accuracy compared to DP-SL and DP-MixSL.

Submitted 2 August, 2024; originally announced August 2024.

Comments: 23 pages, 11 figures, 8 tables, to be published in Transactions on Machine Learning Research (TMLR)

arXiv:2407.09318 [pdf, other]

Quantum backreaction effect in optical soliton

Authors: Sang-Shin Baak, Friedrich Koenig

Abstract: Optical solitons classically are stationary solutions of the nonlinear Schrödinger equation. We perform a quantum field theoretic treatment by quantising a linearised fluctuation field around the classical soliton solution which can be seen as providing a background spacetime for the field. The linearised fluctuation modifies the soliton background, which is often neglected, reminiscent of the nondepleted-pump approximation. Going beyond this approximation and by using a number-conserving Bogoljubov approach, we find unstable modes that grow as the soliton propagates. Eventually, these unstable modes induce a considerable (backreaction) effect in the soliton. We calculate the backreaction in the classical field fully analytically in the leading second order. The result is a quadratic local decrease of the soliton photon number in propagation due to the backreaction effect of the unstable mode. Provided the initial pulse is close to the classical soliton solution, the unstable mode contributions always become dominant. We also consider practical scenarios for observing this quantum-induced soliton distortion, in the spectral domain. The backreaction, which we expect to be present in bright and dark, discrete and continuous solitons and other nonlinear pulses plays an important role for future optical analogue gravity experiments, for soliton lasers, and optical communications.

Submitted 12 July, 2024; originally announced July 2024.

Comments: (34 pages, 7 figures)

arXiv:2407.07133 [pdf]

Neuromimetic metaplasticity for adaptive continual learning

Authors: Suhee Cho, Hyeonsu Lee, Seungdae Baek, Se-Bum Paik

Abstract: Conventional intelligent systems based on deep neural network (DNN) models encounter challenges in achieving human-like continual learning due to catastrophic forgetting. Here, we propose a metaplasticity model inspired by human working memory, enabling DNNs to perform catastrophic forgetting-free continual learning without any pre- or post-processing. A key aspect of our approach involves implementing distinct types of synapses from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility. This strategy allowed the network to successfully learn a continuous stream of information, even under unexpected changes in input length. The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications, dynamically allocating memory resources to retain both old and new information. Furthermore, the model demonstrated robustness against data poisoning attacks by selectively filtering out erroneous memories, leveraging the Hebb repetition effect to reinforce the retention of significant data.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 25 pages, 5 figures, 1 table, 4 supplementary figures

arXiv:2407.04039 [pdf, ps, other]

doi 10.2172/2372903

Flexible Stellarator Physics Facility

Authors: F. I. Parra, S. -G. Baek, M. Churchill, D. R. Demers, B. Dudson, N. M. Ferraro, B. Geiger, S. Gerhardt, K. C. Hammond, S. Hudson, R. Jorge, E. Kolemen, D. M. Kriete, S. T. A. Kumar, M. Landreman, C. Lowe, D. A. Maurer, F. Nespoli, N. Pablant, M. J. Pueschel, A. Punjabi, J. A. Schwartz, C. P. S. Swanson, A. M. Wright

Abstract: We propose to build a Flexible Stellarator Physics Facility to explore promising regions of the vast parameter space of disruption-free stellarator solutions for Fusion Pilot Plants (FPPs).

Submitted 4 July, 2024; originally announced July 2024.

Comments: White paper submitted to FESAC subcommittee on Facilities, 8 pages

arXiv:2406.19634 [pdf, other]

CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services

Authors: DongKi Noh, Hyungtae Lim, Gyuho Eoh, Duckyu Choi, Jeongsik Choi, Hyunjun Lim, SeungMin Baek, Hyun Myung

Abstract: In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However, we have encountered challenges in implementing recent innovative frameworks when handling service robots with low-end processors and insufficient sensor data, such as low-resolution 2D LiDAR sensors. Specifically, regarding commercial robots, consistent performance in different hardware configurations and environments is more crucial than the performance dedicated to specific sensors or environments. Therefore, we propose a) a multi-stage approach for global pose estimation in embedded systems; b) a graph generation method with zero constraints for synchronized sensors; and c) a robust and memory-efficient method for long-term pose-graph optimization. As verified in in-home and large-scale indoor environments, the proposed method yields consistent global pose estimation for services in commercial fields. Furthermore, the proposed method exhibits potential commercial viability considering the consistent performance verified via mass production and long-term (> 5 years) operation.

Submitted 27 June, 2024; originally announced June 2024.

Journal ref: IEEE Robotics and Automation Letters, 2024

arXiv:2406.18704 [pdf]

Superconducting phase diagram in Bi$_x$Ni$_{1-x}$ thin films: the effects of Bi stoichiometry on superconductivity

Authors: Jihun Park, Jarryd A. Horn, Dylan J. Kirsch, Rohit K. Pant, Hyeok Yoon, Sungha Baek, Suchismita Sarker, Apurva Mehta, Xiaohang Zhang, Seunghun Lee, Richard Greene, Johnpierre Paglione, Ichiro Takeuchi

Abstract: The Bi-Ni binary system has been of interest due to possible unconventional superconductivity arising therein, such as time-reversal symmetry breaking in Bi/Ni bilayers or the coexistence of superconductivity and ferromagnetism in Bi$_3$Ni crystals. While Ni acts as a ferromagnetic element in such systems, the role of strong spin-orbit-coupling element Bi in superconductivity has remained unexplored. In this work, we systematically studied the effects of Bi stoichiometry on the superconductivity of Bi$_x$Ni$_{1-x}$ thin films ($x \approx$ 0.5 to 0.9) fabricated via a composition-spread approach. The superconducting phase map of Bi$_x$Ni$_{1-x}$ thin films exhibited a superconducting composition region attributable to the intermetallic Bi$_3$Ni phase with different amounts of excess Bi, revealed by synchrotron X-ray diffraction analysis. Interestingly, the mixed phase region with Bi$_3$Ni and Bi showed unusual increases in the superconducting transition temperature and residual resistance ratio as more Bi impurities were included, with the maximum $T_c$ (= 4.2 K) observed at $x \approx$ 0.79. A correlation analysis of structural, electrical, and magneto-transport characteristics across the composition variation revealed that the unusual superconducting "dome" is due to two competing roles of Bi: impurity scattering and carrier doping. We found that the carrier doping effect is dominant in the mild doping regime (0.74 $\leq x \leq$ 0.79), while impurity scattering becomes more pronounced at larger Bi stoichiometry.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.14277 [pdf, other]

Augmenting Query and Passage for Retrieval-Augmented Generation using LLMs for Open-Domain Question Answering

Authors: Minsang Kim, Cheoneum Park, Seungjun Baek

Abstract: Retrieval-augmented generation (RAG) has received much attention for Open-domain question-answering (ODQA) tasks as a means to compensate for the parametric knowledge of large language models (LLMs). While previous approaches focused on processing retrieved passages to remove irrelevant context, they still rely heavily on the quality of retrieved passages which can degrade if the question is ambiguous or complex. In this paper, we propose a simple yet efficient method called question and passage augmentation via LLMs for open-domain QA. Our method first decomposes the original questions into multiple-step sub-questions. By augmenting the original question with detailed sub-questions and planning, we are able to make the query more specific on what needs to be retrieved, improving the retrieval performance. In addition, to compensate for the case where the retrieved passages contain distracting information or divided opinions, we augment the retrieved passages with self-generated passages by LLMs to guide the answer extraction. Experimental results show that the proposed scheme outperforms the previous state-of-the-art and achieves significant performance gain over existing RAG methods.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14124 [pdf, other]

Measuring Sample Importance in Data Pruning for Training LLMs from a Data Compression Perspective

Authors: Minsang Kim, Seungjun Baek

Abstract: Compute-efficient training of large language models (LLMs) has become an important research problem. In this work, we consider data pruning as a method of data-efficient training of LLMs, where we take a data compression view on data pruning. We argue that the amount of information of a sample, or the achievable compression on its description length, represents its sample importance. The key idea is that less informative samples are likely to contain redundant information, and thus should be pruned first. We leverage the log-likelihood function of trained models as a surrogate to measure the information content of samples. Experiments reveal a surprising insight that information-based pruning can enhance the generalization capability of the model and improves upon language modeling and downstream tasks as compared to the model trained on the entire dataset.

Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.
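
Note on arXiv:2406.14124: the abstract above describes scoring samples by information content, using a trained model's log-likelihood as a surrogate, and pruning the least informative samples first. The sketch below is a minimal illustration of that ranking step only, assuming the per-sample log-likelihoods have already been computed elsewhere. The function name `prune_by_information` and the `keep_fraction` parameter are hypothetical and do not come from the paper, whose actual scoring and selection procedure may differ.

```python
from typing import List, Sequence


def prune_by_information(log_likelihoods: Sequence[float],
                         keep_fraction: float) -> List[int]:
    """Return indices of the samples to keep, in their original order.

    Information content is approximated as the negative log-likelihood
    (an assumption for this sketch): a sample the model already predicts
    well carries little information and is pruned first.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(log_likelihoods) * keep_fraction))
    information = [-ll for ll in log_likelihoods]
    # Rank sample indices from most to least informative.
    ranked = sorted(range(len(information)),
                    key=lambda i: information[i],
                    reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Hypothetical per-sample log-likelihoods from some trained model.
    scores = [-1.0, -5.0, -0.5, -3.0]
    print(prune_by_information(scores, keep_fraction=0.5))
```

Returning indices rather than samples keeps the sketch independent of how the dataset is stored.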

arXiv:2406.11056 [pdf, other]

Reachability Analysis for Linear Systems with Uncertain Parameters using Polynomial Zonotopes

Authors: Yushen Huang, Ertai Luo, Stanley Bak, Yifan Sun

Abstract: In real world applications, uncertain parameters are the rule rather than the exception. We present a reachability algorithm for linear systems with uncertain parameters and inputs using set propagation of polynomial zonotopes. In contrast to previous methods, our approach is able to tightly capture the non-convexity of the reachable set. Building on our main result, we show how our reachability algorithm can be extended to handle linear time-varying systems as well as linear systems with time-varying parameters. Moreover, our approach opens up new possibilities for reachability analysis of linear time-invariant systems, nonlinear systems, and hybrid systems. We compare our approach to other state of the art methods, with superior tightness on two benchmarks including a 9-dimensional vehicle platooning system. Moreover, as part of the journal extension, we investigate a polynomial zonotope with special structure, named the multi-affine zonotope, and its optimization problem. We provide the corresponding optimization algorithm and experiment over the examples obtained from two benchmark systems, showing the efficiency and scalability compared to the state of the art method for handling this type of set representation.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.03461 [pdf, other]

Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts

Authors: Dominik Scheuble, Chenyang Lei, Seung-Hwan Baek, Mario Bijelic, Felix Heide

Abstract: Lidar has become a cornerstone sensing modality for 3D vision, especially for large outdoor scenarios and autonomous driving. Conventional lidar sensors are capable of providing centimeter-accurate distance information by emitting laser pulses into a scene and measuring the time-of-flight (ToF) of the reflection. However, the polarization of the received light that depends on the surface orientation and material properties is usually not considered. As such, the polarization modality has the potential to improve scene reconstruction beyond distance measurements. In this work, we introduce a novel long-range polarization wavefront lidar sensor (PolLidar) that modulates the polarization of the emitted and received light. Departing from conventional lidar sensors, PolLidar allows access to the raw time-resolved polarimetric wavefronts. We leverage polarimetric wavefronts to estimate normals, distance, and material properties in outdoor scenarios with a novel learned reconstruction method. To train and evaluate the method, we introduce a simulated and real-world long-range dataset with paired raw lidar data, ground truth distance, and normal maps. We find that the proposed method improves normal and distance reconstruction by 53% mean angular error and 41% mean absolute error compared to existing shape-from-polarization (SfP) and ToF methods. Code and data are open-sourced at https://light.princeton.edu/pollidar.

Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at CVPR 2024; Project Website: https://light.princeton.edu/publication/pollidar

arXiv:2406.00157 [pdf, other]

doi 10.1007/978-3-031-65112-0_5

Verification of Neural Network Control Systems in Continuous Time

Authors: Ali ArjomandBigdeli, Andrew Mata, Stanley Bak

Abstract: Neural network controllers are currently being proposed for use in many safety-critical tasks. Most analysis methods for neural network control systems assume a fixed control period. In control theory, higher frequency usually improves performance. However, for current analysis methods, increasing the frequency complicates verification. In the limit, when actuation is performed continuously, no existing neural network control systems verification methods are able to analyze the system. In this work, we develop the first verification method for continuously-actuated neural network control systems. We accomplish this by adding a level of abstraction to model the neural network controller. The abstraction is a piecewise linear model with added noise to account for local linearization error. The soundness of the abstraction can be checked using open-loop neural network verification tools, although we demonstrate bottlenecks in existing tools when handling the required specifications. We demonstrate the approach's efficacy by applying it to a vision-based autonomous airplane taxiing system and compare with a fixed frequency analysis baseline.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 17 pages, 7 figures, Proceedings of the 7th International Symposium on AI Verification (SAIV)

arXiv:2405.18554 [pdf, other]

Scalable Surrogate Verification of Image-based Neural Network Control Systems using Composition and Unrolling

Authors: Feiyang Cai, Chuchu Fan, Stanley Bak

Abstract: Verifying safety of neural network control systems that use images as input is a difficult problem because, from a given system state, there is no known way to mathematically model what images are possible in the real-world. We build on recent work that considers a surrogate verification approach, training a conditional generative adversarial network (cGAN) as an image generator in place of the real world. This enables set-based formal analysis of the closed-loop system, providing analysis beyond simulation and testing. While existing work is effective on small examples, excessive overapproximation both within a single control period and across multiple control periods limits its scalability. We propose approaches to overcome these two sources of error. First, we overcome one-step error by composing the system's dynamics along with the cGAN and neural network controller, without losing the dependencies between input states and the control outputs as in the monotonic analysis of the system dynamics. Second, we reduce multi-step error by repeating the single-step composition, essentially unrolling multiple steps of the control loop into a large neural network. We then leverage existing network verification tools to compute accurate reachable sets for multiple steps, avoiding the accumulation of abstraction error at each step. We demonstrate the effectiveness of our approach in terms of both accuracy and scalability using two case studies: an autonomous aircraft taxiing system and an advanced emergency braking system.
On the aircraft taxiing system, the converged reachable set is 175% larger using the prior baseline method compared with our proposed approach. On the emergency braking system, with 24x the number of image output variables from the cGAN, the baseline method fails to prove any states are safe, whereas our improvements enable set-based safety analysis.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.09705 [pdf, other]

The Realization of a Gas Puff Imaging System on the Wendelstein 7-X Stellarator

Authors: J. L. Terry, A. von Stechow, S. G. Baek, S. B. Ballinger, O. Grulke, C. von Sehren, R. Laube, C. Killer, F. Scharmer, K. J. Brunner, J. Knauer, S. Bois, the W7-X Team

Abstract: A system for studying the spatio-temporal dynamics of fluctuations in the boundary of the W7-X plasma using the Gas-Puff Imaging (GPI) technique has been designed, constructed, installed, and operated. This GPI system addresses a number of challenges specific to long-pulse superconducting devices like W7-X, including the long distance between the plasma and the vacuum vessel wall, the long distance between the plasma and diagnostic ports, the range of last closed flux surface locations for different magnetic configurations in W7-X, and management of heat loads on the system's plasma-facing components. The system features a pair of "converging-diverging" nozzles for partially collimating the gas puffed locally $\approx$135 mm radially outboard of the plasma boundary, a pop-up turning mirror for viewing the gas puff emission from the side (also acting as a shutter for the re-entrant vacuum window), and a high-throughput optical system that collects visible emission resulting from the interaction between the puffed gas and the plasma and directs it along a water-cooled re-entrant tube directly onto the 8 x 16 pixel detector array of the fast camera. The DEGAS 2 neutrals code was used to simulate the H$_α$ (656 nm) and the HeI (587 nm) line emission expected from well-characterized gas-puffs of H$_2$ and He and excited within typical edge plasma profiles in W7-X, thereby predicting line brightnesses used to reduce the risks associated with system sensitivity and placement of the field of view.
Operation of GPI on W7-X shows excellent signal to noise ratios (>100) over the field of view for minimally perturbing gas puffs. The GPI system provides detailed measurements of the 2-dimensional (radial and poloidal) dynamics of plasma fluctuations in the W7-X edge, scrape-off layer, and in and around the magnetic islands that make up the island divertor configuration employed on W7-X.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 30 pages, 23 figures, submitted to Review of Scientific Instruments

arXiv:2405.05746 [pdf, other]

3D bulk field theories for 2D non-unitary N=1 supersymmetric minimal models

Authors: Seungjoo Baek, Dongmin Gang

Abstract: We propose bulk 3D N=4 rank-0 superconformal field theories, which are related to 2D N=1 supersymmetric minimal models, SM(2, ...) and SM(3, ...), via recently discovered non-unitary bulk-boundary correspondence. The correspondence relates a 3D N=4 rank-0 superconformal field theory to 2D chiral rational conformal field theories. A topologically twisted theory of the rank-0 SCFT supports the rational chiral algebra at the boundary upon a proper choice of boundary condition. We test the proposal by checking several non-trivial dictionaries of the correspondence.

Submitted 3 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 32 pages, 3 figures

arXiv:2405.02499 [pdf, other]

DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.

Submitted 3 May, 2024; originally announced May 2024.

Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

arXiv:2404.15664 [pdf, other]

Exact Cluster Dynamics of Indirect Reciprocity in Complete Graphs

Authors: Minwoo Bae, Takashi Shimada, Seung Ki Baek

Abstract: Heider's balance theory emphasizes cognitive consistency in assessing others, as is expressed by "The enemy of my enemy is my friend." At the same time, the theory of indirect reciprocity provides us with a dynamical framework to study how to assess others based on their actions as well as how to act toward them based on the assessments. Well-known are the 'leading eight' from L1 to L8, the eight norms for assessment and action to foster cooperation in social dilemmas while resisting the invasion of mutant norms prescribing alternative actions. In this work, we begin by showing that balance is equivalent to stationarity of dynamics only for L4 and L6 (Stern Judging) among the leading eight. Stern Judging reflects an intuitive idea that good merits reward whereas evil warrants punishment. By analyzing the dynamics of Stern Judging in complete graphs, we prove that this norm almost always segregates the graph into two mutually hostile groups as the graph size grows. We then compare L4 with Stern Judging: The only difference of L4 is that a good player's cooperative action toward a bad one is regarded as good. This subtle difference transforms large populations governed by L4 to a "paradise" where cooperation prevails and positive assessments abound. Our study thus helps us understand the relationship between individual norms and their emergent consequences at a population level, shedding light on the nuanced interplay between cognitive consistency and segregation dynamics.

Submitted 7 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.13541 [pdf, other]

Generalizable Novel-View Synthesis using a Stereo Camera

Authors: Haechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho

Abstract: In this paper, we propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images. Since recent stereo matching has demonstrated accurate geometry prediction, we introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction. To this end, this paper proposes a novel framework, dubbed StereoNeRF, which integrates stereo matching into a NeRF-based generalizable view synthesis approach. StereoNeRF is equipped with three key components to effectively exploit stereo matching in novel-view synthesis: a stereo feature extractor, a depth-guided plane-sweeping, and a stereo depth loss. Moreover, we propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis.

Submitted 21 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024. Project page URL: https://jinwonjoon.github.io/stereonerf/

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00916 [pdf, other]

Gyro-based Neural Single Image Deblurring

Authors: Heemin Yang, Jaesung Rim, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho

Abstract: In this paper, we present GyroDeblurNet, a novel single image deblurring method that utilizes a gyro sensor to effectively resolve the ill-posedness of image deblurring. The gyro sensor provides valuable information about camera motion during exposure time that can significantly improve deblurring quality. However, effectively exploiting real-world gyro data is challenging due to significant errors from various sources including sensor noise, the disparity between the positions of a camera module and a gyro sensor, the absence of translational motion information, and moving objects whose motions cannot be captured by a gyro sensor. To handle gyro error, GyroDeblurNet is equipped with two novel neural network blocks: a gyro refinement block and a gyro deblurring block. The gyro refinement block refines the error-ridden gyro data using the blur information from the input image. On the other hand, the gyro deblurring block removes blur from the input image using the refined gyro data and further compensates for gyro error by leveraging the blur information from the input image. For training a neural network with erroneous gyro data, we propose a training strategy based on the curriculum learning. We also introduce a novel gyro data embedding scheme to represent real-world intricate camera shakes. Finally, we present a synthetic dataset and a real dataset for the training and evaluation of gyro-based single image deblurring. Our experiments demonstrate that our approach achieves state-of-the-art deblurring quality by effectively utilizing erroneous gyro data.

Submitted 8 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 14 pages, 11 figures

arXiv:2404.00562 [pdf, other]

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Authors: Junuk Cha, Jihyeon Kim, Jae Shin Yoon, Seungryul Baek

Abstract: This paper introduces the first text-guided work for generating the sequence of hand-object interaction in 3D. The main challenge arises from the lack of labeled data where existing ground-truth datasets are nowhere near generalizable in interaction type and object category, which inhibits the modeling of diverse 3D hand-object interaction with the correct physical implication (e.g., contacts and semantics) from text prompts. To address this challenge, we propose to decompose the interaction generation task into two subtasks: hand-object contact generation; and hand-object motion generation. For contact generation, a VAE-based network takes as input a text and an object mesh, and generates the probability of contacts between the surfaces of hands and the object during the interaction. The network learns a variety of local geometry structure of diverse objects that is independent of the objects' category, and thus, it is applicable to general objects. For motion generation, a Transformer-based diffusion model utilizes this 3D contact map as a strong prior for generating physically plausible hand-object motion as a function of text prompts by learning from the augmented labeled dataset; where we annotate text labels from many existing 3D hand and object motion data. Finally, we further introduce a hand refiner module that minimizes the distance between the object surface and hand joints to improve the temporal stability of the object-hand contacts and to suppress the penetration artifacts. In the experiments, we demonstrate that our method can generate more realistic and diverse interactions compared to other baseline methods. We also show that our method is applicable to unseen objects. We will release our model and newly labeled data as a strong foundation for future research. Codes and data are available in: https://github.com/JunukCha/Text2HOI.

Submitted 1 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.18013 [pdf, other]

doi 10.1007/JHEP05(2024)331

Rindler Fluids from Gravitational Shockwaves

Authors: Sang-Eon Bak, Cynthia Keeler, Yiwen Zhang, Kathryn M. Zurek

Abstract: We study a correspondence between gravitational shockwave geometry and its fluid description near a Rindler horizon in Minkowski spacetime. Utilizing the Petrov classification that describes algebraic symmetries for Lorentzian spaces, we establish an explicit mapping between a potential fluid and the shockwave metric perturbation, where the Einstein equation for the shockwave geometry is equivalent to the incompressibility condition of the fluid, augmented by a shockwave source. Then we consider an Ansatz of a stochastic quantum source for the potential fluid, which has the physical interpretation of shockwaves created by vacuum energy fluctuations. Under such circumstance, the Einstein equation, or equivalently, the incompressibility condition for the fluid, becomes a stochastic differential equation. By smearing the quantum source on a stretched horizon in a Lorentz invariant manner with a Planckian width (similarly to the membrane paradigm), we integrate fluctuations near the Rindler horizon to find an accumulated effect of the variance in the round-trip time of a photon traversing the horizon of a causal diamond.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 21 pages, 1 figure

Report number: CALT-TH 2024-016

Journal ref: JHEP 05 (2024) 331

arXiv:2403.16428 [pdf, other]

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future hand studies on egocentric hand-object interactions.

Submitted 5 August, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted to ECCV 2024

arXiv:2403.06592 [pdf, other]

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Authors: Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi

Abstract: This paper presents a new approach for the detection of fake videos, based on the analysis of style latent vectors and their abnormal behavior in temporal changes in the generated videos. We discovered that the generated facial videos suffer from the temporal distinctiveness in the temporal changes of style latent vectors, which are inevitable during the generation of temporally stable videos with various facial expressions and geometric transformations. Our framework utilizes the StyleGRU module, trained by contrastive learning, to represent the dynamic properties of style latent vectors. Additionally, we introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of visual and temporal artifacts. We demonstrate our approach across various benchmark scenarios in deepfake detection, showing its superiority in cross-dataset and cross-manipulation scenarios. Through further analysis, we also validate the importance of using temporal changes of style latent vectors to improve the generality of deepfake video detection.

Submitted 20 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: Preprint version; the final version will be available at https://openaccess.thecvf.com. The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) (2024). Published by: IEEE & CVF

arXiv:2403.05346 [pdf, other]

VLM-PL: Advanced Pseudo Labeling Approach for Class Incremental Object Detection via Vision-Language Model

Authors: Junsu Kim, Yunhoe Ku, Jihyeon Kim, Junuk Cha, Seungryul Baek

Abstract: In the field of Class Incremental Object Detection (CIOD), creating models that can continuously learn like humans is a major challenge. Pseudo-labeling methods, although initially powerful, struggle with multi-scenario incremental learning due to their tendency to forget past knowledge. To overcome this, we introduce a new approach called Vision-Language Model assisted Pseudo-Labeling (VLM-PL). This technique uses Vision-Language Model (VLM) to verify the correctness of pseudo ground-truths (GTs) without requiring additional model training. VLM-PL starts by deriving pseudo GTs from a pre-trained detector. Then, we generate custom queries for each pseudo GT using carefully designed prompt templates that combine image and text features. This allows the VLM to classify the correctness through its responses. Furthermore, VLM-PL integrates refined pseudo and real GTs from upcoming training, effectively combining new and old knowledge. Extensive experiments conducted on the Pascal VOC and MS COCO datasets not only highlight VLM-PL's exceptional performance in multi-scenario but also illuminate its effectiveness in dual-scenario by achieving state-of-the-art results in both.

Submitted 8 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: Accept to CVPRW2024 (CLvision). The camera-ready version of the manuscript

arXiv:2402.17323 [pdf, other]

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Authors: Junsu Kim, Hoseong Cho, Jihyeon Kim, Yihalem Yimolal Tiruneh, Seungryul Baek

Abstract: In the field of class incremental learning (CIL), generative replay has become increasingly prominent as a method to mitigate the catastrophic forgetting, alongside the continuous improvements in generative models. However, its application in class incremental object detection (CIOD) has been significantly limited, primarily due to the complexities of scenes involving multiple labels. In this paper, we propose a novel approach called stable diffusion deep generative replay (SDDGR) for CIOD. Our method utilizes a diffusion-based generative model with pre-trained text-to-diffusion networks to generate realistic and diverse synthetic images. SDDGR incorporates an iterative refinement strategy to produce high-quality images encompassing old classes. Additionally, we adopt an L2 knowledge distillation technique to improve the retention of prior knowledge in synthetic images. Furthermore, our approach includes pseudo-labeling for old objects within new task images, preventing misclassification as background elements. Extensive experiments on the COCO 2017 dataset demonstrate that SDDGR significantly outperforms existing algorithms, achieving a new state-of-the-art in various CIOD scenarios. The source code will be made available to the public.

Submitted 7 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accept to CVPR 2024. The camera-ready version

arXiv:2402.13560 [pdf, other]

doi 10.1016/j.optlastec.2024.111436

Design and characterization of individual addressing optics based on multi-channel acousto-optic modulator for $^{171}$Yb$^+$ qubits

Authors: Sungjoo Lim, Seunghyun Baek, Jacob Whitlow, Marissa D'Onofrio, Tianyi Chen, Samuel Phiri, Stephen Crain, Kenneth R. Brown, Jungsang Kim, Junki Kim

Abstract: We present the design and characterization of individual addressing optics based on a multi-channel acousto-optic modulator (AOM) for trapped ytterbium-171 ions. The design parameters of the individual addressing system were determined based on the tradeoff between the expected crosstalk and the required numerical aperture of the projection objective lens. The target beam diameter and separation were 1.90 $μ$m and 4.28 $μ$m, respectively. The individual beams shaped by the projection optics were characterized by an imaging sensor and a field probe ion. The resulting effective beam diameters and separations were approximately 2.34–2.36 $μ$m and 4.31 $μ$m, respectively, owing to residual aberration.

Submitted 30 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 14 pages, 5 figures

Journal ref: Optics & Laser Technology 180, 111436 (2025)

arXiv:2402.12503 [pdf, other]

PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling

Authors: Phong C. H. Nguyen, Xinlun Cheng, Shahab Azarfar, Pradeep Seshadri, Yen T. Nguyen, Munho Kim, Sanghun Choi, H. S. Udaykumar, Stephen Baek

Abstract: Modeling unsteady, fast transient, and advection-dominated physics problems is a pressing challenge for physics-aware deep learning (PADL). The physics of complex systems is governed by large systems of partial differential equations (PDEs) and ancillary constitutive models with nonlinear structures, as well as evolving state fields exhibiting sharp gradients and rapidly deforming material interfaces. Here, we investigate an inductive bias approach that is versatile and generalizable to model generic nonlinear field evolution problems. Our study focuses on the recent physics-aware recurrent convolutions (PARC), which incorporates a differentiator-integrator architecture that inductively models the spatiotemporal dynamics of generic physical systems. We extend the capabilities of PARC to simulate unsteady, transient, and advection-dominant systems. The extended model, referred to as PARCv2, is equipped with differential operators to model advection-reaction-diffusion equations, as well as a hybrid integral solver for stable, long-time predictions. PARCv2 is tested on both standard benchmark problems in fluid dynamics, namely Burgers and Navier-Stokes equations, and then applied to more complex shock-induced reaction problems in energetic materials. We evaluate the behavior of PARCv2 in comparison to other physics-informed and learning bias models and demonstrate its potential to model unsteady and advection-dominant dynamics regimes.

Submitted 24 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11597 [pdf, other]

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

Authors: Guijin Son, Sangwon Baek, Sangdae Nam, Ilgyun Jeong, Seungone Kim

Abstract: Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench (Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25 tasks. Each task in the MTI Bench involves 2 to 3 sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces the total inference time by 1.46 times on average since it does not require multiple inference calls. Interestingly, contrary to the expectation that LLMs would perform better when tasks are divided, we find that state-of-the-art LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% and 12.4% improved performance with Multi-Task Inference compared to Single-Task Inference on the MTI Bench. We release the MTI Bench dataset and our code at this link https://github.com/guijinSON/MTI-Bench.

Submitted 6 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: acl 2024 (main)

arXiv:2402.08174 [pdf, other]

doi 10.1145/3589334.3645372

Hierarchical Position Embedding of Graphs with Landmarks and Clustering for Link Prediction

Authors: Minsang Kim, Seungjun Baek

Abstract: Learning positional information of nodes in a graph is important for link prediction tasks. We propose a representation of positional information using representative nodes called landmarks. A small number of nodes with high degree centrality are selected as landmarks, which serve as reference points for the nodes' positions. We justify this selection strategy for well-known random graph models and derive closed-form bounds on the average path lengths involving landmarks. In a model for power-law graphs, we prove that landmarks provide asymptotically exact information on inter-node distances. We apply theoretical insights to practical networks and propose Hierarchical Position embedding with Landmarks and Clustering (HPLC). HPLC combines landmark selection and graph clustering, where the graph is partitioned into densely connected clusters in which nodes with the highest degree are selected as landmarks. HPLC leverages the positional information of nodes based on landmarks at various levels of hierarchy such as nodes' distances to landmarks, inter-landmark distances and hierarchical grouping of clusters. Experiments show that HPLC achieves state-of-the-art performances of link prediction on various datasets in terms of HIT@K, MRR, and AUC. The code is available at https://github.com/kmswin1/HPLC.

Submitted 19 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: The International World Wide Web Conference (WWW) 2024, Accepted paper

arXiv:2402.06440 [pdf, other]

A Method for Decrypting Data Infected with Rhysida Ransomware

Authors: Giyoon Kim, Soojin Kang, Seungjun Baek, Kimoon Kim, Jongsung Kim

Abstract: Ransomware is malicious software that is a prominent global cybersecurity threat. Typically, ransomware encrypts data on a system, rendering the victim unable to decrypt it without the attacker's private key. Subsequently, victims often pay a substantial ransom to recover their data, yet some may still incur damage or loss. This study examines Rhysida ransomware, which caused significant damage in the second half of 2023, and proposes a decryption method. Rhysida ransomware employed a secure random number generator to generate the encryption key and subsequently encrypt the data. However, an implementation vulnerability existed that enabled us to regenerate the internal state of the random number generator at the time of infection. We successfully decrypted the data using the regenerated random number generator. To the best of our knowledge, this is the first successful decryption of Rhysida ransomware. We aspire for our work to contribute to mitigating the damage inflicted by the Rhysida ransomware.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.03559 [pdf, other]

Constrained Synthesis with Projected Diffusion Models

Authors: Jacob K Christopher, Stephen Baek, Ferdinando Fioretto

Abstract: This paper introduces an approach to endow generative diffusion processes with the ability to satisfy and certify compliance with constraints and physical principles. The proposed method recasts the traditional sampling process of generative diffusion models as a constrained optimization problem, steering the generated data distribution to remain within a specified region to ensure adherence to the given constraints. These capabilities are validated on applications featuring both convex and challenging, non-convex, constraints as well as ordinary differential equations, in domains spanning from synthesizing new materials with precise morphometric properties, generating physics-informed motion, optimizing paths in planning scenarios, and human motion synthesis.

Submitted 23 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.16771 [pdf]

MolPLA: A Molecular Pretraining Framework for Learning Cores, R-Groups and their Linker Joints

Authors: Mogan Gim, Jueon Park, Soyon Park, Sanghoon Lee, Seungheun Baek, Junhyun Lee, Ngoc-Quang Nguyen, Jaewoo Kang

Abstract: Molecular core structures and R-groups are essential concepts in drug development. Integration of these concepts with conventional graph pre-training approaches can promote deeper understanding in molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning in understanding the underlying decomposable parts in molecules that implicate their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that grants MolPLA the ability to help chemists find replaceable R-groups in lead optimization scenarios. Experimental results on molecular property prediction show that MolPLA exhibits predictability comparable to current state-of-the-art models. Qualitative analysis indicates that MolPLA is capable of distinguishing core and R-group sub-structures, identifying decomposable regions in molecules and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates. The code implementation for MolPLA and its pre-trained model checkpoint is available at https://github.com/dmis-lab/MolPLA

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.07331 [pdf, other]

Rapid Estimation of Left Ventricular Contractility with a Physics-Informed Neural Network Inverse Modeling Approach

Authors: Ehsan Naghavi, Haifeng Wang, Lei Fan, Jenny S. Choy, Ghassan Kassab, Seungik Baek, Lik-Chuan Lee

Abstract: Physics-based computer models based on numerical solution of the governing equations generally cannot make rapid predictions, which, in turn, limits their applications in the clinic. To address this issue, we developed a physics-informed neural network (PINN) model that encodes the physics of a closed-loop blood circulation system embedding a left ventricle (LV). The PINN model is trained to satisfy a system of ordinary differential equations (ODEs) associated with a lumped parameter description of the circulatory system. The model predictions have a maximum error of less than 5% when compared to those obtained by solving the ODEs numerically. An inverse modeling approach using the PINN model is also developed to rapidly estimate model parameters (in $\sim$ 3 mins) from single-beat LV pressure and volume waveforms. Using synthetic LV pressure and volume waveforms generated by the PINN model with different model parameter values, we show that the inverse modeling approach can recover the corresponding ground truth values, which suggests that the model parameters are unique. The PINN inverse modeling approach is then applied to estimate LV contractility indexed by the end-systolic elastance $E_{es}$ using waveforms acquired from 11 swine models, including waveforms acquired before and after administration of dobutamine (an inotropic agent) in 3 animals. The estimated $E_{es}$ is about 58% to 284% higher for the data associated with dobutamine compared to those without, which implies that this approach can be used to estimate LV contractility using single-beat measurements. The PINN inverse modeling can potentially be used in the clinic to simultaneously estimate LV contractility and other physiological parameters from single-beat measurements.

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.06415 [pdf, other]

3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image

Authors: Junuk Cha, Hansol Lee, Jaewon Kim, Nhat Nguyen Bao Truong, Jae Shin Yoon, Seungryul Baek

Abstract: This paper introduces a novel pipeline to reconstruct the geometry of interacting multi-person in clothing on a globally coherent scene space from a single image. The main challenge arises from the occlusion: a part of a human body is not visible from a single view due to the occlusion by others or the self, which introduces missing geometry and physical implausibility (e.g., penetration). We overcome this challenge by utilizing two human priors for complete 3D geometry and surface contacts. For the geometry prior, an encoder learns to regress the image of a person with missing body parts to the latent vectors; a decoder decodes these vectors to produce 3D features of the associated geometry; and an implicit network combines these features with a surface normal map to reconstruct complete and detailed 3D humans. For the contact prior, we develop an image-space contact detector that outputs a probability distribution of surface contacts between people in 3D. We use these priors to globally refine the body poses, enabling the penetration-free and accurate reconstruction of interacting multi-person in clothing on the scene space. The results demonstrate that our method is complete, globally coherent, and physically plausible compared to existing methods.

Submitted 2 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to WACV 2024

arXiv:2401.06369 [pdf, other]

doi 10.35848/1347-4065/ad3533

Low-Loss Polarization-Maintaining Optical Router for Photonic Quantum Information Processing

Authors: Pengfei Wang, Soyoung Baek, Keiichi Edamatsu, Fumihiro Kaneda

Abstract: In photonic quantum applications, optical routers are required to handle single photons with low loss, high speed, and preservation of their quantum states. Single-photon routing with maintained polarization states is particularly important for utilizing them as qubits. Here, we demonstrate a polarization-maintaining electro-optic router compatible with single photons. Our custom electro-optic modulator is embedded in a configuration of a Mach-Zehnder interferometer, where each optical component achieves polarization-maintaining operation. We observe the performance of the router with 2-4% loss, 20 dB switching extinction ratio, 2.9 ns rise time, and $>$ 99% polarization process fidelity to an ideal identity operation.

Submitted 6 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Journal ref: Jpn. J. Appl. Phys. 63 040901 (2024)

arXiv:2401.03835 [pdf, other]

Limitations of Data-Driven Spectral Reconstruction -- Optics-Aware Analysis and Mitigation

Authors: Qiang Fu, Matheus Souza, Eunsue Choi, Suhyun Shin, Seung-Hwan Baek, Wolfgang Heidrich

Abstract: Hyperspectral imaging empowers machine vision systems with the distinct capability of identifying materials through recording their spectral signatures. Recent efforts in data-driven spectral reconstruction aim at extracting spectral information from RGB images captured by cost-effective RGB cameras, instead of dedicated hardware. In this paper we systematically analyze the performance of such methods, evaluating both the practical limitations with respect to current datasets and overfitting, as well as fundamental limitations with respect to the nature of the information encoded in the RGB images, and the dependency of this information on the optical system of the camera. We find that the current models are not robust under slight variations, e.g., in noise level or compression of the RGB file. Without modeling underrepresented spectral content, existing datasets and the models trained on them are limited in their ability to cope with challenging metameric colors. To mitigate this issue, we propose to exploit the combination of metameric data augmentation and optical lens aberrations to improve the encoding of the metameric information into the RGB image, which paves the road towards higher performing spectral imaging and reconstruction approaches.

Submitted 2 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 12 pages, 7 figures, 8 tables

arXiv:2401.00370 [pdf, other]

UGPNet: Universal Generative Prior for Image Restoration

Authors: Hwayoon Lee, Kyoungkook Kang, Hyeongmin Lee, Seung-Hwan Baek, Sunghyun Cho

Abstract: Recent image restoration methods can be broadly categorized into two classes: (1) regression methods that recover the rough structure of the original image without synthesizing high-frequency details and (2) generative methods that synthesize perceptually-realistic high-frequency details even though the resulting image deviates from the original structure of the input. While both directions have been extensively studied in isolation, merging their benefits with a single framework has been rarely studied. In this paper, we propose UGPNet, a universal image restoration framework that can effectively achieve the benefits of both approaches by simply adopting a pair of an existing regression model and a generative model. UGPNet first restores the image structure of a degraded input using a regression model and synthesizes a perceptually-realistic image with a generative model on top of the regressed output. UGPNet then combines the regressed output and the synthesized output, resulting in a final result that faithfully reconstructs the structure of the original image in addition to perceptually-realistic textures. Our extensive experiments on deblurring, denoising, and super-resolution demonstrate that UGPNet can successfully exploit both regression and generative methods for high-fidelity image restoration.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted to WACV 2024

arXiv:2312.17214 [pdf, other]

doi 10.1007/JHEP07(2024)214

Quantum-Gravitational Null Raychaudhuri Equation

Authors: Sang-Eon Bak, Maulik Parikh, Sudipta Sarkar, Francesco Setti

Abstract: We consider a congruence of null geodesics in the presence of a quantized spacetime metric. The coupling to a quantum metric induces fluctuations in the congruence; we calculate the change in the area of a pencil of geodesics induced by such fluctuations. For the gravitational field in its vacuum state, we find that quantum gravity contributes a correction to the null Raychaudhuri equation which is of the same sign as the classical terms. We thus derive a quantum-gravitational focusing theorem valid for linearized quantum gravity.

Submitted 25 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 15 pages, 1 figure, v2. published version in JHEP

Journal ref: JHEP 07 (2024) 214

arXiv:2312.16842 [pdf, other]

Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera

Authors: Hansol Lee, Junuk Cha, Yunhoe Ku, Jae Shin Yoon, Seungryul Baek

Abstract: The appearance of a human in clothing is driven not only by the pose but also by its temporal context, i.e., motion. However, such context has been largely neglected by existing monocular human modeling methods whose neural networks often struggle to learn a video of a person with large dynamics due to the motion ambiguity, i.e., there exist numerous geometric configurations of clothes that are dependent on the context of motion even for the same pose. In this paper, we introduce a method for high-quality modeling of clothed 3D human avatars using a video of a person with dynamic movements. The main challenge comes from the lack of 3D ground truth data of geometry and its temporal correspondences. We address this challenge by introducing a novel compositional human modeling framework that takes advantage of both explicit and implicit human modeling. For explicit modeling, a neural network learns to generate point-wise shape residuals and appearance features of a 3D body model by comparing its 2D rendering results and the original images. This explicit model allows for the reconstruction of discriminative 3D motion features from UV space by encoding their temporal correspondences. For implicit modeling, an implicit network combines the appearance and 3D motion features to decode high-fidelity clothed 3D human avatars with motion-dependent geometry and texture. The experiments show that our method can generate a large variation of secondary motion in a physically plausible way.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16760 [pdf, other]

The Fourth International Verification of Neural Networks Competition (VNN-COMP 2023): Summary and Results

Authors: Christopher Brix, Stanley Bak, Changliu Liu, Taylor T. Johnson

Abstract: This report summarizes the 4th International Verification of Neural Networks Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with the 35th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2023 iteration, 7 teams participated on a diverse set of 10 scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.

Submitted 27 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2212.10376

arXiv:2312.13313 [pdf, other]

ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

Authors: Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho

Abstract: RAW images are rarely shared mainly due to their excessive data size compared to their sRGB counterparts obtained by camera ISPs. Learning the forward and inverse processes of camera ISPs has been recently demonstrated, enabling physically-meaningful RAW-level image processing on input sRGB images. However, existing learning-based ISP methods fail to handle the large variations in the ISP processes with respect to camera parameters such as ISO and exposure time, and have limitations when used for various applications. In this paper, we propose ParamISP, a learning-based method for forward and inverse conversion between sRGB and RAW images, that adopts a novel neural-network module to utilize camera parameters, which is dubbed as ParamNet. Given the camera parameters provided in the EXIF data, ParamNet converts them into a feature vector to control the ISP networks. Extensive experiments demonstrate that ParamISP achieves superior RAW and sRGB reconstruction results compared to previous methods and it can be effectively used for a variety of applications such as deblurring dataset synthesis, raw deblurring, HDR reconstruction, and camera-to-camera transfer.

Submitted 14 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.09139 [pdf, other]

Class-Wise Buffer Management for Incremental Object Detection: An Effective Buffer Training Strategy

Authors: Junsu Kim, Sumin Hong, Chanwoo Kim, Jihyeon Kim, Yihalem Yimolal Tiruneh, Jeongwan On, Jihyun Song, Sunhwa Choi, Seungryul Baek

Abstract: Class incremental learning aims to solve a problem that arises when continuously adding unseen class instances to an existing model. This approach has been extensively studied in the context of image classification; however its applicability to object detection is not well established yet. Existing frameworks using replay methods mainly collect replay data without considering the model being trained and tend to rely on randomness or the number of labels of each sample. Also, despite the effectiveness of the replay, it was not yet optimized for the object detection task. In this paper, we introduce an effective buffer training strategy (eBTS) that creates the optimized replay buffer on object detection. Our approach incorporates guarantee minimum and hierarchical sampling to establish the buffer customized to the trained model. Furthermore, we use the circular experience replay training to optimally utilize the accumulated buffer data. Experiments on the MS COCO dataset demonstrate that our eBTS achieves state-of-the-art performance compared to the existing replay schemes.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, Accepted at ICASSP 2024

arXiv:2312.02480 [pdf, other]

Differentiable Point-based Inverse Rendering

Authors: Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek

Abstract: We present differentiable point-based inverse rendering, DPIR, an analysis-by-synthesis method that processes images captured under diverse illuminations to estimate shape and spatially-varying BRDF. To this end, we adopt point-based rendering, eliminating the need for multiple samplings per ray, typical of volumetric rendering, thus significantly enhancing the speed of inverse rendering. To realize this idea, we devise a hybrid point-volumetric representation for geometry and a regularized basis-BRDF representation for reflectance. The hybrid geometric representation enables fast rendering through point-based splatting while retaining the geometric details and stability inherent to SDF-based representations. The regularized basis-BRDF mitigates the ill-posedness of inverse rendering stemming from limited light-view angular samples. We also propose an efficient shadow detection method using point-based shadow map rendering. Our extensive evaluations demonstrate that DPIR outperforms prior works in terms of reconstruction accuracy, computational efficiency, and memory footprint. Furthermore, our explicit point-based representation and rendering enables intuitive geometry and reflectance editing.

Submitted 25 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.02313 [pdf, other]

Coverage Explorer: Coverage-guided Test Generation for Cyber Physical Systems

Authors: Sanaz Sheikhi, Stanley Bak

Abstract: Given the safety-critical functions of autonomous cyber-physical systems (CPS) across diverse domains, testing these systems is essential. While conventional software and hardware testing methodologies offer partial insights, they frequently do not provide adequate coverage in a CPS. In this study, we introduce a testing framework designed to systematically formulate test cases, effectively exploring the state space of CPS. This framework introduces a coverage-centric sampling technique, coupled with a cluster-based methodology for training a surrogate model. The framework then uses model predictive control within the surrogate model to generate test cases tailored to CPS specifications. To evaluate the efficacy of the framework, we applied it on several benchmarks, spanning from a kinematic car to systems like an unmanned aircraft collision avoidance system (ACAS XU) and automatic transmission system. Comparative analyses were conducted against alternative test generation strategies, including randomized testing, as well as falsification using S-TaLiRo.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.18287 [pdf, other]

Dispersed Structured Light for Hyperspectral 3D Imaging

Authors: Suhyun Shin, Seokjun Choi, Felix Heide, Seung-Hwan Baek

Abstract: Hyperspectral 3D imaging aims to acquire both depth and spectral information of a scene. However, existing methods are either prohibitively expensive and bulky or compromise on spectral and depth accuracy. In this work, we present Dispersed Structured Light (DSL), a cost-effective and compact method for accurate hyperspectral 3D imaging. DSL modifies a traditional projector-camera system by placing a sub-millimeter thick diffraction grating film in front of the projector. The grating disperses structured light based on light wavelength. To utilize the dispersed structured light, we devise a model for dispersive projection image formation and a per-pixel hyperspectral 3D reconstruction method. We validate DSL by instantiating a compact experimental prototype. DSL achieves spectral accuracy of 18.8nm full-width half-maximum (FWHM) and depth error of 1mm. We demonstrate that DSL outperforms prior work on practical hyperspectral 3D imaging. DSL promises accurate and practical hyperspectral 3D imaging for diverse application domains, including computer vision and graphics, cultural heritage, geology, and biology.

Submitted 25 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.17396 [pdf, other]

Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset

Authors: Yujin Jeon, Eunsue Choi, Youngchan Kim, Yunseong Moon, Khalid Omer, Felix Heide, Seung-Hwan Baek

Abstract: Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Most existing image datasets focus on trichromatic intensity images to mimic human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although spectro-polarimetric datasets exist, these datasets have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image count. Here, we introduce two spectro-polarimetric datasets: trichromatic Stokes images and hyperspectral Stokes images. These novel datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our dataset in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate spectral dependency of shape-from-polarization methods. As such, the proposed dataset promises a foundation for data-driven spectro-polarimetric imaging and vision research. Dataset and code will be publicly available.

Submitted 30 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Showing 1–50 of 496 results for author: Bak, S