-
Mutually-Aware Feature Learning for Few-Shot Object Counting
Authors:
Yerim Jeon,
Subeen Lee,
Jihwan Kim,
Jae-Pil Heo
Abstract:
Few-shot object counting has garnered significant attention for its practicality, as it aims to count target objects in a query image based on given exemplars without the need for additional training. However, the prevailing extract-and-match approach has a shortcoming: query and exemplar features lack interaction during feature extraction, since they are extracted unaware of each other and only later correlated based on similarity. This can lead to insufficient target awareness in the extracted features, resulting in target confusion, i.e., difficulty in precisely identifying the actual target when objects of multiple classes coexist. To address this limitation, we propose a novel framework, Mutually-Aware FEAture learning (MAFEA), which encodes query and exemplar features mutually aware of each other from the outset. By encouraging interaction between query and exemplar features throughout the entire pipeline, we obtain target-aware features that are robust in multi-category scenarios. Furthermore, we introduce a background token to effectively associate the target region of the query with the exemplars and decouple its background region from them. Extensive experiments demonstrate that our model achieves new state-of-the-art performance on two challenging benchmarks, FSCD-LVIS and FSC-147, with a remarkably reduced degree of target confusion.
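For readers unfamiliar with the extract-and-match baseline this abstract argues against, the sketch below shows its matching step: query and exemplar features are produced independently and only correlated afterwards. Averaging the exemplars into one prototype and using cosine similarity are illustrative assumptions; this is not MAFEA, whose mutually aware encoding and background token are not reproduced here.

```python
import numpy as np


def similarity_map(query_feats: np.ndarray, exemplar_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity between a pooled exemplar prototype and every query location.

    query_feats: (H, W, C) feature map of the query image, extracted on its own.
    exemplar_feats: (K, C) one pooled feature vector per exemplar, also extracted on its own.
    Returns an (H, W) map of similarities in [-1, 1].
    """
    prototype = exemplar_feats.mean(axis=0)              # assumption: average the K exemplars
    prototype = prototype / np.linalg.norm(prototype)
    norms = np.linalg.norm(query_feats, axis=-1, keepdims=True)
    q = query_feats / np.maximum(norms, 1e-8)            # unit-normalize each location
    return q @ prototype                                  # correlation happens only here
```

Because the query features never "see" the exemplars before this step, a location belonging to a visually similar but different class can score highly, which is the target-confusion failure the abstract describes.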
Submitted 19 August, 2024;
originally announced August 2024.
-
Boundary-Recovering Network for Temporal Action Detection
Authors:
Jihwan Kim,
Jaehyun Choi,
Yerim Jeon,
Jae-Pil Heo
Abstract:
Temporal action detection (TAD) is challenging, yet fundamental for real-world video applications. Large temporal scale variation of actions is one of the primary difficulties in TAD. Naturally, multi-scale features, as widely used in object detection, have potential for localizing actions of diverse lengths. Nevertheless, unlike objects in images, actions have more ambiguous boundaries. That is, small neighboring objects are not mistaken for a single large one, whereas short adjoining actions can be misunderstood as one long action. In the coarse-to-fine feature pyramid built via pooling, these vague action boundaries can fade out, which we call the 'vanishing boundary problem'. To this end, we propose the Boundary-Recovering Network (BRN) to address the vanishing boundary problem. BRN constructs scale-time features by introducing a new axis, called the scale dimension, obtained by interpolating multi-scale features to the same temporal length. On top of the scale-time features, scale-time blocks learn to exchange features across scale levels, which effectively mitigates the issue. Extensive experiments demonstrate that our model outperforms the state of the art on two challenging benchmarks, ActivityNet-v1.3 and THUMOS14, with a remarkably reduced degree of the vanishing boundary problem.
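The construction of scale-time features can be sketched as follows: each pyramid level is resampled to a common temporal length and the levels are stacked along a new scale axis. Linear interpolation and the average-pooled pyramid in the usage example are assumptions made for illustration; the learned scale-time blocks that exchange features across this axis are not shown.

```python
from typing import Sequence

import numpy as np


def build_scale_time_features(pyramid: Sequence[np.ndarray], target_len: int) -> np.ndarray:
    """Stack multi-scale features along a new scale axis.

    pyramid: arrays of shape (T_l, C); coarser levels have smaller T_l.
    Returns an array of shape (num_levels, target_len, C).
    """
    target_t = np.linspace(0.0, 1.0, target_len)   # common normalized time grid
    levels = []
    for feats in pyramid:
        t_l, channels = feats.shape
        src_t = np.linspace(0.0, 1.0, t_l)
        # Assumption: linear interpolation of each channel to the common length.
        resized = np.stack(
            [np.interp(target_t, src_t, feats[:, c]) for c in range(channels)],
            axis=1,
        )
        levels.append(resized)
    return np.stack(levels, axis=0)                 # new leading "scale" dimension


# Usage: a three-level pyramid built by stride-2 average pooling.
base = np.arange(32, dtype=float).reshape(8, 4)
lvl1 = base.reshape(4, 2, 4).mean(axis=1)
lvl2 = lvl1.reshape(2, 2, 4).mean(axis=1)
scale_time = build_scale_time_features([base, lvl1, lvl2], target_len=8)
```

Once every level shares the same time axis, features at the same timestamp can be compared across scales, which is what allows boundary information lost at coarse levels to be recovered from finer ones.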
Submitted 18 August, 2024;
originally announced August 2024.
-
An Investigation Into Explainable Audio Hate Speech Detection
Authors:
Jinmyeong An,
Wonjun Lee,
Yejin Jeon,
Jungseul Ok,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Research on hate speech has predominantly revolved around detection and interpretation from textual inputs, leaving verbal content largely unexplored. While there has been limited exploration into hate speech detection within verbal acoustic speech inputs, the aspect of interpretability has been overlooked. Therefore, we introduce a new task of explainable audio hate speech detection. Specifically, we aim to identify the precise time intervals, referred to as audio frame-level rationales, which serve as evidence for hate speech classification. Towards this end, we propose two different approaches: cascading and End-to-End (E2E). The cascading approach initially converts audio to transcripts, identifies hate speech within these transcripts, and subsequently locates the corresponding audio time frames. Conversely, the E2E approach processes audio utterances directly, which allows it to pinpoint hate speech within specific time frames. Additionally, due to the lack of explainable audio hate speech datasets that include audio frame-level rationales, we curated a synthetic audio dataset to train our models. We further validated these models on actual human speech utterances and found that the E2E approach outperforms the cascading method in terms of the audio frame Intersection over Union (IoU) metric. Furthermore, we observed that including frame-level rationales significantly enhances hate speech detection accuracy for the E2E approach.
Disclaimer: The reader may encounter content of an offensive or hateful nature. However, given the nature of the work, this cannot be avoided.
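The audio frame IoU metric mentioned above compares predicted and ground-truth rationale frames. A minimal sketch, assuming rationales are given as [start, end) intervals in integer milliseconds and a hypothetical 20 ms frame length; the abstract does not state the frame size or how scores are averaged across utterances.

```python
from typing import List, Sequence, Tuple


def intervals_to_mask(intervals_ms: Sequence[Tuple[int, int]], num_frames: int,
                      frame_ms: int = 20) -> List[bool]:
    """Mark every frame that overlaps any [start, end) interval given in milliseconds."""
    mask = [False] * num_frames
    for start, end in intervals_ms:
        first = start // frame_ms
        last = -(-end // frame_ms)          # ceiling division: one past the last overlapping frame
        for i in range(max(first, 0), min(last, num_frames)):
            mask[i] = True
    return mask


def frame_iou(pred_mask: Sequence[bool], gold_mask: Sequence[bool]) -> float:
    """Intersection over union of two frame-level rationale masks."""
    if len(pred_mask) != len(gold_mask):
        raise ValueError("masks must have the same number of frames")
    inter = sum(bool(p) and bool(g) for p, g in zip(pred_mask, gold_mask))
    union = sum(bool(p) or bool(g) for p, g in zip(pred_mask, gold_mask))
    # Assumption: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


pred = intervals_to_mask([(0, 100)], num_frames=10)
gold = intervals_to_mask([(40, 140)], num_frames=10)
```

Here the prediction covers frames 0 to 4 and the reference covers frames 2 to 6, so three shared frames out of seven covered frames give an IoU of 3/7.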
Submitted 12 August, 2024;
originally announced August 2024.
-
MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
Authors:
Casper van Engelenburg,
Fatemeh Mostafavi,
Emanuel Kuhn,
Yuntae Jeon,
Michael Franzen,
Matthias Standfest,
Jan van Gemert,
Seyran Khademi
Abstract:
Diverse and realistic floor plan data are essential for the development of useful computer-aided methods in architectural design. Today's large-scale floor plan datasets predominantly feature simple floor plan layouts, typically representing single-apartment dwellings only. To compensate for the mismatch between current datasets and the real world, we develop Modified Swiss Dwellings (MSD), the first large-scale floor plan dataset that contains a significant share of multi-apartment dwelling layouts. MSD features over 5.3K floor plans of medium- to large-scale building complexes, covering over 18.9K distinct apartments. We validate that existing approaches for floor plan generation, while effective in simpler scenarios, cannot yet seamlessly address the challenges posed by MSD. Our benchmark calls for new research in floor plan machine understanding. Code and data are openly available.
Submitted 24 July, 2024; v1 submitted 14 July, 2024;
originally announced July 2024.
-
On the Robustness of Graph Reduction Against GNN Backdoor
Authors:
Yuxuan Zhu,
Michael Mandulak,
Kerui Wu,
George Slota,
Yuseok Jeon,
Ka-Ho Chow,
Lei Yu
Abstract:
Graph Neural Networks (GNNs) are gaining popularity across various domains due to their effectiveness in learning graph-structured data. Nevertheless, they have been shown to be susceptible to backdoor poisoning attacks, which pose serious threats to real-world applications. Meanwhile, graph reduction techniques, including coarsening and sparsification, which have long been employed to improve the scalability of large graph computational tasks, have recently emerged as effective methods for accelerating GNN training on large-scale graphs. However, the current development and deployment of graph reduction techniques for large graphs overlook the potential risks of data poisoning attacks against GNNs. It is not yet clear how graph reduction interacts with existing backdoor attacks. This paper conducts a thorough examination of the robustness of graph reduction methods in scalable GNN training in the presence of state-of-the-art backdoor attacks. We performed a comprehensive robustness analysis across six coarsening methods and six sparsification methods for graph reduction, under three GNN backdoor attacks against three GNN architectures. Our findings indicate that the effectiveness of graph reduction methods in mitigating attack success rates varies significantly, with some methods even exacerbating the attacks. Through detailed analyses of triggers and poisoned nodes, we interpret our findings and enhance our understanding of how graph reduction influences robustness against backdoor attacks. These results highlight the critical need for incorporating robustness considerations in graph reduction for GNN training, ensuring that enhancements in computational efficiency do not compromise the security of GNN systems.
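To make "sparsification" concrete, the sketch below shows the simplest possible graph sparsifier: uniform random edge dropping. It is an illustrative baseline only, not one of the six sparsification methods evaluated in the paper, and the edge-list representation and `seed` parameter are assumptions.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def random_edge_sparsify(edges: List[Edge], keep_prob: float, seed: int = 0) -> List[Edge]:
    """Keep each undirected edge independently with probability keep_prob.

    Node features and the node set are untouched; only the edge list shrinks,
    which reduces the cost of message passing during GNN training.
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the reduced graph is reproducible
    return [edge for edge in edges if rng.random() < keep_prob]


# Usage: sparsify a complete graph on 6 nodes (15 edges).
edges = [(i, j) for i in range(6) for j in range(i + 1, 6)]
reduced = random_edge_sparsify(edges, keep_prob=0.5, seed=3)
```

Whether dropping edges like this weakens a backdoor trigger or leaves it intact depends on which edges the trigger relies on, which is the kind of interaction the paper examines empirically across its reduction methods.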
Submitted 8 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Attention-aware Post-training Quantization without Backpropagation
Authors:
Junhan Kim,
Ho-young Kim,
Eulrang Cho,
Chungman Lee,
Joonyoung Kim,
Yongkweon Jeon
Abstract:
Quantization is a promising solution for deploying large language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, whether for post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated via recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited because they do not consider inter-layer dependencies. In this paper, we thus propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The fundamental concept is the development of attention-aware Hessian matrices, which facilitate the consideration of inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly at low bit-widths.
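Backpropagation-free PTQ methods typically minimize a layer-wise reconstruction error whose Hessian (up to a constant factor) is H = X Xᵀ, built from the layer's calibration inputs X. The sketch below checks numerically that ‖(W − Ŵ)X‖²_F = tr(ΔW H ΔWᵀ). It uses plain round-to-nearest (RTN) as a stand-in quantizer and a single-layer H; the paper's attention-aware Hessian, which additionally captures dependencies across layers inside the attention module, is not reproduced here.

```python
import numpy as np


def rtn_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-row round-to-nearest quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)          # avoid division by zero for all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def layer_error(w: np.ndarray, w_hat: np.ndarray, x: np.ndarray) -> float:
    """Layer-wise reconstruction error ||W X - W_hat X||_F^2, computed directly."""
    return float(np.sum(((w - w_hat) @ x) ** 2))


def layer_error_hessian(w: np.ndarray, w_hat: np.ndarray, x: np.ndarray) -> float:
    """The same quantity via the proxy Hessian H = X X^T: tr(dW H dW^T)."""
    dw = w - w_hat
    h = x @ x.T
    return float(np.trace(dw @ h @ dw.T))


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))      # (out_features, in_features)
x = rng.standard_normal((8, 16))     # (in_features, calibration tokens)
w_hat = rtn_quantize(w, bits=3)
```

The Hessian form is what lets such methods evaluate and correct quantization error from a precomputed H without any gradient computation; enlarging what H accounts for is the lever the abstract's attention-aware variant pulls.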
Submitted 19 June, 2024;
originally announced June 2024.
-
Highest Fusion Performance without Harmful Edge Energy Bursts in Tokamak
Authors:
SangKyeun Kim,
Ricardo Shousha,
SeongMoo Yang,
Qiming Hu,
SangHee Hahn,
Azarakhsh Jalalvand,
Jong-Kyu Park,
Nikolas Christopher Logan,
Andrew Oakleigh Nelson,
Yong-Su Na,
Raffi Nazikian,
Robert Wilcox,
Rongjie Hong,
Terry Rhodes,
Carlos Paz-Soldan,
YoungMu Jeon,
MinWoo Kim,
WongHa Ko,
JongHa Lee,
Alexander Battey,
Alessandro Bortolon,
Joseph Snipes,
Egemen Kolemen
Abstract:
The path to tokamak fusion and ITER requires maintaining high-performance plasma to produce sufficient fusion power. This effort is hindered by transient energy bursts arising from instabilities at the boundary of high-confinement plasmas. The application of 3D magnetic perturbations is the method adopted for ITER, and possibly for future fusion power plants, to suppress this instability and avoid energy bursts that damage the device. Unfortunately, the conventional use of the 3D field in tokamaks typically leads to degraded fusion performance and an increased risk of other plasma instabilities, two severe issues for reactor implementation. In this work, we present an innovative 3D field optimization that exploits machine learning, real-time adaptability, and multi-device capabilities to overcome these limitations. This integrated scheme is successfully deployed on the DIII-D and KSTAR tokamaks, consistently achieving reactor-relevant core confinement and the highest fusion performance without triggering damaging instabilities or bursts, while demonstrating ITER-relevant automated 3D optimization for the first time. This is enabled both by advances in the physics understanding of self-organized transport in the plasma edge and by advances in machine-learning technology, which is used to optimize the 3D field spectrum for automated management of a volatile and complex system. These findings establish real-time adaptive 3D field optimization as a crucial tool for ITER and future reactors to maximize fusion performance while simultaneously minimizing damage to machine components.
Submitted 8 May, 2024;
originally announced May 2024.
-
Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets
Authors:
Dai Quoc Tran,
Armstrong Aboah,
Yuntae Jeon,
Maged Shoman,
Minsoo Park,
Seunghee Park
Abstract:
This study addresses the evolving challenges in urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle density. Traditional monitoring methods, which rely on static cameras with narrow fields of view, are ineffective in dynamic urban environments, necessitating the installation of multiple cameras, which raises costs. Recently introduced fisheye lenses provide wide, omnidirectional coverage in a single frame, making them a transformative solution. However, they introduce issues such as distorted views and blurriness, which hinder accurate object detection on these images. Motivated by these challenges, this study proposes a novel approach that combines a transformer-based image enhancement framework with an ensemble learning technique to improve traffic monitoring accuracy, contributing to the future of intelligent traffic management systems. Our proposed methodological framework won 5th place in the 2024 AI City Challenge, Track 4, with an F1 score of 0.5965 on experimental validation data. The experimental results demonstrate the effectiveness, efficiency, and robustness of the proposed system. Our code is publicly available at https://github.com/daitranskku/AIC2024-TRACK4-TEAM15.
Submitted 15 April, 2024;
originally announced April 2024.
-
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation
Authors:
Yejin Jeon,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Contemporary neural speech synthesis models have demonstrated remarkable proficiency in synthetic speech generation, attaining a level of quality comparable to that of human-produced speech. Nevertheless, these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, Tacotron and FastSpeech variants show substantial pausing errors when applied to the Korean language, which affects speech perception and naturalness. To address these issues, we propose a novel framework that incorporates comprehensive modeling of both the syntactic and acoustic cues associated with pausing patterns. Remarkably, our framework can consistently generate natural speech even for considerably longer and more intricate out-of-domain (OOD) sentences, despite being trained on short audio clips. Architectural design choices are validated through comparisons with baseline models and ablation studies using subjective and objective metrics, confirming model performance.
Submitted 3 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Teaching AI the Anatomy Behind the Scan: Addressing Anatomical Flaws in Medical Image Segmentation with Learnable Prior
Authors:
Young Seok Jeon,
Hongfei Yang,
Huazhu Fu,
Mengling Feng
Abstract:
Imposing key anatomical features, such as the number of organs, their shapes, and their relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening the effective receptive field (ERF) with data-intensive modules, or introducing anatomical constraints that scale poorly to multi-organ segmentation. We introduce a novel architecture called the Anatomy-Informed Cascaded Segmentation Network (AIC-Net). AIC-Net incorporates a learnable input termed "Anatomical Prior", which can be adapted to patient-specific anatomy using a differentiable spatial deformation. The deformed prior then guides decoder layers towards more anatomy-informed predictions. We repeat this process at a local patch level to enhance the representation of intricate objects, resulting in a cascaded network structure. AIC-Net is a general method that makes any existing segmentation model more anatomy-aware. We have validated the performance of AIC-Net, with various backbones, on two multi-organ segmentation tasks: abdominal organs and vertebrae. For each task, our benchmarks demonstrate improved Dice score and Hausdorff distance.
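The two reported metrics can be computed per organ label as sketched below. This is a minimal version under stated assumptions: label maps are integer arrays, distances are in voxel units with no physical spacing, and the Hausdorff distance uses all foreground voxels rather than only boundary voxels. The abstract does not say which variant (e.g., a 95th-percentile Hausdorff distance) the benchmarks use.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for one organ label."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # assumption: label absent from both maps counts as a perfect match
    return float(2.0 * np.logical_and(p, t).sum() / denom)


def hausdorff_distance(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Symmetric Hausdorff distance (voxel units) between the label's voxel sets."""
    a = np.argwhere(pred == label).astype(float)
    b = np.argwhere(target == label).astype(float)
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


pred = np.array([[1, 1, 0], [0, 2, 2]])
target = np.array([[1, 0, 0], [0, 2, 2]])
```

Dice rewards volumetric overlap, while the Hausdorff distance penalizes the single worst outlier; an anatomically implausible stray fragment can barely move Dice yet sharply increase the Hausdorff distance, which is why the two are reported together.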
Submitted 26 August, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Vector Quantization for Deep-Learning-Based CSI Feedback in Massive MIMO Systems
Authors:
Junyong Shin,
Yujin Kang,
Yo-Seb Jeon
Abstract:
This paper presents a finite-rate deep-learning (DL)-based channel state information (CSI) feedback method for massive multiple-input multiple-output (MIMO) systems. The presented method provides a finite-bit representation of the latent vector based on a vector-quantized variational autoencoder (VQ-VAE) framework while reducing its computational complexity based on shape-gain vector quantization. In this method, the magnitude of the latent vector is quantized using a non-uniform scalar codebook with a proper transformation function, while the direction of the latent vector is quantized using a trainable Grassmannian codebook. A multi-rate codebook design strategy is also developed by introducing a codeword selection rule for a nested codebook along with the design of a loss function. Simulation results demonstrate that the proposed method reduces the computational complexity associated with VQ-VAE while improving CSI reconstruction performance under a given feedback overhead.
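Shape-gain vector quantization splits a latent vector into its magnitude (gain) and unit direction (shape) and quantizes them separately. The sketch below illustrates that split under several assumptions: the log function is a hypothetical choice of transformation that makes uniform levels non-uniform in the gain domain, a random unit-norm codebook stands in for the paper's trained Grassmannian codebook, and the shape is selected by maximum inner product.

```python
import numpy as np


def shape_gain_quantize(z: np.ndarray, gain_levels: np.ndarray,
                        shape_codebook: np.ndarray) -> tuple:
    """Quantize latent vector z as a (gain index, shape index) pair.

    gain_levels: 1-D array of scalar reconstruction levels in the log-gain domain.
    shape_codebook: (N, d) array of unit-norm codewords.
    """
    gain = np.linalg.norm(z)
    log_gain = np.log(gain + 1e-12)                    # assumed transformation function
    g_idx = int(np.argmin(np.abs(gain_levels - log_gain)))
    shape = z / (gain + 1e-12)
    s_idx = int(np.argmax(shape_codebook @ shape))     # closest unit codeword = largest inner product
    return g_idx, s_idx


def shape_gain_dequantize(g_idx: int, s_idx: int, gain_levels: np.ndarray,
                          shape_codebook: np.ndarray) -> np.ndarray:
    """Reconstruct z_hat = gain * shape from the two indices."""
    return np.exp(gain_levels[g_idx]) * shape_codebook[s_idx]


rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
levels = np.log(np.array([0.5, 1.0, 2.0, 3.0, 4.0]))
z = 3.0 * codebook[5]
g_idx, s_idx = shape_gain_quantize(z, levels, codebook)
z_hat = shape_gain_dequantize(g_idx, s_idx, levels, codebook)
```

With L gain levels and N shape codewords, the feedback costs log2(L) + log2(N) bits, but the encoder searches only L + N candidates instead of the L × N entries of an equivalent joint codebook. This is the kind of complexity saving the abstract attributes to the shape-gain structure.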
Submitted 12 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication
Authors:
Yongjeong Oh,
Jaehong Jo,
Byonghyo Shim,
Yo-Seb Jeon
Abstract:
In this paper, we present a novel approach for joint activity detection (AD), channel estimation (CE), and data detection (DD) in uplink grant-free non-orthogonal multiple access (NOMA) systems. Our approach employs an iterative and parallel interference removal strategy inspired by parallel interference cancellation (PIC), enhanced with deep learning to jointly tackle the AD, CE, and DD problems. Based on this approach, we develop three PIC frameworks, each designed for either the coherent or the non-coherent scheme. The first framework performs joint AD and CE using received pilot signals in the coherent scheme. Building upon this framework, the second framework utilizes both the received pilot and data signals for CE, further enhancing the performance of AD, CE, and DD in the coherent scheme. The third framework accommodates the non-coherent scheme involving a small number of data bits, and simultaneously performs AD and DD. Through joint loss functions and interference cancellation modules, our approach supports end-to-end training, contributing to enhanced AD, CE, and DD performance for both coherent and non-coherent schemes. Simulation results demonstrate the superiority of our approach over traditional techniques, exhibiting enhanced AD, CE, and DD performance while maintaining lower computational complexity.
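The classic PIC idea that the abstract builds on can be sketched as follows: every user's symbol is re-estimated in parallel after subtracting the other users' contributions as estimated in the previous iteration. This is conventional hard-decision PIC, not the paper's deep-learning-assisted frameworks. It assumes real-valued BPSK symbols, a perfectly known channel matrix, and an already known set of active users, so neither activity detection nor channel estimation is shown.

```python
import numpy as np


def pic_detect(y: np.ndarray, H: np.ndarray, num_iters: int = 5) -> np.ndarray:
    """Hard-decision parallel interference cancellation for real BPSK symbols.

    y: (M,) received signal; H: (M, K) known channel matrix, one column per active user.
    Returns the (K,) vector of detected symbols in {-1, +1}.
    """
    col_energy = np.sum(H ** 2, axis=0)           # ||h_k||^2 for each user
    x_hat = np.sign(H.T @ y / col_energy)         # stage 0: matched-filter decisions
    x_hat[x_hat == 0] = 1.0
    for _ in range(num_iters):
        residual = y - H @ x_hat                  # subtract all estimated contributions
        # Adding x_hat back restores each user's own term, so user k is decided from
        # y minus the estimated interference of all *other* users, for all k at once.
        x_new = np.sign(H.T @ residual / col_energy + x_hat)
        x_new[x_new == 0] = 1.0
        if np.array_equal(x_new, x_hat):
            break                                 # decisions have converged
        x_hat = x_new
    return x_hat


rng = np.random.default_rng(1)
H = rng.standard_normal((128, 4))
x = np.array([1.0, -1.0, 1.0, -1.0])
y = H @ x
```

The vectorized update relies on the identity h_kᵀ(y − Σ_{j≠k} h_j x̂_j) = h_kᵀ(y − Hx̂) + ‖h_k‖² x̂_k, which avoids an explicit loop over users while keeping the updates parallel rather than successive.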
Submitted 11 March, 2024;
originally announced March 2024.
-
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Authors:
Yejin Jeon,
Gary Geunbae Lee
Abstract:
This paper explores the task of language-agnostic speaker replication, a novel endeavor that seeks to replicate a speaker's voice irrespective of the language they are speaking. Towards this end, we introduce a multi-level attention aggregation approach that systematically probes and amplifies various speaker-specific attributes in a hierarchical manner. Through rigorous evaluations across a wide range of scenarios, including seen and unseen speakers conversing in seen and unseen languages, we establish that our proposed model achieves substantial speaker similarity and generalizes to out-of-domain (OOD) cases.
Submitted 3 April, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
HearHere: Mitigating Echo Chambers in News Consumption through an AI-based Web System
Authors:
Youngseung Jeon,
Jaehoon Kim,
Sohyun Park,
Yunyong Ko,
Seongeun Ryu,
Sang-Wook Kim,
Kyungsik Han
Abstract:
Considerable efforts are currently underway to mitigate the negative impacts of echo chambers, such as increased susceptibility to fake news and resistance towards accepting scientific evidence. Prior research has presented the development of computer systems that support the consumption of news information from diverse political perspectives to mitigate the echo chamber effect. However, existing studies still lack the ability to effectively support the key processes of news information consumption and quantitatively identify a political stance towards the information. In this paper, we present HearHere, an AI-based web system designed to help users accommodate information and opinions from diverse perspectives. HearHere facilitates the key processes of news information consumption through two visualizations. Visualization 1 provides political news with quantitative political stance information, derived from our graph-based political classification model, and users can experience diverse perspectives (Hear). Visualization 2 allows users to express their opinions on specific political issues in a comment form and observe the position of their own opinions relative to pro-liberal and pro-conservative comments presented on a map interface (Here). Through a user study with 94 participants, we demonstrate the feasibility of HearHere in supporting the consumption of information from various perspectives. Our findings highlight the importance of providing political stance information and quantifying users' political status as a means to mitigate political polarization. In addition, we propose design implications for system development, including the consideration of demographics such as political interest and providing users with initiatives.
Submitted 29 February, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Follow the Footprints: Self-supervised Traversability Estimation for Off-road Vehicle Navigation based on Geometric and Visual Cues
Authors:
Yurim Jeon,
E In Son,
Seung-Woo Seo
Abstract:
In this study, we address the off-road traversability estimation problem, which involves predicting the areas where a robot can navigate in off-road environments. An off-road environment is an unstructured environment comprising a combination of traversable and non-traversable spaces, which makes estimating traversability challenging. This study highlights three primary factors that affect a robot's traversability in an off-road environment: surface slope, semantic information, and robot platform. We present two strategies for estimating traversability, using a guide filter network (GFN) and a footprint supervision module (FSM). The first strategy involves building a novel GFN using a newly designed guide filter layer. The GFN interprets the surface and semantic information from the input data and integrates them to extract features optimized for traversability estimation. The second strategy involves developing an FSM, a self-supervision module that utilizes the path traversed by the robot in pre-driving, also known as a footprint. This enables traversability predictions that reflect the characteristics of the robot platform. Based on these two strategies, the proposed method overcomes the limitations of existing methods, which require laborious human supervision and lack scalability. Extensive experiments in diverse conditions, including automobiles and unmanned ground vehicles, herbfields, woodlands, and farmlands, demonstrate that the proposed method is compatible with various robot platforms and adaptable to a range of terrains. Code is available at https://github.com/yurimjeon1892/FtFoot.
Submitted 23 February, 2024;
originally announced February 2024.
-
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Authors:
Junhan Kim,
Kyungphil Park,
Chungman Lee,
Ho-young Kim,
Joonyoung Kim,
Yongkweon Jeon
Abstract:
With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile devices and TVs. Existing PTQ schemes, however, consume considerable time and resources, which could be a bottleneck in real situations where frequent model updates and multiple hyper-parameter tunings are required. As a cost-effective alternative, one-shot PTQ schemes have been proposed. Still, the performance is somewhat limited because they cannot consider the inter-layer dependency within the attention module, which is a very important feature of Transformers. In this paper, we thus propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm called aespa is to perform quantization layer-wise for efficiency while considering cross-layer dependency to preserve the attention score. Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
Submitted 14 February, 2024;
originally announced February 2024.
-
A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective
Authors:
Lei Yu,
Meng Han,
Yiming Li,
Changting Lin,
Yao Zhang,
Mingyang Zhang,
Yan Liu,
Haiqin Weng,
Yuseok Jeon,
Ka-Ho Chow,
Stacy Patterson
Abstract:
Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of-the-art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, our discussion is structured around the model's life cycle, by delving into the privacy threats encountered during different stages of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.
Submitted 5 February, 2024;
originally announced February 2024.
-
Network-based Topic Structure Visualization
Authors:
Yeseul Jeon,
Jina Park,
Ick Hoon Jin,
Dongjun Chung
Abstract:
In the real world, many topics are inter-correlated, making it challenging to investigate their structure and relationships. Understanding the interplay between topics and their relevance can provide valuable insights for researchers, guiding their studies and informing the direction of research. In this paper, we utilize the topic-words distribution, obtained from topic models, as item-response data to model the structure of topics using a latent space item response model. By estimating the latent positions of topics based on their distances toward words, we can capture the underlying topic structure and reveal their relationships. Visualizing the latent positions of topics in Euclidean space allows for an intuitive understanding of their proximity and associations. We interpret relationships among topics by characterizing each topic based on representative words selected using a newly proposed scoring scheme. Additionally, we assess the maturity of topics by tracking their latent positions using different word sets, providing insights into the robustness of topics. To demonstrate the effectiveness of our approach, we analyze the topic composition of COVID-19 studies during the early stage of its emergence using biomedical literature in the PubMed database. The software and data used in this paper are publicly available at https://github.com/jeon9677/gViz .
Submitted 31 January, 2024;
originally announced January 2024.
-
A Novel Interpretable Fusion Analytic Framework for Investigating Functional Brain Connectivity Differences in Cognitive Impairments
Authors:
Yeseul Jeon,
Jeong-Jae Kim,
SuMin Yu,
Junggu Choi,
Sanghoon Han
Abstract:
Functional magnetic resonance imaging (fMRI) data is characterized by its complexity and high dimensionality, encompassing signals from various regions of interest (ROIs) that exhibit intricate correlations. Analyzing fMRI data directly is challenging because of this intricate structure. Nevertheless, ROIs convey crucial information about brain activities through their connections, offering insights into distinctive brain activity characteristics between different groups. To address this, we propose a cutting-edge interpretable fusion analytic framework that facilitates the identification and understanding of ROI connectivity disparities between two groups, thereby revealing their unique features. Our novel approach encompasses three key steps. Firstly, we construct ROI functional connectivity networks (FCNs) to effectively manage fMRI data. Secondly, employing the FCNs, we utilize a self-attention deep learning model for binary classification, generating an attention distribution that encodes group differences. Lastly, we employ a latent space item-response model to extract group representative ROI features, visualizing these features on the group summary FCNs. We validate the effectiveness of our framework by analyzing four types of cognitive impairments, showcasing its capability to identify significant ROIs contributing to the differences between the two disease groups. This novel interpretable fusion analytic framework holds immense potential for advancing our understanding of cognitive impairments and could pave the way for more targeted therapeutic interventions.
Submitted 17 January, 2024;
originally announced January 2024.
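The first step of the framework above builds ROI functional connectivity networks. A common way to do this, assumed here rather than taken from the paper, is to correlate every pair of ROI time series with the Pearson coefficient and keep strong edges. The threshold of 0.5 and the toy signals are hypothetical.

```python
import math
from typing import List, Tuple

Series = List[float]
Matrix = List[List[float]]


def pearson(x: Series, y: Series) -> float:
    """Pearson correlation between two equal-length ROI time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("constant time series has undefined correlation")
    return cov / (sx * sy)


def functional_connectivity(rois: List[Series], threshold: float = 0.5) -> Tuple[Matrix, Matrix]:
    """Correlation matrix plus a binary adjacency keeping edges with |r| >= threshold.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    k = len(rois)
    corr = [[1.0 if i == j else pearson(rois[i], rois[j]) for j in range(k)] for i in range(k)]
    adj = [[1.0 if i != j and abs(corr[i][j]) >= threshold else 0.0 for j in range(k)]
           for i in range(k)]
    return corr, adj


rois = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # ROI A
    [2.0, 4.0, 6.0, 8.0, 10.0],   # ROI B: perfectly correlated with A
    [1.0, -1.0, 1.0, -1.0, 1.0],  # ROI C: uncorrelated with A and B
]
corr, adj = functional_connectivity(rois)
```

Only the A-B edge survives here; in practice the resulting correlation matrix (or its thresholded graph) is what a downstream classifier would consume.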
-
FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection
Authors:
Chanho Lee,
Jinsu Son,
Hyounguk Shon,
Yunho Jeon,
Junmo Kim
Abstract:
Rotation-equivariance is an essential yet challenging property in oriented object detection. While general object detectors naturally leverage robustness to spatial shifts due to the translation-equivariance of conventional CNNs, achieving rotation-equivariance remains an elusive goal. Current detectors deploy various alignment techniques to derive rotation-invariant features, but still rely on high-capacity models and heavy data augmentation with all possible rotations. In this paper, we introduce a Fully Rotation-Equivariant Oriented Object Detector (FRED), whose entire process from the image to the bounding box prediction is strictly equivariant. Specifically, we decouple the invariant task (object classification) and the equivariant task (object localization) to achieve end-to-end equivariance. We represent the bounding box as a set of rotation-equivariant vectors to implement rotation-equivariant localization. Moreover, we utilize these rotation-equivariant vectors as offsets in the deformable convolution, thereby enhancing the existing advantages of spatial adaptation. Leveraging full rotation-equivariance, FRED demonstrates higher robustness to image-level rotation than existing methods. Furthermore, our experiments show that FRED is one step closer to non-axis-aligned learning. Compared to state-of-the-art methods, our proposed method delivers comparable performance on DOTA-v1.0 and outperforms them by 1.5 mAP on DOTA-v1.5, while reducing the number of model parameters to 16%.
Submitted 22 December, 2023;
originally announced January 2024.
-
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
Authors:
Yejin Jeon,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Zero-shot multi-speaker TTS aims to synthesize speech with the voice of a chosen target speaker without any fine-tuning. Prevailing methods, however, have difficulty adapting to new speakers in out-of-domain settings, primarily due to inadequate speaker disentanglement and content leakage. To overcome these constraints, we propose an innovative negation feature learning paradigm that models decoupled speaker attributes as deviations from the complete audio representation by utilizing the subtraction operation. By eliminating superfluous content information from the speaker representation, our negation scheme not only mitigates content leakage, thereby enhancing synthesis robustness, but also improves speaker fidelity. In addition, to facilitate the learning of diverse speaker attributes, we leverage multi-stream Transformers, which retain multiple hypotheses and instigate a training paradigm akin to ensemble learning. To unify these hypotheses and realize the final speaker representation, we employ attention pooling. Finally, in light of the imperative to generate target text utterances in the desired voice, we adopt adaptive layer normalizations to effectively fuse the previously generated speaker representation with the target text representations, as opposed to mere concatenation of the text and audio modalities. Extensive experiments and validations substantiate the efficacy of our proposed approach in preserving and harnessing speaker-specific attributes vis-à-vis alternative baseline models.
Submitted 5 March, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
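Two operations in the abstract above are concrete enough to sketch: a speaker representation obtained by subtracting a content representation from the full audio representation, and adaptive layer normalization that conditions text features on that speaker vector. The sketch below is a minimal plain-Python illustration under stated assumptions: the `linear` projection stands in for learned weights (hypothetical), the scale/shift form `scale * norm(h) + shift` is one common AdaLN variant, and the multi-stream Transformers and attention pooling are omitted.

```python
import math
from typing import List

Vector = List[float]


def negated_speaker(full_audio: Vector, content: Vector) -> Vector:
    """Speaker attributes as the deviation of the full audio representation from its content part."""
    return [a - c for a, c in zip(full_audio, content)]


def linear(weights: List[Vector], v: Vector) -> Vector:
    """Plain matrix-vector product; a stand-in for a learned projection (hypothetical)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def adaptive_layer_norm(h: Vector, speaker: Vector, w_scale: List[Vector],
                        w_shift: List[Vector], eps: float = 1e-5) -> Vector:
    """Layer-normalize a text hidden state, then apply a speaker-conditioned scale and shift."""
    n = len(h)
    mean = sum(h) / n
    var = sum((x - mean) ** 2 for x in h) / n
    normed = [(x - mean) / math.sqrt(var + eps) for x in h]
    scale = linear(w_scale, speaker)  # per-feature gain predicted from the speaker vector
    shift = linear(w_shift, speaker)  # per-feature bias predicted from the speaker vector
    return [g * x + b for g, x, b in zip(scale, normed, shift)]


speaker = negated_speaker([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [[0.0, 0.0, 0.0] for _ in range(3)]
fused = adaptive_layer_norm([1.0, 2.0, 3.0], speaker, identity, zeros)
```

Unlike concatenation, this fusion changes the statistics of every text feature as a function of the speaker vector, which is the contrast the abstract draws.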
-
DSBplot: Indels in DNA Double-strand Break Repair Experiments
Authors:
Tejasvi Channagiri,
Margherita Maria Ferrari,
Youngkyu Jeon,
Penghao Xu,
Francesca Storici,
Nataša Jonoska
Abstract:
Double-strand breaks (DSBs) in DNA are naturally occurring destructive events in all organisms that may lead to genome instability. Cells employ various repair mechanisms, including non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), and homology-directed recombination (HDR). These repair processes may lead to DNA sequence variations (e.g., nucleotide insertions, deletions, and substitutions) at the location of the break. Studying DNA DSB repair processes often involves using high-throughput sequencing assays together with software tools to precisely quantify the sequence variations near the break. Existing methods for assessing and visualizing these data often do not take into account the full complexity of the sequencing data, such as the frequency, type, and position of the sequence variations, in a single comprehensive representation. Here we present a method that allows visualization of the overall variation pattern as well as comparison of these patterns among experimental setups.
Submitted 11 January, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Authors:
Jihyun Lee,
Yejin Jeon,
Wonjun Lee,
Yunsu Kim,
Gary Geunbae Lee
Abstract:
Dialogue state tracking (DST) plays a crucial role in extracting information in task-oriented dialogue systems. However, previous research has been limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audio dataset, and test them on actual human speech data. To facilitate evaluation tailored to audio modalities, we introduce a novel metric, PhonemeF1, to capture pronunciation similarity. Experimental results showed that models trained solely on synthetic datasets can generalize their performance to human voice data. By eliminating the dependency on human speech data collection, these insights pave the way for significant practical advancements in audio-based DST. Data and code are available at https://github.com/JihyunLee1/E2E-DST.
Submitted 4 December, 2023;
originally announced December 2023.
-
Prior-Aware Robust Beam Alignment for Low-SNR Millimeter-Wave Communications
Authors:
Jihun Park,
Yongjeong Oh,
Jaewon Yun,
Seonjung Kim,
Yo-Seb Jeon
Abstract:
This paper presents a robust beam alignment technique for millimeter-wave communications in low signal-to-noise ratio (SNR) environments. The core strategy of our technique is to repeatedly transmit the most probable beam candidates to reduce beam misalignment probability induced by noise. Specifically, for a given beam training overhead, both the selection of candidates and the number of repetitions for each beam candidate are optimized based on channel prior information. To achieve this, a deep neural network is employed to learn the prior probability of the optimal beam at each location. The beam misalignment probability is then analyzed based on the channel prior, forming the basis for an optimization problem aimed at minimizing the analyzed beam misalignment probability. A closed-form solution is derived for a special case with two beam candidates, and an efficient algorithm is developed for general cases with multiple beam candidates. Simulation results using the DeepMIMO dataset demonstrate the superior performance of our technique in dynamic low-SNR communication environments when compared to existing beam alignment techniques.
Submitted 2 December, 2023;
originally announced December 2023.
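The beam alignment abstract above trades candidate coverage against repetitions for a fixed training overhead. The sketch below does not reproduce the paper's analysis or optimization; it is a Monte Carlo estimate of misalignment probability for a given allocation under a deliberately simple, hypothetical measurement model (unit gain for the optimal beam, zero for the others, additive Gaussian noise, repeated measurements averaged). The prior dictionary plays the role of the learned channel prior.

```python
import random
from typing import Dict


def misalignment_probability(prior: Dict[int, float], repetitions: Dict[int, int],
                             noise_std: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(selected beam != optimal beam) under a toy model.

    prior:       probability that each beam index is the optimal one.
    repetitions: trained beam candidates mapped to how many times each is transmitted.
    Toy model (hypothetical): optimal beam has unit gain, all others zero gain,
    each transmission adds Gaussian noise, and repeated measurements are averaged.
    """
    if any(r < 1 for r in repetitions.values()):
        raise ValueError("each candidate must be transmitted at least once")
    rng = random.Random(seed)
    beams = list(prior)
    weights = [prior[b] for b in beams]
    errors = 0
    for _ in range(trials):
        optimal = rng.choices(beams, weights=weights)[0]
        if optimal not in repetitions:
            errors += 1  # optimal beam never trained: guaranteed misalignment
            continue
        best_beam, best_score = None, float("-inf")
        for beam, reps in repetitions.items():
            gain = 1.0 if beam == optimal else 0.0
            score = sum(gain + rng.gauss(0.0, noise_std) for _ in range(reps)) / reps
            if score > best_score:
                best_beam, best_score = beam, score
        errors += int(best_beam != optimal)
    return errors / trials


prior = {0: 0.6, 1: 0.3, 2: 0.1}
# The same overhead of 5 transmissions, spent in two different ways.
spread = misalignment_probability(prior, {0: 2, 1: 2, 2: 1}, noise_std=1.0, trials=2000)
focused = misalignment_probability(prior, {0: 3, 1: 2}, noise_std=1.0, trials=2000)
```

Dropping a low-prior candidate costs its prior mass outright but buys more averaging for the rest; which side wins depends on the prior and the noise level, which is the trade-off the abstract optimizes.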
-
On Exact Inversion of DPM-Solvers
Authors:
Seongmin Hong,
Kyeonghyun Lee,
Suh Yoon Jeon,
Hyewon Bae,
Se Young Chun
Abstract:
Diffusion probabilistic models (DPMs) are a key component in modern generative models. DPM-solvers have significantly reduced latency and enhanced quality, but they make it challenging to find the exact inverse (i.e., the initial noise that produced a given image). Here we investigate exact inversion for DPM-solvers and propose algorithms to perform it when samples are generated by first-order as well as higher-order DPM-solvers. For each explicit denoising step in DPM-solvers, we formulate the inversion using implicit methods, such as gradient descent or the forward step method, to ensure robustness to large classifier-free guidance, unlike the prior approach based on fixed-point iteration. Experimental results demonstrated that our proposed exact inversion methods significantly reduced the error of both image and noise reconstructions, greatly enhanced the ability to distinguish invisible watermarks, and consistently prevented unintended background changes during image editing. Project page: https://smhongok.github.io/inv-dpm.html.
Submitted 30 November, 2023;
originally announced November 2023.
-
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Authors:
Yujin Jeon,
Eunsue Choi,
Youngchan Kim,
Yunseong Moon,
Khalid Omer,
Felix Heide,
Seung-Hwan Baek
Abstract:
Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Most existing image datasets focus on trichromatic intensity images to mimic human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although spectro-polarimetric datasets exist, these datasets have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image count. Here, we introduce two spectro-polarimetric datasets: trichromatic Stokes images and hyperspectral Stokes images. These novel datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our dataset in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate spectral dependency of shape-from-polarization methods. As such, the proposed dataset promises a foundation for data-driven spectro-polarimetric imaging and vision research. Dataset and code will be publicly available.
Submitted 30 November, 2023; v1 submitted 29 November, 2023;
originally announced November 2023.
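The datasets above store Stokes images, which cover both linear and circular polarization. As a reminder of what a Stokes vector (S0, S1, S2, S3) encodes, the standard per-pixel descriptors are computed below; the trichromatic pixel values are made up for illustration and do not come from the dataset.

```python
import math
from typing import Dict


def polarization_features(s0: float, s1: float, s2: float, s3: float) -> Dict[str, float]:
    """Standard polarization descriptors computed from one Stokes vector."""
    if s0 <= 0.0:
        raise ValueError("S0 (total intensity) must be positive")
    return {
        "dop": math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0,  # degree of polarization
        "dolp": math.hypot(s1, s2) / s0,                     # degree of linear polarization
        "docp": abs(s3) / s0,                                # degree of circular polarization
        "aolp": 0.5 * math.atan2(s2, s1),                    # angle of linear polarization (rad)
    }


# One trichromatic Stokes pixel: a Stokes vector per color channel (illustrative values).
pixel = {"R": (1.0, 0.6, 0.0, 0.8), "G": (1.0, 0.0, 1.0, 0.0), "B": (0.5, 0.0, 0.0, 0.0)}
features = {ch: polarization_features(*stokes) for ch, stokes in pixel.items()}
```

A linear-only dataset would lack S3 and therefore could not separate the R channel's circular component (DoCP 0.8) from unpolarized light.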
-
The Seoul National University AGN Monitoring Project III: H$β$ lag measurements of 32 luminous AGNs and the high-luminosity end of the size–luminosity relation
Authors:
Jong-Hak Woo,
Shu Wang,
Suvendu Rakshit,
Hojin Cho,
Donghoon Son,
Vardha N. Bennert,
Elena Gallo,
Edmund Hodges-Kluck,
Tommaso Treu,
Aaron J. Barth,
Wanjin Cho,
Adi Foord,
Jaehyuk Geum,
Hengxiao Guo,
Yashashree Jadhav,
Yiseul Jeon,
Kyle M. Kabasares,
Won-Suk Kang,
Changseok Kim,
Minjin Kim,
Tae-Woo Kim,
Huynh Anh N. Le,
Matthew A. Malkan,
Amit Kumar Mandal,
Daeseong Park
, et al. (6 additional authors not shown)
Abstract:
We present the main results from a long-term reverberation mapping campaign carried out for the Seoul National University Active Galactic Nuclei (AGN) Monitoring Project. High-quality data were obtained during 2015-2021 for 32 luminous AGNs (i.e., continuum luminosity in the range of $10^{44-46}$ erg s$^{-1}$) at a regular cadence of 20-30 days for spectroscopy and 3-5 days for photometry. We obtain time lag measurements between the variability in the H$β$ emission and the continuum for 32 AGNs; twenty-five of these have the best lag measurements according to our quality assessment, which examines the correlation strength and the posterior lag distribution. Our study significantly increases the current sample of reverberation-mapped AGNs, particularly at the moderate- to high-luminosity end. Combining our results with literature measurements, we derive an H$β$ broad line region size–luminosity relation with a shallower slope than reported in the literature. For a given luminosity, most of our measured lags are shorter than expected, implying that single-epoch black hole mass estimators based on previous calibrations could suffer from large systematic uncertainties.
Submitted 26 November, 2023;
originally announced November 2023.
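The lags in the reverberation mapping abstract above measure how long the H$β$ emission takes to echo continuum variability. Real campaigns typically use an interpolated cross-correlation on unevenly sampled light curves and report a lag distribution; the sketch below is a simplified stand-in that assumes both light curves sit on a common, evenly spaced grid and only searches non-negative lags. The light curves and the 5-day grid spacing are toy values.

```python
import math
from typing import List, Tuple


def _corr(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ccf_peak_lag(continuum: List[float], line: List[float], dt: float,
                 max_shift: int) -> Tuple[float, float]:
    """Return (lag, r_peak) for the shift k*dt maximizing corr(continuum(t), line(t + k*dt))."""
    best_lag, best_r = 0.0, -math.inf
    for k in range(max_shift + 1):
        r = _corr(continuum[: len(continuum) - k], line[k:])
        if r > best_r:
            best_lag, best_r = k * dt, r
    return best_lag, best_r


continuum = [0, 1, 0, 2, 0, 3, 1, 0, 4, 2, 1, 0, 5, 1, 2, 0]
line = [0, 0, 0] + continuum[:-3]  # toy emission-line curve echoing the continuum 3 samples later
lag, r_peak = ccf_peak_lag(continuum, line, dt=5.0, max_shift=6)
```

Here the peak falls at a 3-sample shift, i.e. a 15-day lag on the assumed grid.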
-
Joint Source-Channel Coding for Channel-Adaptive Digital Semantic Communications
Authors:
Joohyuk Park,
Yongjeong Oh,
Seonjung Kim,
Yo-Seb Jeon
Abstract:
In this paper, we propose a novel joint source-channel coding (JSCC) approach for channel-adaptive digital semantic communications. In semantic communication systems with digital modulation and demodulation, robust design of JSCC encoder and decoder becomes challenging not only due to the unpredictable dynamics of channel conditions but also due to diverse modulation orders. To address this challenge, we first develop a new demodulation method which assesses the uncertainty of the demodulation output to improve the robustness of the digital semantic communication system. We then devise a robust training strategy which enhances the robustness and flexibility of the JSCC encoder and decoder against diverse channel conditions and modulation orders. To this end, we model the relationship between the encoder's output and decoder's input using binary symmetric erasure channels and then sample the parameters of these channels from diverse distributions. We also develop a channel-adaptive modulation technique for an inference phase, in order to reduce the communication latency while maintaining task performance. In this technique, we adaptively determine modulation orders for the latent variables based on channel conditions. Using simulations, we demonstrate the superior performance of the proposed JSCC approach for image classification, reconstruction, and retrieval tasks compared to existing JSCC approaches.
Submitted 18 March, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
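The JSCC training strategy above models the link between encoder output and decoder input as binary symmetric erasure channels whose parameters are sampled from diverse distributions. A minimal sketch of such a channel follows: each bit is erased with probability `erase_prob`, flipped with probability `flip_prob`, and otherwise delivered. The uniform sampling ranges in `sample_channel` are hypothetical and not the paper's distributions.

```python
import random
from typing import List, Optional, Tuple


def bsec(bits: List[int], flip_prob: float, erase_prob: float,
         rng: random.Random) -> List[Optional[int]]:
    """Binary symmetric erasure channel; None marks an erased bit."""
    if flip_prob < 0 or erase_prob < 0 or flip_prob + erase_prob > 1:
        raise ValueError("invalid channel parameters")
    out: List[Optional[int]] = []
    for b in bits:
        u = rng.random()
        if u < erase_prob:
            out.append(None)   # erasure: the decoder receives no value
        elif u < erase_prob + flip_prob:
            out.append(1 - b)  # bit flip
        else:
            out.append(b)      # correct delivery
    return out


def sample_channel(rng: random.Random, max_flip: float = 0.1,
                   max_erase: float = 0.2) -> Tuple[float, float]:
    """Draw fresh channel parameters per training step (ranges are hypothetical)."""
    return rng.uniform(0.0, max_flip), rng.uniform(0.0, max_erase)


rng = random.Random(0)
flip, erase = sample_channel(rng)
received = bsec([1, 0, 1, 1, 0, 0, 1, 0], flip, erase, rng)
```

Resampling the parameters at every step exposes the decoder to many error and erasure rates instead of a single operating point.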
-
Uncertainty Quantification using Simulation Output: Batching as an Inferential Device
Authors:
Yongseok Jeon,
Yi Chu,
Raghu Pasupathy,
Sara Shashaani
Abstract:
We present batching as an omnibus device for uncertainty quantification using simulation output. We consider the classical context of a simulationist performing uncertainty quantification on an estimator $θ_n$ (of an unknown fixed quantity $θ$) using only the output data $(Y_1,Y_2,\ldots,Y_n)$ gathered from a simulation. By uncertainty quantification, we mean approximating the sampling distribution of the error $θ_n-θ$ toward: (A) estimating an assessment functional $ψ$, e.g., bias, variance, or quantile; or (B) constructing a $(1-α)$-confidence region on $θ$. We argue that batching is a remarkably simple and effective device for this purpose, and is especially suited for handling dependent output data such as what one frequently encounters in simulation contexts. We demonstrate that if the number of batches and the extent of their overlap are chosen appropriately, batching retains bootstrap's attractive theoretical properties of strong consistency and higher-order accuracy. For constructing confidence regions, we characterize two limiting distributions associated with a Studentized statistic. Our extensive numerical experience confirms theoretical insight, especially about the effects of batch size and batch overlap.
Submitted 26 August, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
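The batching abstract above covers general assessment functionals, overlapping batches, and bootstrap-like guarantees. Its simplest classical special case is the non-overlapping batch-means confidence interval for a mean computed from dependent simulation output, sketched below. The Student-t quantile is passed in by the caller because the standard library has no t-distribution; the AR(1) usage data is synthetic.

```python
import math
import random
import statistics
from typing import List, Tuple


def batch_means_ci(data: List[float], num_batches: int, t_quantile: float) -> Tuple[float, float]:
    """Non-overlapping batch-means confidence interval for the mean.

    t_quantile should be the Student-t quantile with num_batches - 1 degrees of
    freedom, e.g. about 2.262 for a 95% interval with 10 batches.
    """
    if num_batches < 2:
        raise ValueError("need at least two batches")
    m = len(data) // num_batches  # batch size; any trailing remainder is dropped
    if m < 1:
        raise ValueError("not enough data for the requested number of batches")
    means = [statistics.fmean(data[i * m:(i + 1) * m]) for i in range(num_batches)]
    center = statistics.fmean(means)
    half_width = t_quantile * statistics.stdev(means) / math.sqrt(num_batches)
    return center - half_width, center + half_width


# Autocorrelated toy output: an AR(1) process with mean 3.
rng, x, output = random.Random(42), 0.0, []
for _ in range(10_000):
    x = 0.8 * x + rng.gauss(0.0, 1.0)
    output.append(3.0 + x)
low, high = batch_means_ci(output, num_batches=10, t_quantile=2.262)
```

Batching works here because, with large enough batches, the batch means are approximately independent even though the individual observations are not; the number of batches controls the trade-off the abstract describes.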
-
SplitMAC: Wireless Split Learning over Multiple Access Channels
Authors:
Seonjung Kim,
Yongjeong Oh,
Yo-Seb Jeon
Abstract:
This paper presents a novel split learning (SL) framework, referred to as SplitMAC, which reduces the latency of SL by leveraging simultaneous uplink transmission over multiple access channels. The key strategy is to divide devices into multiple groups and allow the devices within the same group to simultaneously transmit their smashed data and device-side models over the multiple access channels. The optimization problem of device grouping to minimize SL latency is formulated, and the benefit of device grouping in reducing the uplink latency of SL is theoretically derived. By examining a two-device grouping case, two asymptotically-optimal algorithms are devised for device grouping in low and high signal-to-noise ratio (SNR) scenarios, respectively, while providing proofs of their optimality. By merging these algorithms, a near-optimal device grouping algorithm is proposed to cover a wide range of SNR. Our SL framework is also extended to consider practical fading channels and to support a general group size. Simulation results demonstrate that our SL framework with the proposed device grouping algorithm is superior to existing SL frameworks in reducing SL latency.
Submitted 19 March, 2024; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Artemis: HE-Aware Training for Efficient Privacy-Preserving Machine Learning
Authors:
Yeonsoo Jeon,
Mattan Erez,
Michael Orshansky
Abstract:
Privacy-Preserving ML (PPML) based on Homomorphic Encryption (HE) is a promising foundational privacy technology. Making it more practical requires lowering its computational cost, especially, in handling modern large deep neural networks. Model compression via pruning is highly effective in conventional plaintext ML but cannot be effectively applied to HE-PPML as is.
We propose Artemis, a highly effective DNN pruning technique for HE-based inference. We judiciously investigate two HE-aware pruning strategies (positional and diagonal) to reduce the number of Rotation operations, which dominate compute time in HE convolution. We find that Pareto-optimal solutions are based fully on diagonal pruning. Artemis' benefits come from coupling DNN training, driven by a novel group Lasso regularization objective, with pruning to maximize HE-specific cost reduction (dominated by the Rotation operations). We show that Artemis improves on prior HE-oriented pruning and can achieve a 1.2-6x improvement when targeting modern convolutional models (ResNet18 and ResNet18) across three datasets.
Submitted 2 October, 2023;
originally announced October 2023.
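Why diagonal pruning reduces Rotation operations in the Artemis abstract above can be illustrated with the well-known diagonal (Halevi-Shoup style) matrix-vector product used in HE: W v equals the sum over d of diag_d(W) multiplied element-wise by v rotated by d, so every all-zero diagonal saves one rotation. A group-lasso penalty with one group per diagonal pushes whole diagonals to zero. This is a plaintext sketch under that assumption; it is not the paper's exact objective or packing scheme.

```python
import math
from typing import List

Matrix = List[List[float]]


def wrapped_diagonals(w: Matrix) -> Matrix:
    """Generalized diagonals of a square matrix: diag_d[i] = w[i][(i + d) % n]."""
    n = len(w)
    return [[w[i][(i + d) % n] for i in range(n)] for d in range(n)]


def rotate(v: List[float], d: int) -> List[float]:
    """Cyclic left rotation by d, the plaintext analogue of an HE slot rotation."""
    return v[d:] + v[:d]


def diagonal_matvec(w: Matrix, v: List[float]) -> List[float]:
    """Compute W v diagonal by diagonal, skipping all-zero diagonals (one rotation saved each)."""
    out = [0.0] * len(v)
    for d, diag in enumerate(wrapped_diagonals(w)):
        if all(x == 0.0 for x in diag):
            continue
        out = [o + a * b for o, a, b in zip(out, diag, rotate(v, d))]
    return out


def diagonal_group_lasso(w: Matrix, lam: float) -> float:
    """Group-lasso penalty with one group per diagonal: lam * sum_d ||diag_d||_2."""
    return lam * sum(math.sqrt(sum(x * x for x in diag)) for diag in wrapped_diagonals(w))


w = [[1.0, 2.0], [3.0, 4.0]]
print(diagonal_matvec(w, [1.0, 2.0]))  # same as the ordinary product: [5.0, 11.0]
```

Adding `diagonal_group_lasso` to the training loss is what would make entire diagonals, rather than scattered weights, go to zero.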
-
Hierarchical Network Data Analytics Framework for B5G Network Automation: Design and Implementation
Authors:
Youbin Jeon,
Sangheon Pack
Abstract:
5G introduced modularized network functions (NFs) to support emerging services in a more flexible and elastic manner. To mitigate the complexity in such modularized NF management, automated network operation and management are indispensable, and thus the 3rd generation partnership project (3GPP) has introduced a network data analytics function (NWDAF). However, a conventional NWDAF needs to conduct both inference and training tasks, and thus it is difficult to provide the analytics results to NFs in a timely manner for an increased number of analytics requests. In this article, we propose a hierarchical network data analytics framework (H-NDAF) where inference tasks are distributed to multiple leaf NWDAFs and training tasks are conducted at the root NWDAF. Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF can provide sufficiently accurate analytics and faster analytics provision time compared to the conventional NWDAF.
Submitted 28 September, 2023;
originally announced September 2023.
-
Skip-Connected Neural Networks with Layout Graphs for Floor Plan Auto-Generation
Authors:
Yuntae Jeon,
Dai Quoc Tran,
Seunghee Park
Abstract:
With the advent of AI and computer vision techniques, the quest for automated and efficient floor plan designs has gained momentum. This paper presents a novel approach using skip-connected neural networks integrated with layout graphs. The skip-connected layers capture multi-scale floor plan information, and the encoder-decoder networks with GNN facilitate pixel-level probability-based generation. Validated on the MSD dataset, our approach achieved a 93.9 mIoU score in the 1st CVAAD workshop challenge. Code and pre-trained models are publicly available at https://github.com/yuntaeJ/SkipNet-FloorPlanGe.
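The reported score is a mean intersection-over-union (mIoU). As a minimal sketch, assume rasterized floor plans stored as integer class maps. The challenge's exact evaluation protocol is also an assumption here, for example whether IoU is accumulated over the whole dataset and how classes absent from both maps are handled. Under those assumptions, per-class IoU averaged over classes can be computed as follows.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes present in either map (absent classes are skipped)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(round(mean_iou(pred, target, num_classes=2) * 100, 1))  # 58.3 (percentage scale)
```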
Submitted 25 September, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Communication-Efficient Federated Learning over Capacity-Limited Wireless Networks
Authors:
Jaewon Yun,
Yongjeong Oh,
Yo-Seb Jeon,
H. Vincent Poor
Abstract:
In this paper, a communication-efficient federated learning (FL) framework is proposed for improving the convergence rate of FL under a limited uplink capacity. The central idea of the proposed framework is to transmit only the values and positions of the top-$S$ entries of a local model update over the uplink. A lossless encoding technique is considered for transmitting the positions of these entries, while a linear transformation followed by the Lloyd-Max scalar quantization is considered for transmitting their values. For an accurate reconstruction of the top-$S$ values, a linear minimum mean squared error method is developed based on the Bussgang decomposition. Moreover, an error feedback strategy is introduced to compensate for both compression and reconstruction errors. The convergence rate of the proposed framework is analyzed for a non-convex loss function with consideration of the compression and reconstruction errors. From the analytical result, the key parameters of the proposed framework are optimized for maximizing the convergence rate for the given capacity. Simulation results on the MNIST and CIFAR-10 datasets demonstrate that the proposed framework outperforms state-of-the-art FL frameworks in terms of classification accuracy under the limited uplink capacity.
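The top-$S$ selection and error feedback described above can be sketched as follows. This is a minimal sketch under stated assumptions. A plain uniform quantizer stands in for the paper's linear transformation plus Lloyd-Max quantizer, the Bussgang-based LMMSE reconstruction is omitted, and the class and function names are hypothetical.

```python
import numpy as np

def top_s_sparsify(update: np.ndarray, s: int):
    """Keep the S largest-magnitude entries: return their positions and values."""
    positions = np.argpartition(np.abs(update), -s)[-s:]
    positions.sort()  # sorted positions are convenient for lossless index coding
    return positions, update[positions]

def uniform_quantize(values: np.ndarray, bits: int) -> np.ndarray:
    """Placeholder scalar quantizer over the values' range (NOT Lloyd-Max)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return values.copy()
    levels = 2 ** bits - 1
    idx = np.round((values - lo) / (hi - lo) * levels)
    return lo + idx / levels * (hi - lo)

class Client:
    """Hypothetical FL client that compresses its update with error feedback."""

    def __init__(self, dim: int, s: int, bits: int):
        self.error = np.zeros(dim)  # residual carried over from earlier rounds
        self.s, self.bits = s, bits

    def compress(self, update: np.ndarray):
        corrected = update + self.error          # add back what was lost before
        pos, vals = top_s_sparsify(corrected, self.s)
        q_vals = uniform_quantize(vals, self.bits)
        sent = np.zeros_like(corrected)
        sent[pos] = q_vals                       # what the server can reconstruct
        self.error = corrected - sent            # feed the residual into the next round
        return pos, q_vals

rng = np.random.default_rng(1)
client = Client(dim=100, s=10, bits=4)
pos, q_vals = client.compress(rng.normal(size=100))
```

Error feedback matters because entries that are never in the top $S$ would otherwise be lost permanently. Here they accumulate in `self.error` until they become large enough to be sent.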
Submitted 20 July, 2023;
originally announced July 2023.
-
Communication-Efficient Split Learning via Adaptive Feature-Wise Compression
Authors:
Yongjeong Oh,
Jaeho Lee,
Christopher G. Brinton,
Yo-Seb Jeon
Abstract:
This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage the different degrees of dispersion exhibited by the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined by the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined by the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in closed form. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead than the vanilla SL framework without compression.
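A rough sketch of the two strategies, under loud assumptions, is shown below. The abstract does not give the dropout rule, so the keep probability is assumed to be proportional to each column's standard deviation. A fixed number of uniform levels over each column's own range stands in for the paper's closed-form optimal levels. Both functions are hypothetical.

```python
import numpy as np

def featurewise_dropout(features: np.ndarray, keep_ratio: float, rng):
    """Drop whole feature columns, keeping higher-variance columns more often.

    features: (batch, d) matrix. Hypothetical rule: the keep probability of
    column j is proportional to its standard deviation, scaled so that roughly
    keep_ratio * d columns survive on average (clipped to [0, 1]).
    """
    std = features.std(axis=0)
    d = features.shape[1]
    total = std.sum() or 1.0                      # guard against all-constant input
    keep_prob = np.clip(std / total * keep_ratio * d, 0.0, 1.0)
    mask = rng.random(d) < keep_prob
    # The same mask can drop the matching gradient columns on the backward pass.
    return features[:, mask], mask

def columnwise_quantize(x: np.ndarray, levels: int) -> np.ndarray:
    """Uniformly quantize each column over its own [min, max] range."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # avoid division by zero
    idx = np.round((x - lo) / span * (levels - 1))
    return lo + idx / (levels - 1) * span

rng = np.random.default_rng(0)
h = rng.normal(size=(32, 16)) * np.linspace(0.1, 2.0, 16)  # column std grows with j
kept, mask = featurewise_dropout(h, keep_ratio=0.5, rng=rng)
h_q = columnwise_quantize(kept, levels=16)
```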
Submitted 20 July, 2023;
originally announced July 2023.
-
The Seoul National University AGN Monitoring Project IV: H$α$ reverberation mapping of 6 AGNs and the H$α$ Size-Luminosity Relation
Authors:
Hojin Cho,
Jong-Hak Woo,
Shu Wang,
Donghoon Son,
Jaejin Shin,
Suvendu Rakshit,
Aaron J. Barth,
Vardha N. Bennert,
Elena Gallo,
Edmund Hodges-Kluck,
Tommaso Treu,
Hyun-Jin Bae,
Wanjin Cho,
Adi Foord,
Jaehyuk Geum,
Yashashree Jadhav,
Yiseul Jeon,
Kyle M. Kabasares,
Daeun Kang,
Wonseok Kang,
Changseok Kim,
Donghwa Kim,
Minjin Kim,
Taewoo Kim,
Huynh Anh N. Le
, et al. (7 additional authors not shown)
Abstract:
The broad line region (BLR) size-luminosity relation is of paramount importance for estimating the masses of black holes in active galactic nuclei (AGNs). Traditionally, the size of the H$β$ BLR is estimated from the optical continuum luminosity at 5100 Å, while the size of the H$α$ BLR and its correlation with luminosity are much less constrained. As a part of the Seoul National University AGN Monitoring Project (SAMP), which provides six years of photometric and spectroscopic monitoring data, we present our measurements of the H$α$ lags of 6 high-luminosity AGNs. Combined with the measurements for 42 AGNs from the literature, we derive the size-luminosity relations of the H$α$ BLR against the broad H$α$ and 5100 Å continuum luminosities. We find the slopes of the relations to be $0.61\pm0.04$ and $0.59\pm0.04$, respectively, which are consistent with the H$β$ size-luminosity relation. Moreover, we find a linear relation between the 5100 Å continuum luminosity and the broad H$α$ luminosity across 7 orders of magnitude. Using these results, we propose a new virial mass estimator based on the H$α$ broad emission line, finding that previous mass estimates based on scaling relations in the literature are overestimated by up to 0.7 dex at masses lower than $10^7$ M$_{\odot}$.
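Because the relations are power laws, log R = α + β log L with β the reported slope, the intercept α cancels when two AGNs are compared. The abstract does not state the intercept. The short sketch below works out what the slopes imply, plus the factor corresponding to 0.7 dex. The helper names are ours. The virial form M ∝ R × (line width)² is the standard assumption behind single-epoch estimators, not the paper's calibrated estimator.

```python
def blr_size_ratio(lum_ratio: float, slope: float) -> float:
    """R2 / R1 implied by log R = alpha + slope * log L (the intercept alpha cancels)."""
    return lum_ratio ** slope

def virial_mass_ratio(lum_ratio: float, width_ratio: float, slope: float) -> float:
    """M2 / M1 for a virial estimator M ∝ R * (line width)^2 with R ∝ L^slope."""
    return blr_size_ratio(lum_ratio, slope) * width_ratio ** 2

print(round(blr_size_ratio(10.0, 0.61), 2))  # 4.07: 10x brighter broad Hα -> ~4x larger BLR
print(round(blr_size_ratio(10.0, 0.59), 2))  # 3.89: same comparison using L(5100 Å)
print(round(10 ** 0.7, 2))                   # 5.01: a 0.7 dex overestimate is a factor of ~5
```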
Submitted 29 June, 2023;
originally announced June 2023.
-
Neural 360$^\circ$ Structured Light with Learned Metasurfaces
Authors:
Eunsue Choi,
Gyeongtae Kim,
Jooyeong Yun,
Yujin Jeon,
Junsuk Rho,
Seung-Hwan Baek
Abstract:
Structured light has proven instrumental in 3D imaging, LiDAR, and holographic light projection. Metasurfaces, composed of sub-wavelength-sized nanostructures, facilitate 180$^\circ$ field-of-view (FoV) structured light, circumventing the restricted FoV inherent in traditional optics such as diffractive optical elements. However, existing metasurface-facilitated structured light exhibits sub-optimal performance in downstream tasks, due to heuristic pattern designs, such as periodic dots, that do not consider the objectives of the end application. In this paper, we present neural 360$^\circ$ structured light, driven by learned metasurfaces. We propose a differentiable framework that encompasses a computationally efficient 180$^\circ$ wave propagation model and a task-specific reconstructor, and exploits both the transmission and reflection channels of the metasurface. Leveraging a first-order optimizer within our differentiable framework, we optimize the metasurface design, thereby realizing neural 360$^\circ$ structured light. We apply neural 360$^\circ$ structured light to holographic light projection and 3D imaging. Specifically, we demonstrate the first 360$^\circ$ light projection of complex patterns, enabled by our propagation model, which can be computationally evaluated 50,000$\times$ faster than the Rayleigh-Sommerfeld propagation. For 3D imaging, we improve depth-estimation accuracy by 5.09$\times$ in RMSE compared to heuristically designed structured light. Neural 360$^\circ$ structured light promises robust 360$^\circ$ imaging and display for robotics, extended-reality systems, and human-computer interaction.
Submitted 27 June, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
MIMO Detection under Hardware Impairments: Learning with Noisy Labels
Authors:
Jinman Kwon,
Seunghyeon Jeon,
Yo-Seb Jeon,
H. Vincent Poor
Abstract:
This paper considers a data detection problem in multiple-input multiple-output (MIMO) communication systems with hardware impairments. To address challenges posed by nonlinear and unknown distortion in received signals, two learning-based detection methods, referred to as model-driven and data-driven, are presented. The model-driven method employs a generalized Gaussian distortion model to approximate the conditional distribution of the distorted received signal. By using the outputs of coarse data detection as noisy training data, the model-driven method avoids the need for additional training overhead beyond traditional pilot overhead for channel estimation. An expectation-maximization algorithm is devised to accurately learn the parameters of the distortion model from noisy training data. To resolve the model mismatch problem in the model-driven method, the data-driven method employs a deep neural network (DNN) for approximating a posteriori probabilities for each received signal. This method uses the outputs of the model-driven method as noisy labels and therefore does not require extra training overhead. To avoid the overfitting problem caused by noisy labels, a robust DNN training algorithm is devised, which involves a warm-up period, sample selection, and loss correction. Simulation results demonstrate that the two proposed methods outperform existing solutions with the same overhead under various hardware impairment scenarios.
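The abstract names sample selection and loss correction without details. The sketch below therefore uses two common choices from the noisy-label literature as stand-ins. The first is small-loss selection, which keeps the samples whose labels the model finds least surprising. The second is forward loss correction with a known label-transition matrix T. Both are assumptions, not the paper's algorithm.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample cross-entropy for predicted class probabilities."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def select_small_loss(probs: np.ndarray, noisy_labels: np.ndarray, keep_frac: float) -> np.ndarray:
    """Indices of the keep_frac fraction of samples with the smallest loss."""
    losses = cross_entropy(probs, noisy_labels)
    k = max(1, int(keep_frac * len(losses)))
    return np.argsort(losses)[:k]

def forward_corrected_probs(probs: np.ndarray, transition: np.ndarray) -> np.ndarray:
    """Forward correction: p(noisy = j | x) = sum_i p(clean = i | x) * T[i, j]."""
    return probs @ transition

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # model outputs for 3 samples
noisy = np.array([0, 0, 0])                              # labels from coarse detection
print(select_small_loss(probs, noisy, keep_frac=0.7))    # [0 2]
```

In a warm-up-then-select schedule, selection would start only after a few epochs, once the network fits clean patterns but has not yet memorized the noisy labels.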
Submitted 8 June, 2023;
originally announced June 2023.
-
Multimedia Distribution Process Tracking for Android and iOS
Authors:
Yu-Min Jeon,
Won-Mu Heo,
Jong-Min Kim,
Kyounggon Kim
Abstract:
The crime of illegally filming and distributing images or videos is increasing worldwide day by day. With the increasing penetration rate of smartphones, there has been a rise in crimes involving secretly taking pictures of people's bodies and distributing them through messengers. However, little research has been done on this issue. The crime of distributing media using the world's most popular messengers, WhatsApp and Telegram, is continuously increasing. It is also common for criminals to distribute illegal footage through various messengers to evade investigators. As these crimes increase, there will continue to be a need for professional investigative personnel, and the time required for criminal investigations will continue to increase. In this paper, we propose a multimedia forensic method for tracking footprints by checking the media information that changes when images and videos shot with a smartphone are transmitted through instant messengers. We selected 11 of the world's most popular instant messengers and two secure messengers. In addition, we selected the two most widely used smartphone operating systems, Android and iOS. Through this study, we confirmed that it is possible to trace footprints related to distribution through instant messengers by analyzing transmitted images and videos. Thus, it was possible to determine which messengers were used to distribute a video when it was transmitted through multiple messengers.
Submitted 7 April, 2023;
originally announced April 2023.
-
VisDA 2022 Challenge: Domain Adaptation for Industrial Waste Sorting
Authors:
Dina Bashkirova,
Samarth Mishra,
Diala Lteif,
Piotr Teterwak,
Donghyun Kim,
Fadi Alladkani,
James Akl,
Berk Calli,
Sarah Adel Bargal,
Kate Saenko,
Daehan Kim,
Minseok Seo,
YoungJin Jeon,
Dong-Geol Choi,
Shahaf Ettedgui,
Raja Giryes,
Shady Abu-Hussein,
Binhui Xie,
Shuang Li
Abstract:
Label-efficient and reliable semantic segmentation is essential for many real-life applications, especially for industrial settings with high visual diversity, such as waste sorting. In industrial waste sorting, one of the biggest challenges is the extreme diversity of the input stream depending on factors like the location of the sorting facility, the equipment available in the facility, and the time of year, all of which significantly impact the composition and visual appearance of the waste stream. These changes in the data are called "visual domains", and label-efficient adaptation of models to such domains is needed for successful semantic segmentation of industrial waste. To test the abilities of computer vision models on this task, we present the VisDA 2022 Challenge on Domain Adaptation for Industrial Waste Sorting. Our challenge incorporates a fully-annotated waste sorting dataset, ZeroWaste, collected from two real material recovery facilities in different locations and seasons, as well as a novel procedurally generated synthetic waste sorting dataset, SynthWaste. In this competition, we aim to answer two questions: 1) can we leverage domain adaptation techniques to minimize the domain gap? and 2) can synthetic data augmentation improve performance on this task and help adapt to changing data distributions? The results of the competition show that industrial waste detection poses a real domain adaptation problem, that domain generalization techniques such as augmentations, ensembling, etc., improve the overall performance on the unlabeled target domain examples, and that leveraging synthetic data effectively remains an open problem. See https://ai.bu.edu/visda-2022/
Submitted 26 March, 2023;
originally announced March 2023.
-
Focus or Not: A Baseline for Anomaly Event Detection On the Open Public Places with Satellite Images
Authors:
Yongjin Jeon,
Youngtack Oh,
Doyoung Jeong,
Hyunguk Choi,
Junsik Kim
Abstract:
In recent years, monitoring wide areas of the world with satellite images has emerged as an important issue.
The site monitoring task can be divided into two independent tasks: 1) change detection and 2) anomaly event detection.
While change detection research is actively conducted on numerous datasets (e.g., LEVIR-CD, WHU-CD, S2Looking, and xView2) to meet the expectations of industries and governments, research on AI models for detecting anomaly events has been rare and limited.
In this paper, we introduce a novel satellite imagery dataset (AED-RS) for detecting anomaly events in open public places.
The AED-RS dataset contains satellite images of normal and abnormal situations at 8 open public places from all over the world.
Each place is labeled with different criteria based on its distinct characteristics.
With this dataset, we introduce a baseline model, TB-FLOW, which can be trained in a weakly-supervised manner and shows reasonable performance on the AED-RS dataset compared with other normalizing-flow (NF) based anomaly detection models. Our dataset and code will be publicly available at https://github.com/SIAnalytics/RS_AnomalyDetection.git.
Submitted 4 April, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
A Provably Secure Strong PUF based on LWE: Construction and Implementation
Authors:
Xiaodan Xi,
Ge Li,
Ye Wang,
Yeonsoo Jeon,
Michael Orshansky
Abstract:
We construct a strong PUF with provable security against ML attacks on both classical and quantum computers. The security is guaranteed by the cryptographic hardness of learning decryption functions of public-key cryptosystems, and the hardness of the learning-with-errors (LWE) problem defined on integer lattices. We call our construction the lattice PUF.
We construct the lattice PUF with a physically obfuscated key and an LWE decryption function block. To allow deployment in different scenarios, we demonstrate designs with different latency-area trade-offs. A compact design uses a highly serialized LFSR and LWE decryption function, while a latency-optimized design uses an unrolled LFSR and a parallel datapath.
We prototype lattice PUF designs with $2^{136}$ challenge-response pairs (CRPs) on a Spartan 6 FPGA. In addition to the theoretical security guarantee, we evaluate empirical resistance to various leading ML techniques: the prediction error remains above $49.76\%$ after $1$ million training CRPs. The resource-efficient design requires only $45$ slices for the PUF logic proper, and $351$ slices for a fuzzy extractor. The latency-optimized design achieves a $148X$ reduction in latency, at a $10X$ increase in PUF hardware utilization. The mean uniformity of PUF responses is $49.98\%$, the mean uniqueness is $50.00\%$, and the mean reliability is $1.26\%$.
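In the construction above, a response is produced by an LWE decryption function block keyed by a physically obfuscated key. The toy sketch below shows only the Regev-style LWE decryption arithmetic, under loud assumptions. The key is a random software vector standing in for the physically obfuscated key. Challenges come from a hypothetical helper rather than from the LFSR. The parameters q and n are toy sizes with no security.

```python
import numpy as np

def lwe_decrypt(a: np.ndarray, b: int, s: np.ndarray, q: int) -> int:
    """Regev-style LWE decryption: 1 if b - <a, s> mod q lies nearer q/2 than 0."""
    d = (b - int(a @ s)) % q
    return int(q // 4 < d < 3 * q // 4)

def toy_lwe_encrypt(bit: int, s: np.ndarray, q: int, rng, noise: int = 2):
    """Hypothetical helper: build an (a, b) challenge that decrypts to `bit`."""
    a = rng.integers(0, q, size=s.shape[0])
    e = int(rng.integers(-noise, noise + 1))     # small error term, |e| < q/4
    b = (int(a @ s) + e + bit * (q // 2)) % q
    return a, b

q, n = 257, 16                                 # toy sizes, far too small to be secure
rng = np.random.default_rng(0)
s = rng.integers(0, q, size=n)                 # stand-in for the obfuscated key
bits = rng.integers(0, 2, size=100)
responses = [lwe_decrypt(*toy_lwe_encrypt(int(bt), s, q, rng), s, q) for bt in bits]
print(responses == bits.tolist())  # True: each challenge decrypts to its encoded bit
```

Decryption is correct whenever the error magnitude stays below q/4, which is why the error term is kept small relative to q.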
Submitted 5 March, 2023;
originally announced March 2023.
-
KHAN: Knowledge-Aware Hierarchical Attention Networks for Accurate Political Stance Prediction
Authors:
Yunyong Ko,
Seongeun Ryu,
Soeun Han,
Youngseung Jeon,
Jaehoon Kim,
Sohyun Park,
Kyungsik Han,
Hanghang Tong,
Sang-Wook Kim
Abstract:
Political stance prediction for news articles has been widely studied to mitigate the echo chamber effect, in which people become trapped in their own views and reinforce their pre-existing beliefs. Previous works on the political stance problem focus on (1) identifying political factors that could reflect the political stance of a news article and (2) capturing those factors effectively. Despite their empirical successes, they do not sufficiently justify how effective their identified factors are in political stance prediction. Motivated by this, in this work, we conduct a user study to investigate important factors in political stance prediction, and observe that the context and tone of a news article (implicit) and external knowledge about real-world entities appearing in the article (explicit) are important in determining its political stance. Based on this observation, we propose a novel knowledge-aware approach to political stance prediction (KHAN), employing (1) hierarchical attention networks (HAN) to learn the relationships among words and sentences at three different levels and (2) knowledge encoding (KE) to incorporate external knowledge about real-world entities into the process of political stance prediction. Also, to take into account the subtle but important differences between opposite political stances, we build two independent political knowledge graphs (KG) (i.e., KG-lib and KG-con) ourselves and learn to fuse the different political knowledge. Through extensive evaluations on three real-world datasets, we demonstrate the superiority of KHAN in terms of (1) accuracy, (2) efficiency, and (3) effectiveness.
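HAN-style models build a sentence vector by attention-pooling its word vectors, then a document vector by attention-pooling the sentence vectors. A minimal NumPy sketch of that pooling step follows. It assumes a simplified scoring function tanh(h)·u, whereas the original HAN applies a learned linear layer before tanh, and it uses random stand-in context vectors. KHAN's third level and its knowledge encoding are not modeled.

```python
import numpy as np

def attention_pool(h: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Softmax attention pooling of the rows of h (n, d) with context vector u (d,)."""
    scores = np.tanh(h) @ u
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ h                        # weighted average of the rows

rng = np.random.default_rng(0)
doc = rng.normal(size=(5, 12, 8))       # 5 sentences x 12 words x 8-dim embeddings
u_word, u_sent = rng.normal(size=8), rng.normal(size=8)
sent_vecs = np.stack([attention_pool(s, u_word) for s in doc])  # (5, 8)
doc_vec = attention_pool(sent_vecs, u_sent)                     # (8,)
```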
Submitted 4 April, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Exploring the magnetic properties of individual barcode nanowires using wide-field diamond microscopy
Authors:
Jungbae Yoon,
Jun Hwan Moon,
Jugyeong Jeong,
Yu Jin Kim,
Kihwan Kim,
Hee Seong Kang,
Yoo Sang Jeon,
Eunsoo Oh,
Sun Hwa Lee,
Kihoon Han,
Dongmin Lee,
Chul-Ho Lee,
Young Keun Kim,
Donghun Lee
Abstract:
Barcode magnetic nanowires typically comprise a multilayer magnetic structure in a single body with more than one segment type. Owing to selective functionalization and novel interactions between the layers, barcode magnetic nanowires have attracted significant attention, particularly in the field of bioengineering. However, analyzing their magnetic properties at the individual nanowire level remains challenging. In this work, we characterized individual magnetic nanowires at room temperature under ambient conditions using magnetic images obtained via wide-field quantum microscopy with nitrogen-vacancy centers in diamond. By comparing the experimental results with micromagnetic simulations, we extracted critical magnetic properties of single nanowires, such as the saturation magnetization and coercivity. This study opens up the possibility of a versatile characterization method suited to individual magnetic nanowires.
Submitted 21 February, 2023;
originally announced February 2023.
-
Statistical Power Analysis for Designing Bulk, Single-Cell, and Spatial Transcriptomics Experiments: Review, Tutorial, and Perspectives
Authors:
Hyeongseon Jeon,
Juan Xie,
Yeseul Jeon,
Kyeong Joo Jung,
Arkobrato Gupta,
Won Chang,
Dongjun Chung
Abstract:
Gene expression profiling technologies have been used in various applications such as cancer biology. The development of gene expression profiling has expanded the scope of target discovery in transcriptomic studies, and each technology produces data with distinct characteristics. To obtain biologically meaningful findings from transcriptomic experiments, it is important to consider various experimental factors systematically through statistical power analysis. In this paper, we review and discuss power analysis for three types of gene expression profiling technologies from a practical standpoint: bulk RNA-seq, single-cell RNA-seq, and high-throughput spatial transcriptomics. Specifically, we describe the existing power analysis tools for each research objective in bulk RNA-seq and scRNA-seq experiments, along with recommendations. However, since no power analysis tools exist for high-throughput spatial transcriptomics at this point, we instead investigate the factors that can influence power analysis.
Submitted 7 January, 2023;
originally announced January 2023.
-
Self-Pair: Synthesizing Changes from Single Source for Object Change Detection in Remote Sensing Imagery
Authors:
Minseok Seo,
Hakjin Lee,
Yongjin Jeon,
Junghoon Seo
Abstract:
For change detection in remote sensing, constructing a training dataset for deep learning models is difficult due to the requirement of bi-temporal supervision. To overcome this issue, single-temporal supervision, which treats change labels as the difference between two semantic masks, has been proposed. This approach trains a change detector using two spatially unrelated images with corresponding semantic labels, such as buildings. However, training on unpaired datasets could confuse the change detector for pixels that are labeled unchanged but are visually significantly different. To maintain visual similarity in unchanged areas, in this paper, we emphasize that the change originates from the source image and show that manipulating the source image as an after-image is crucial to the performance of change detection. Extensive experiments demonstrate the importance of maintaining visual information between pre- and post-event images, and our method outperforms existing methods based on single-temporal supervision. Code is available at https://github.com/seominseok0429/Self-Pair-for-Change-Detection.
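Under single-temporal supervision, the change label is derived from two semantic masks rather than annotated directly. A minimal sketch follows, assuming binary masks (for example, building vs. background) and treating "difference" as a per-pixel XOR. The function name is hypothetical, and the Self-Pair after-image synthesis itself is not modeled.

```python
import numpy as np

def change_label_from_masks(mask_pre: np.ndarray, mask_post: np.ndarray) -> np.ndarray:
    """Mark a pixel as changed (1) where the two binary semantic masks disagree."""
    return np.logical_xor(mask_pre.astype(bool), mask_post.astype(bool)).astype(np.uint8)

pre = np.array([[1, 0], [1, 1]])   # building mask of the "before" image
post = np.array([[1, 1], [0, 1]])  # building mask of the "after" image
print(change_label_from_masks(pre, post))
```

With spatially unrelated image pairs, every pixel where the backgrounds look different but the masks agree is labeled "unchanged". This is exactly the confusing case the abstract describes.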
Submitted 20 December, 2022;
originally announced December 2022.
-
Genie: Show Me the Data for Quantization
Authors:
Yongkweon Jeon,
Chungman Lee,
Ho-young Kim
Abstract:
Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible for various reasons, including cost and privacy issues. By exploiting the learned parameters ($μ$ and $σ$) of batch normalization layers in an FP32 pre-trained model, zero-shot quantization schemes focus on generating synthetic data. Subsequently, they distill knowledge from the pre-trained model (teacher) to the quantized model (student) such that the quantized model can be optimized with the synthetic dataset. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and optimization nearly as long as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. Furthermore, we propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, with performance comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving quantization performance compared to existing approaches. In other words, we obtain a unique state-of-the-art zero-shot quantization approach. The code is available at https://github.com/SamsungLabs/Genie.
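Zero-shot schemes of this kind typically optimize synthetic inputs so that the per-channel mean and standard deviation of the activations entering each batch-normalization layer match the statistics stored in that layer. The NumPy sketch below only evaluates such a matching objective for one layer. It assumes the stored σ is taken as sqrt(running_var + eps). It does not reproduce Genie's generator, the gradient-based optimization over many layers (which needs an autodiff framework), or its post-training quantization algorithm.

```python
import numpy as np

def bn_stat_loss(acts: np.ndarray, bn_mean: np.ndarray, bn_std: np.ndarray) -> float:
    """Squared distance between a batch's per-channel stats and stored BN stats.

    acts: (N, C, H, W) activations entering one BN layer for a synthetic batch.
    """
    mu = acts.mean(axis=(0, 2, 3))
    sigma = acts.std(axis=(0, 2, 3))
    return float(np.sum((mu - bn_mean) ** 2) + np.sum((sigma - bn_std) ** 2))

rng = np.random.default_rng(0)
acts = rng.normal(loc=0.5, scale=2.0, size=(8, 3, 4, 4))
mean, std = acts.mean(axis=(0, 2, 3)), acts.std(axis=(0, 2, 3))
print(bn_stat_loss(acts, mean, std))        # 0.0: the batch matches the stored statistics
print(bn_stat_loss(acts + 1.0, mean, std))  # ≈ 3.0: each of the 3 channel means is off by 1
```

In a full pipeline this term would be summed over all BN layers and minimized with respect to the synthetic images (or a generator's parameters).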
Submitted 8 August, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
1st Place Solution to NeurIPS 2022 Challenge on Visual Domain Adaptation
Authors:
Daehan Kim,
Minseok Seo,
YoungJin Jeon,
Dong-Geol Choi
Abstract:
The Visual Domain Adaptation (VisDA) 2022 Challenge calls for an unsupervised domain adaptive model for semantic segmentation in industrial waste sorting. In this paper, we introduce the SIA_Adapt method, which incorporates several methods for domain adaptive models. The core of our method lies in the transferable representation from large-scale pre-training. In this process, we choose a network architecture that differs from the state-of-the-art for domain adaptation. After that, self-training using pseudo-labels helps make the initial adaptation model more adaptable to the target domain. Finally, the model soup scheme helps to improve the generalization performance in the target domain. Our method SIA_Adapt achieves 1st place in the VisDA2022 challenge. The code is available at https://github.com/DaehanKim-Korea/VisDA2022_Winner_Solution.
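A "model soup" averages the weights of several fine-tuned models that share one architecture. The abstract does not say which soup recipe SIA_Adapt uses, so the sketch below assumes the simplest uniform soup. Each model is represented as a hypothetical dictionary of NumPy arrays.

```python
import numpy as np

def uniform_soup(state_dicts: list[dict[str, np.ndarray]]) -> dict[str, np.ndarray]:
    """Average each parameter tensor across models that share one architecture."""
    keys = state_dicts[0].keys()
    assert all(sd.keys() == keys for sd in state_dicts), "architectures must match"
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

m1 = {"conv.weight": np.array([1.0, 3.0]), "conv.bias": np.array([0.0])}
m2 = {"conv.weight": np.array([3.0, 5.0]), "conv.bias": np.array([1.0])}
soup = uniform_soup([m1, m2])
print(soup["conv.weight"], soup["conv.bias"])  # [2. 4.] [0.5]
```

A greedy soup would instead add a model to the average only if held-out performance improves, which guards against ingredients that hurt generalization.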
Submitted 26 November, 2022;
originally announced November 2022.
-
DriveFuzz: Discovering Autonomous Driving Bugs through Driving Quality-Guided Fuzzing
Authors:
Seulbae Kim,
Major Liu,
Junghwan "John" Rhee,
Yuseok Jeon,
Yonghwi Kwon,
Chung Hwan Kim
Abstract:
Autonomous driving has become a reality; semi-autonomous vehicles in an affordable price range are already on the streets, and major automotive vendors are actively developing fully self-driving systems to deploy within this decade. Before rolling these products out to end users, it is critical to test and ensure the safety of autonomous driving systems, which consist of multiple layers intertwined in complicated ways. However, while safety-critical bugs may exist in any layer and even across layers, relatively little attention has been given to testing the entire driving system across all layers. Prior work mainly focuses on white-box testing of individual layers and on preventing attacks against each layer.
In this paper, we aim for holistic testing of autonomous driving systems with their entire stack of layers integrated. Instead of examining individual layers, we focus on the vehicle states that the system continuously changes in the driving environment. This allows us to design DriveFuzz, a new systematic fuzzing framework that can uncover potential vulnerabilities regardless of their location. DriveFuzz automatically generates and mutates driving scenarios based on diverse factors, leveraging a high-fidelity driving simulator. We build novel driving test oracles based on real-world traffic rules to detect safety-critical misbehaviors, and we guide the fuzzer toward such misbehaviors through driving quality metrics that refer to the physical states of the vehicle.
DriveFuzz has discovered 30 new bugs in various layers of two autonomous driving systems (Autoware and CARLA Behavior Agent) and three additional bugs in the CARLA simulator. We further analyze the impact of these bugs and how an adversary may exploit them as security vulnerabilities to cause critical accidents in the real world.
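The feedback loop described in this abstract (mutate a driving scenario, run it, check a safety oracle, and keep the mutants whose driving-quality metric gets worse) can be sketched as a toy greedy fuzzer. Everything concrete here is an assumption: the two-field `Scenario`, the closed-form `simulate` function standing in for a high-fidelity simulator, closest approach to an obstacle as the only quality metric, and collision as the only oracle. It illustrates quality-guided mutation, not DriveFuzz's implementation.

```python
import random
from dataclasses import dataclass

EGO_SPEED = 15.0  # m/s, hypothetical fixed ego speed in the toy model
G = 9.81          # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class Scenario:
    gap: float       # hypothetical: metres between ego and a stopped obstacle at spawn
    friction: float  # hypothetical: road friction coefficient (a weather factor)


def simulate(s: Scenario) -> float:
    """Toy stand-in for a simulator run: closest approach to the obstacle, in metres."""
    braking_distance = EGO_SPEED ** 2 / (2 * s.friction * G)
    return s.gap - braking_distance


def oracle(min_distance: float) -> bool:
    """Safety oracle: flags a collision in the run."""
    return min_distance <= 0.0


def mutate(s: Scenario, rng: random.Random) -> Scenario:
    """Randomly perturb scenario factors, clamped to plausible ranges."""
    return Scenario(
        gap=min(100.0, max(1.0, s.gap + rng.uniform(-5.0, 5.0))),
        friction=min(1.0, max(0.1, s.friction + rng.uniform(-0.1, 0.1))),
    )


def fuzz(seed: Scenario, iterations: int = 500, rng_seed: int = 0) -> list:
    rng = random.Random(rng_seed)
    best, best_score = seed, simulate(seed)
    bugs = []
    for _ in range(iterations):
        cand = mutate(best, rng)
        score = simulate(cand)  # lower score = worse driving quality
        if oracle(score):
            bugs.append(cand)
        elif score < best_score:
            # Quality-guided feedback: keep mutants that push the vehicle toward misbehavior.
            best, best_score = cand, score
    return bugs


bugs = fuzz(Scenario(gap=50.0, friction=0.9))
print(len(bugs), "colliding scenarios found")
```

Using only quality-degrading mutants as parents is what steers the search toward the oracle; an unguided fuzzer would instead sample each scenario independently of earlier runs.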
Submitted 25 October, 2022;
originally announced November 2022.