
Showing 1–50 of 1,974 results for author: Kim, S

Searching in archive cs.
  1. arXiv:2407.13437  [pdf, other]

    cs.CV

    FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions

    Authors: Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak

    Abstract: Robust semantic segmentation under adverse conditions is crucial in real-world applications. To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a novel feature restoration framework for source-free domain adaptation (SFDA) of semantic segmentation to adverse conditions. FREST alternates two steps: (1) lear…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.13303  [pdf, other]

    cs.LG

    Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Zhe Tang, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: Wi-Fi fingerprinting is widely applied for indoor localization due to the widespread availability of Wi-Fi devices. However, traditional methods are not ideal for multi-building and multi-floor environments due to the scalability issues. Therefore, more and more researchers have employed deep learning techniques to enable scalable indoor localization. This paper introduces a novel semi-supervised…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, under preparation for a journal publication

  3. arXiv:2407.13288  [pdf, other]

    cs.LG

    Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

    Authors: Sihao Li, Kyeong Soo Kim, Zhe Tang, Graduate, Jeremy S. Smith

    Abstract: In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localizati…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 5 figures, under review for journal publication

  4. arXiv:2407.12987  [pdf, other]

    cs.CV

    ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos

    Authors: Hyolim Kang, Jeongseok Hyun, Joungbin An, Youngjae Yu, Seon Joo Kim

    Abstract: Online Temporal Action Localization (On-TAL) is a critical task that aims to instantaneously identify action instances in untrimmed streaming videos as soon as an action concludes -- a major leap from frame-based Online Action Detection (OAD). Yet, the challenge of detecting overlapping actions is often overlooked even though it is a common scenario in streaming videos. Current methods that can ad…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.12345  [pdf, other]

    cs.CV

    VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

    Authors: Seokha Moon, Hyun Woo, Hongbeen Park, Haeji Jung, Reza Mahjourian, Hyung-gun Chi, Hyerin Lim, Sangpil Kim, Jinkyu Kim

    Abstract: Predicting future trajectories for other road agents is an essential task for autonomous vehicles. Established trajectory prediction methods primarily use agent tracks generated by a detection and tracking system and HD map as inputs. In this work, we propose a novel method that also incorporates visual input from surround-view cameras, allowing the model to utilize visual cues such as human gazes…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  6. arXiv:2407.11668  [pdf, other]

    cs.CV

    Learning to Make Keypoints Sub-Pixel Accurate

    Authors: Shinjeong Kim, Marc Pollefeys, Daniel Barath

    Abstract: This work addresses the challenge of sub-pixel accuracy in detecting 2D local features, a cornerstone problem in computer vision. Despite the advancements brought by neural network-based methods like SuperPoint and ALIKED, these modern approaches lag behind classical ones such as SIFT in keypoint localization accuracy due to their lack of sub-pixel precision. We propose a novel network that enhanc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: The European Conference on Computer Vision (2024)

  7. arXiv:2407.11451  [pdf, other]

    cs.LG cs.CV

    Isometric Representation Learning for Disentangled Latent Space of Diffusion Models

    Authors: Jaehoon Hahm, Junho Lee, Sunghyun Kim, Joonseok Lee

    Abstract: The latent space of diffusion model mostly still remains unexplored, despite its great success and potential in the field of generative modeling. In fact, the latent space of existing diffusion models are entangled, with a distorted mapping from its latent space to image space. To tackle this problem, we present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide…

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: Forty-first International Conference on Machine Learning (ICML 2024)

  8. arXiv:2407.11375  [pdf, other]

    cs.CV

    Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain

    Authors: Hyeon Bae Kim, Yong Hyun Ahn, Seong Tae Kim

    Abstract: Recent advancements in deep neural networks have shown promise in aiding disease diagnosis and medical decision-making. However, ensuring transparent decision-making processes of AI models in compliance with regulations requires a comprehensive understanding of the model's internal workings. However, previous methods heavily rely on expensive pixel-wise annotated datasets for interpreting the mode…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  9. arXiv:2407.11368  [pdf]

    cs.CL

    Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

    Authors: Sojung Lucia Kim, Taehong Jang, Joonmo Ahn

    Abstract: This study aims to compare three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) proposed inter methodological approach - statistical machine translation method using sentence piece tokens derived from unified set of source-target corpus. The performance of the proposed approach…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ACL2024 submitted

  10. arXiv:2407.11347  [pdf, other]

    cs.CV

    I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM

    Authors: Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim

    Abstract: We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines for casually captured scenarios. Casual video captures often suffer from motion blur and varying appearances, which degrade the final quality of coherent 3D visual representation. We propose integrating the physical imaging into the SLAM system, which employs linear HDR radiance maps to c…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  11. arXiv:2407.10789  [pdf, other]

    cs.RO

    Tailoring Solution Accuracy for Fast Whole-body Model Predictive Control of Legged Robots

    Authors: Charles Khazoom, Seungwoo Hong, Matthew Chignoli, Elijah Stanger-Jones, Sangbae Kim

    Abstract: Thanks to recent advancements in accelerating non-linear model predictive control (NMPC), it is now feasible to deploy whole-body NMPC at real-time rates for humanoid robots. However, enforcing inequality constraints in real time for such high-dimensional systems remains challenging due to the need for additional iterations. This paper presents an implementation of whole-body NMPC for legged robot…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  12. arXiv:2407.10542  [pdf, other]

    cs.CV cs.AI

    3D Geometric Shape Assembly via Efficient Point Cloud Matching

    Authors: Nahyuk Lee, Juhong Min, Junha Lee, Seungwook Kim, Kanghee Lee, Jaesik Park, Minsu Cho

    Abstract: Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes in both coarse- and fine-levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matchin…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

  13. arXiv:2407.10461  [pdf, ps, other]

    cs.IT

    Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights

    Authors: Seyong Kim, Jinseok Choi, Wonjae Shin, Namyoon Lee, Jeonghun Park

    Abstract: To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by…

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.10164  [pdf, other]

    cs.CV

    LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

    Authors: Sanmin Kim, Youngseok Kim, Sihwan Hwang, Hyeonjun Jeong, Dongsuk Kum

    Abstract: Recent advancements in camera-based 3D object detection have introduced cross-modal knowledge distillation to bridge the performance gap with LiDAR 3D detectors, leveraging the precise geometric information in LiDAR point clouds. However, existing cross-modal knowledge distillation methods tend to overlook the inherent imperfections of LiDAR, such as the ambiguity of measurements on distant or occ…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  15. arXiv:2407.09033  [pdf, other]

    cs.CV

    Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

    Authors: Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim

    Abstract: In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. T…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.08892  [pdf, other]

    cs.CL cs.LG

    Characterizing Prompt Compression Methods for Long Context Inference

    Authors: Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami

    Abstract: Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compress the prompt to reduce the context length. However, there has been little work on comparing the different proposed methods across different tasks thro…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Es-FoMo @ ICML 2024

  17. arXiv:2407.07024  [pdf, other]

    cs.CV cs.AI

    Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization

    Authors: Jeongseok Hyun, Su Ho Han, Hyolim Kang, Joon-Young Lee, Seon Joo Kim

    Abstract: The vocabulary size in temporal action localization (TAL) is constrained by the scarcity of large-scale annotated datasets. To address this, recent works incorporate powerful pre-trained vision-language models (VLMs), such as CLIP, to perform open-vocabulary TAL (OV-TAL). However, unlike VLMs trained on extensive image/video-text pairs, existing OV-TAL methods still rely on small, fully labeled TA…

    Submitted 9 July, 2024; originally announced July 2024.

  18. arXiv:2407.06851  [pdf, other]

    cs.CL

    Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders

    Authors: Jinseok Kim, Jaewon Jung, Sangyeop Kim, Sohyung Park, Sungzoon Cho

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawback…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ACL 2024 KnowledgeableLMs workshop paper

  19. arXiv:2407.06204  [pdf, other]

    cs.LG cs.CL

    A Survey on Mixture of Experts

    Authors: Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, Jiayi Huang

    Abstract: Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context…

    Submitted 26 June, 2024; originally announced July 2024.

  20. Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition

    Authors: Seungju Kim, Meounggun Jo

    Abstract: Large Language Models (LLMs) have shown promise in Automated Essay Scoring (AES), but their zero-shot and few-shot performance often falls short compared to state-of-the-art models and human raters. However, fine-tuning LLMs for each specific task is impractical due to the variety of essay prompts and rubrics used in real-world educational contexts. This study proposes a novel approach combining L…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 16 pages, 3 figures, Learning @ Scale 2024

  21. arXiv:2407.05186  [pdf, other]

    cs.CY cs.SI

    Understanding Political Communication and Political Communicators on Twitch

    Authors: Sangyeon Kim

    Abstract: As new technologies rapidly reshape patterns of political communication, platforms like Twitch are transforming how people consume political information. This entertainment-oriented live streaming platform allows us to observe the impact of technologies such as "live-streaming" and "streaming-chat" on political communication. Despite its entertainment focus, Twitch hosts a variety of political…

    Submitted 6 July, 2024; originally announced July 2024.

  22. arXiv:2407.04597  [pdf, other]

    cs.CV cs.AI

    Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi

    Abstract: In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NN) for detection performance or pursued unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that av…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures, 5 tables

  23. arXiv:2407.04280  [pdf, other]

    cs.CL cs.SD eess.AS

    LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech

    Authors: Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim

    Abstract: Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis revea…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for INTERSPEECH 2024

  24. arXiv:2407.04192  [pdf, other]

    cs.LG

    KAN-ODEs: Kolmogorov-Arnold Network Ordinary Differential Equations for Learning Dynamical Systems and Hidden Physics

    Authors: Benjamin C. Koenig, Suyong Kim, Sili Deng

    Abstract: Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-layer perceptrons (MLPs) are a recent development demonstrating strong potential for data-driven modeling. This work applies KANs as the backbone of a Neural Ordinary Differential Equation framework, generalizing their use to the time-dependent and grid-sensitive cases often seen in scientific machine learning applications. The proposed…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 12 pages, 5 figures plus 1 appendix figure, 1 table plus 1 appendix table. B.C.K. and S.K. contributed equally to this work

    ACM Class: I.6.5; G.1.7

  25. arXiv:2407.03563  [pdf, other]

    eess.AS cs.CL cs.LG eess.IV

    Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

    Authors: Sungnyun Kim, Kangwook Jang, Sangmin Bae, Hoirin Kim, Se-Young Yun

    Abstract: Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused on enhancing audio features in AVSR, overlooking the importance of video features. In this study, we strengthen the video features by learning th…

    Submitted 3 July, 2024; originally announced July 2024.

  26. arXiv:2407.03103  [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  27. arXiv:2407.02750  [pdf, other]

    cs.CL

    Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data

    Authors: Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen

    Abstract: Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Redu…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Workshop on Long-Context Foundation Models, Vienna, Austria 2024. arXiv admin note: substantial text overlap with arXiv:2402.14195

  28. Addressing Prediction Delays in Time Series Forecasting: A Continuous GRU Approach with Derivative Regularization

    Authors: Sheo Yon Jhin, Seojin Kim, Noseong Park

    Abstract: Time series forecasting has been an essential field in many different application areas, including economic analysis, meteorology, and so forth. The majority of time series forecasting models are trained using the mean squared error (MSE). However, this training based on MSE causes a limitation known as prediction delay. The prediction delay, which implies the ground-truth precedes the prediction,…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: KDD 2024 accepted paper

  29. arXiv:2407.01073  [pdf, other]

    cs.RO

    No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

    Authors: Soojin Woo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static obje…

    Submitted 1 July, 2024; originally announced July 2024.

  30. arXiv:2407.00693  [pdf, other]

    cs.AI cs.CL cs.LG

    BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models

    Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

    Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneit…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: under review

  31. arXiv:2407.00006  [pdf, other]

    cs.DC cs.CE math.NA

    Adaptive and Parallel Multiscale Framework for Modeling Cohesive Failure in Engineering Scale Systems

    Authors: Sion Kim, Ezra Kissel, Karel Matous

    Abstract: The high computational demands of multiscale modeling necessitate advanced parallel and adaptive strategies. To address this challenge, we introduce an adaptive method that utilizes two microscale models based on an offline database for multiscale modeling of curved interfaces (e.g., adhesive layers). This database employs nonlinear classifiers, developed using Support Vector Machines from microsc…

    Submitted 18 April, 2024; originally announced July 2024.

  32. arXiv:2406.19848  [pdf, other]

    cs.RO

    3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints

    Authors: Yoonkyu Yoo, Donghwi Jung, Seong-Woo Kim

    Abstract: In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic stru…

    Submitted 28 June, 2024; originally announced June 2024.

  33. arXiv:2406.19575  [pdf]

    cs.HC cs.DB cs.PF

    AR-PPF: Advanced Resolution-Based Pixel Preemption Data Filtering for Efficient Time-Series Data Analysis

    Authors: Taewoong Kim, Kukjin Choi, Sungjun Kim

    Abstract: With the advent of automation, many manufacturing industries have transitioned to data-centric methodologies, giving rise to an unprecedented influx of data during the manufacturing process. This data has become instrumental in analyzing the quality of manufacturing process and equipment. Engineers and data analysts, in particular, require extensive time-series data for seasonal cycle analysis. Ho…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 7 pages, preprint, '24 Samsung Best Paper Awards

  34. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  35. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  36. arXiv:2406.18551  [pdf, other]

    cs.CV cs.GR

    GFFE: G-buffer Free Frame Extrapolation for Low-latency Real-time Rendering

    Authors: Songyin Wu, Deepak Vembar, Anton Sochenov, Selvakumar Panneer, Sungye Kim, Anton Kaplanyan, Ling-Qi Yan

    Abstract: Real-time rendering has been embracing ever-demanding effects, such as ray tracing. However, rendering such effects in high resolution and high frame rate remains challenging. Frame extrapolation methods, which don't introduce additional latency as opposed to frame interpolation methods such as DLSS 3 and FSR 3, boost the frame rate by generating future frames based on previous frames. However, it…

    Submitted 23 May, 2024; originally announced June 2024.

  37. arXiv:2406.17869  [pdf, other]

    cs.CV

    Burst Image Super-Resolution with Base Frame Selection

    Authors: Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho

    Abstract: Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: CVPR2024W NTIRE accepted

  38. arXiv:2406.17254  [pdf, other]

    cs.CV

    Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

    Authors: Youngmin Kim, Saejin Kim, Hoyeon Moon, Youngjae Yu, Junhyug Noh

    Abstract: Scalp diseases and alopecia affect millions of people around the world, underscoring the urgent need for early diagnosis and management of the disease. However, the development of a comprehensive AI-based diagnosis system encompassing these conditions remains an underexplored domain due to the challenges associated with data imbalance and the costly nature of labeling. To address these issues, we…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE Transactions on Medical Imaging (Under Review)

  39. arXiv:2406.17145  [pdf, other]

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…

    Submitted 24 June, 2024; originally announced June 2024.

  40. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  41. arXiv:2406.16695  [pdf, other]

    cs.CV

    Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

    Authors: Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-Hwa Kim, Seungryong Kim

    Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into 3D representation, has recently brought significant advancements in text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may…

    Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  42. arXiv:2406.16175  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    The Persistence of Contrarianism on Twitter: Mapping users' sharing habits for the Ukraine war, COVID-19 vaccination, and the 2022 Midterm Elections

    Authors: David Axelrod, Sangyeon Kim, John Paolillo

    Abstract: Empirical studies of online disinformation emphasize matters of public concern such as the COVID-19 pandemic, foreign election interference, and the Russo-Ukraine war, largely in studies that treat the topics separately. Comparatively fewer studies attempt to relate such disparate topics and address the extent to which they share behaviors. In this study, we compare three samples of Twitter data o…

    Submitted 28 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  43. arXiv:2406.16042  [pdf, other]

    cs.CV

    Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

    Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data a…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: The project page is available at https://ku-cvlab.github.io/Diff-ID/

  44. arXiv:2406.15664  [pdf, other]

    stat.ML cs.LG

    Flat Posterior Does Matter For Bayesian Transfer Learning

    Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

    Abstract: The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning…

    Submitted 21 June, 2024; originally announced June 2024.

  45. arXiv:2406.15102  [pdf, other]

    cs.CV cs.LG

    HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

    Authors: Seonggon Kim, Eunhyeok Park

    Abstract: With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which…

    Submitted 21 June, 2024; originally announced June 2024.

  46. arXiv:2406.13214  [pdf, other]

    cs.LG

    Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck

    Authors: Sangwoo Seo, Sungwon Kim, Jihyeong Jung, Yoonho Lee, Chanyoung Park

    Abstract: Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal grap…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  47. arXiv:2406.12904  [pdf, other]

    cs.LG physics.comp-ph physics.optics

    Meent: Differentiable Electromagnetic Simulator for Machine Learning

    Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

    Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: under review

  48. arXiv:2406.12632  [pdf, other]

    eess.IV cs.CV

    Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

    Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

    Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its inv…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

  49. arXiv:2406.12095  [pdf, other]

    cs.CV cs.AI cs.RO

    DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

    Authors: Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

    Abstract: We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB,…

    Submitted 17 June, 2024; originally announced June 2024.

  50. arXiv:2406.11280  [pdf, other]

    cs.CV

    i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment

    Authors: Daechul Ahn, Yura Choi, San Kim, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Aligning Video Large Multimodal Models (VLMMs) face challenges such as modality misalignment and verbose responses. Although iterative approaches such as self-rewarding or iterative direct preference optimization (DPO) recently showed a significant improvement in language model alignment, particularly on reasoning tasks, self-aligned models applied to large video-language models often result in le…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Technical report