Showing 1–50 of 238 results for author: Cho, H

Searching in archive cs.

  1. arXiv:2408.14930  [pdf, other]

    cs.CV

    CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring

    Authors: Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon

    Abstract: Video deblurring aims to enhance the quality of restored results in motion-blurred videos by effectively gathering information from adjacent video frames to compensate for the insufficient data in a single blurred frame. However, when faced with consecutively severe motion blur situations, frame-based video deblurring methods often fail to find accurate temporal correspondence among neighboring vi…

    Submitted 28 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted in ECCV2024

  2. arXiv:2408.14916  [pdf, other]

    cs.CV

    Towards Real-world Event-guided Low-light Video Enhancement and Deblurring

    Authors: Taewoo Kim, Jaeseok Jeong, Hoonhee Cho, Yuhwan Jeong, Kuk-Jin Yoon

    Abstract: In low-light conditions, capturing videos with frame-based cameras often requires long exposure times, resulting in motion blur and reduced visibility. While frame-based motion deblurring and low-light enhancement have been studied, they still pose significant challenges. Event cameras have emerged as a promising solution for improving image quality in low-light environments and addressing motion…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted in ECCV2024

  3. arXiv:2408.14841  [pdf, other]

    cs.CV cs.AI

    Diffusion based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection

    Authors: Suhee Yoon, Sanghyu Yoon, Hankook Lee, Ye Seul Sim, Sungik Choi, Kyungeun Lee, Hye-Seung Cho, Woohyung Lim

    Abstract: Out-of-distribution (OOD) detection, which determines whether a given sample is part of the in-distribution (ID), has recently shown promising results through training with synthetic OOD datasets. Nonetheless, existing methods often produce outliers that are considerably distant from the ID, showing limited efficacy for capturing subtle distinctions between ID and OOD. To address these issues, we…

    Submitted 27 August, 2024; originally announced August 2024.

  4. arXiv:2408.09320  [pdf, other]

    cs.HC cs.SD eess.AS

    Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality

    Authors: Hyunsung Cho, Alexander Wang, Divya Kartik, Emily Liying Xie, Yukang Yan, David Lindlbauer

    Abstract: Spatial audio in Extended Reality (XR) provides users with better awareness of where virtual elements are placed, and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, are inaccurate in localizing sound cues, especially with multiple sources due to limitations in human auditory perception such as angular discrimi…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: UIST 2024

    ACM Class: H.5.1; H.5.2; H.5.5

  5. arXiv:2408.09049  [pdf, other]

    cs.CL cs.AI cs.HC

    Language Models Show Stable Value Orientations Across Diverse Role-Plays

    Authors: Bruce W. Lee, Yeongheon Lee, Hyunsoo Cho

    Abstract: We demonstrate that large language models (LLMs) exhibit consistent value orientations despite adopting diverse personas, revealing a persistent inertia in their responses that remains stable across the variety of roles they are prompted to assume. To systematically explore this phenomenon, we introduce the role-play-at-scale methodology, which involves prompting LLMs with randomized, diverse pers…

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.06276  [pdf, other]

    cs.CL

    Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation

    Authors: Jieyong Kim, Hyunseo Kim, Hyunjin Cho, SeongKu Kang, Buru Chang, Jinyoung Yeo, Dongha Lee

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of tasks, generating significant interest in their application to recommendation systems. However, existing methods have not fully capitalized on the potential of LLMs, often constrained by limited input information or failing to fully utilize their advanced reasoning capabilities. To…

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  7. arXiv:2408.01084  [pdf, other]

    cs.CL

    Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts

    Authors: Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

    Abstract: When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge a gap between external knowledge and LLM's parametric knowledge. Recent research has been developed to amplify contextual knowledge over the parametric knowledge of LLM with contrastive decoding approaches. While these approaches could yield truthful responses w…

    Submitted 2 August, 2024; originally announced August 2024.

  8. arXiv:2407.17770  [pdf, other]

    cs.CL

    BotEval: Facilitating Interactive Human Evaluation

    Authors: Hyundong Cho, Thamme Gowda, Yuyang Huang, Zixun Lu, Tianli Tong, Jonathan May

    Abstract: Following the rapid progress in natural language processing (NLP) models, language models are applied to increasingly more complex interactive tasks such as negotiations and conversation moderations. Having human evaluators directly interact with these NLP models is essential for adequately evaluating the performance on such interactive tasks. We develop BotEval, an easily customizable, open-sourc…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: ACL 2024 SDT, 10 pages

  9. arXiv:2407.17453  [pdf, other]

    cs.CV

    $VILA^2$: VILA Augmented VILA

    Authors: Yunhao Fang, Ligeng Zhu, Yao Lu, Yan Wang, Pavlo Molchanov, Jang Hyun Cho, Marco Pavone, Song Han, Hongxu Yin

    Abstract: Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). While model architectures and training infrastructures advance rapidly, data curation remains under-explored. When data quantity and quality become a bottleneck, existing work either directly crawls more raw data from the Internet that does not have a guarantee of data quality or distills…

    Submitted 24 July, 2024; originally announced July 2024.

  10. arXiv:2407.11216  [pdf, other]

    cs.CV

    Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras

    Authors: Hoonhee Cho, Sung-Hoon Yoon, Hyeokjun Kweon, Kuk-Jin Yoon

    Abstract: Event cameras excel in capturing high-contrast scenes and dynamic objects, offering a significant advantage over traditional frame-based cameras. Despite active research into leveraging event cameras for semantic segmentation, generating pixel-wise dense semantic maps for such challenging scenarios remains labor-intensive. As a remedy, we present EV-WSSS: a novel weakly supervised approach for eve…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  11. arXiv:2407.10831  [pdf, other]

    cs.CV

    Temporal Event Stereo via Joint Learning with Stereoscopic Flow

    Authors: Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon

    Abstract: Event cameras are dynamic vision sensors inspired by the biological retina, characterized by their high dynamic range, high temporal resolution, and low power consumption. These features make them capable of perceiving 3D environments even in extreme conditions. Event data is continuous across the time dimension, which allows a detailed description of each pixel's movements. To fully utilize the t…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  12. arXiv:2407.10733  [pdf, other]

    cs.CV

    Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture

    Authors: Dong-Hee Kim, Sungduk Cho, Hyeonwoo Cho, Chanmin Park, Jinyoung Kim, Won Hwa Kim

    Abstract: In this work, we introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA), to overcome the traditional constraints associated with training segmentation models. Mask-JEPA combines a Joint Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries. Our approach addresses two critical challenge…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 27 pages, 5 figures

  13. arXiv:2407.10703  [pdf, other]

    cs.CV

    Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation

    Authors: Yuhwan Jeong, Hoonhee Cho, Kuk-Jin Yoon

    Abstract: Event cameras with high dynamic range ensure scene capture even in low-light conditions. However, night events exhibit patterns different from those captured during the day. This difference causes performance degradation when applying night events to a model trained solely on day events. This limitation persists due to a lack of annotated night events. To overcome the limitation, we aim to allevia…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  14. arXiv:2407.05713  [pdf, other]

    cs.CV cs.AI

    Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge

    Authors: Hyunjin Cho, Dong Un Kang, Se Young Chun

    Abstract: Short-term object interaction anticipation is an important task in egocentric video analysis, including precise predictions of future interactions and their timings as well as the categories and positions of the involved active objects. To alleviate the complexity of this task, our proposed method, SOIA-DOD, effectively decomposes it into 1) detecting active objects and 2) classifying interaction an…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 4 pages

  15. arXiv:2407.03890  [pdf, other]

    cs.RO

    Addressing Relative Pose Impact on UWB Localization: Dataset Introduction and Analysis

    Authors: Jun Hyeok Choe, Inwook Shim

    Abstract: UWB has recently gained new attention as an auxiliary sensor in the field of robot localization due to its compactness and ease of distance measurement. Consequently, various UWB-related localization and dataset research have increased. Despite this broad interest, there is a lack of UWB datasets that thoroughly analyze the performance of UWB ranging measurement. To address this issue, our paper i…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 4 pages

  16. arXiv:2406.16535  [pdf, other]

    cs.CL cs.AI cs.LG

    Token-based Decision Criteria Are Suboptimal in In-context Learning

    Authors: Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue

    Abstract: In-Context Learning (ICL) typically utilizes classification criteria from probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token probabilities and u…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 21 pages, 14 figures, 8 tables

  17. arXiv:2406.16275  [pdf, other]

    cs.CL

    Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

    Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

    Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 13 tables, under review

  18. arXiv:2406.15225  [pdf, other]

    cs.AI cs.RO eess.SP

    Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

    Authors: Jiyong Oh, Syed M. Raza, Lusungu J. Mwasinga, Moonseong Kim, Hyunseung Choo

    Abstract: Unmanned Aerial Vehicle (UAV) services with 5G connectivity are an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations for the wide adoption of scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network and maintaining it is challenging in predetermined flight paths. T…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, Published in the 2024 IEEE Network Operations and Management Symposium (NOMS 2024)

  19. arXiv:2406.07923  [pdf, other]

    cs.SD cs.AI eess.AS

    CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting

    Authors: Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho

    Abstract: This paper introduces a novel approach for streaming open-vocabulary keyword spotting (KWS) with text-based keyword enrollment. For every input frame, the proposed method finds the optimal alignment ending at the frame using connectionist temporal classification (CTC) and aggregates the frame-level acoustic embedding (AE) to obtain higher-level (i.e., character, word, or phrase) AE that aligns with…

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.06111  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

    Authors: Hyunjae Cho, Junhyeok Lee, Wonbin Jung

    Abstract: Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent alia…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  21. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  22. arXiv:2406.05314  [pdf, other]

    eess.AS cs.AI eess.SP

    Relational Proxy Loss for Audio-Text based Keyword Spotting

    Authors: Youngmoon Jung, Seungjin Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho

    Abstract: In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes text input during the enrollment phase and audio input during actual usage, we call this task audio-text based KWS. To enable this task, both acoustic and text encoders are typically trained using deep…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, Accepted by Interspeech 2024

  23. arXiv:2406.02596  [pdf, other]

    cs.LG cs.AI

    Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

    Authors: Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, Clare Lyle

    Abstract: This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, w…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  24. arXiv:2406.01468  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding Token Probability Encoding in Output Embeddings

    Authors: Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue

    Abstract: In this paper, we investigate the output token probability information in the output embedding of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and output logits are concentrated. Based on such findings, we edit the encoding in outp…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages, 17 figures, 3 tables

  25. arXiv:2405.20671  [pdf, other]

    cs.LG cs.AI cs.CL

    Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

    Authors: Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun

    Abstract: Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absol…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 73 pages, 20 figures, 90 tables

  26. arXiv:2405.15737  [pdf]

    cs.SE

    More Insight from Being More Focused: Analysis of Clustered Market Apps

    Authors: Maleknaz Nayebi, Homayoon Farrahi, Ada Lee, Henry Cho, Guenther Ruhe

    Abstract: The increasing attraction of mobile apps has inspired researchers to analyze apps from different perspectives. As with any software product, apps have different attributes such as size, content maturity, rating, category, or number of downloads. Current research studies mostly consider sampling across all apps. This often results in comparisons of apps being quite different in nature and category…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Authors' pre-print

  27. arXiv:2405.07414  [pdf, other]

    cs.LG cs.AI

    Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

    Authors: Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

    Abstract: The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self…

    Submitted 13 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 18 pages (including supplementary materials)

  28. arXiv:2405.03685  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Language-Image Models with 3D Understanding

    Authors: Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

    Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formu…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Project page: https://janghyuncho.github.io/Cube-LLM

  29. Multi-intent-aware Session-based Recommendation

    Authors: Minjin Choi, Hye-young Kim, Hyunsouk Cho, Jongwuk Lee

    Abstract: Session-based recommendation (SBR) aims to predict the following item a user will interact with during an ongoing session. Most existing SBR models focus on designing sophisticated neural-based encoders to learn a session representation, capturing the relationship among session items. However, they tend to focus on the last item, neglecting diverse user intents that may exist within a session. Thi…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: SIGIR 2024. 5 pages

  30. arXiv:2404.17598  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    Revealing and Utilizing In-group Favoritism for Graph-based Collaborative Filtering

    Authors: Hoin Jung, Hyunsoo Cho, Myungje Choi, Joowon Lee, Jung Ho Park, Myungjoo Kang

    Abstract: When it comes to a personalized item recommendation system, it is essential to extract users' preferences and purchasing patterns. Assuming that users in the real world form a cluster and there is common favoritism in each cluster, in this work, we introduce Co-Clustering Wrapper (CCW). We compute co-clusters of users and items with co-clustering algorithms and add CF subnetworks for each cluster…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 7 pages, 6 figures

  31. arXiv:2404.10355  [pdf, other]

    cs.AR

    AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs

    Authors: Sungjun Cho, Beomjun Kim, Hyunuk Cho, Gyeongseob Seo, Onur Mutlu, Myungsuk Kim, Jisung Park

    Abstract: This work investigates a new erase scheme in NAND flash memory to improve the lifetime and performance of modern solid-state drives (SSDs). In NAND flash memory, an erase operation applies a high voltage (e.g., > 20 V) to flash cells for a long time (e.g., > 3.5 ms), which degrades cell endurance and potentially delays user I/O requests. While a large body of prior work has proposed various techni…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted for publication at Proceedings of the 29th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

  32. arXiv:2404.09717  [pdf, other]

    cs.CL cs.AI cs.LG

    Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model

    Authors: Hyunsoo Cho

    Abstract: Many recent studies endeavor to improve open-source language models through imitation learning, and re-training on the synthetic instruction data from state-of-the-art proprietary models like ChatGPT and GPT-4. However, the innate nature of synthetic data inherently contains noisy data, giving rise to a substantial presence of low-quality data replete with erroneous responses, and flawed reasoning…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Under review @ *ACL

  33. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  34. arXiv:2403.18771  [pdf, other]

    cs.CL

    CheckEval: Robust Evaluation Framework using Large Language Model via Checklist

    Authors: Yukyung Lee, Joonghoon Kim, Jaehee Kim, Hyowon Cho, Pilsung Kang

    Abstract: We introduce CheckEval, a novel evaluation framework using Large Language Models, addressing the challenges of ambiguity and inconsistency in current evaluation methods. CheckEval addresses these challenges by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation. This approach not only renders the process more…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: HEAL at CHI 2024

  35. arXiv:2403.17377  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

    Authors: Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

    Abstract: Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project page is available at https://ku-cvlab.github.io/Perturbed-Attention-Guidance

  36. arXiv:2403.09022  [pdf, ps, other]

    cs.IT eess.SP

    Smart Resource Allocation at mmWave/THz Frequencies with Cooperative Rate-Splitting

    Authors: Hyesang Cho, Junil Choi

    Abstract: In this paper, we propose algorithms to minimize the energy consumption in millimeter wave/terahertz multi-user downlink communication systems. To ensure coverage in blockage-vulnerable high frequency systems, we consider cooperative rate-splitting (CRS) and transmission over multiple time blocks, where via CRS, multiple users cooperate to assist a blocked user. Moreover, we show that transmission…

    Submitted 19 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures, accepted to IEEE Transactions on Wireless Communications (TWC)

  37. MineXR: Mining Personalized Extended Reality Interfaces

    Authors: Hyunsung Cho, Yukang Yan, Kashyap Todi, Mark Parent, Missie Smith, Tanya R. Jonker, Hrvoje Benko, David Lindlbauer

    Abstract: Extended Reality (XR) interfaces offer engaging user experiences, but their effective design requires a nuanced understanding of user behavior and preferences. This knowledge is challenging to obtain without the widespread adoption of XR devices. We introduce MineXR, a design mining workflow and data analysis platform for collecting and analyzing personalized XR user interaction and experience dat…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 17 pages, 18 figures, Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

    ACM Class: H.5.2

  38. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  39. arXiv:2403.02966  [pdf, other]

    cs.CL cs.AI cs.LG

    Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering

    Authors: Sungho Ko, Hyunjin Cho, Hyungjoo Chae, Jinyoung Yeo, Dongha Lee

    Abstract: Recent studies have investigated utilizing Knowledge Graphs (KGs) to enhance Question Answering (QA) performance of Large Language Models (LLMs), yet structured KG verbalization remains challenging. Existing methods, such as triple-form or free-form textual conversion of triple-form facts, encounter several issues. These include reduced evidence density due to duplicated entities or relationships,…

    Submitted 19 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  40. Personalizing Smart Home Privacy Protection With Individuals' Regulatory Focus: Would You Preserve or Enhance Your Information Privacy?

    Authors: Reza Ghaiumy Anaraky, Yao Li, Hichang Cho, Danny Yuxing Huang, Kaileigh A. Byrne, Bart Knijnenburg, Oded Nov

    Abstract: In this study, we explore the effectiveness of persuasive messages endorsing the adoption of a privacy protection technology (IoT Inspector) tailored to individuals' regulatory focus (promotion or prevention). We explore if and how regulatory fit (i.e., tuning the goal-pursuit mechanism to individuals' internal regulatory focus) can increase persuasion and adoption. We conducted a between-subject…

    Submitted 27 February, 2024; originally announced February 2024.

    Journal ref: ACM Conference on Human Factors in Computing Systems (CHI2024)

  41. arXiv:2402.17323  [pdf, other]

    cs.CV

    SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

    Authors: Junsu Kim, Hoseong Cho, Jihyeon Kim, Yihalem Yimolal Tiruneh, Seungryul Baek

    Abstract: In the field of class incremental learning (CIL), generative replay has become increasingly prominent as a method to mitigate catastrophic forgetting, alongside the continuous improvements in generative models. However, its application in class incremental object detection (CIOD) has been significantly limited, primarily due to the complexities of scenes involving multiple labels. In this pape…

    Submitted 7 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Camera-ready version

  42. arXiv:2402.17275  [pdf, other]

    cs.CV

    One-Shot Structure-Aware Stylized Image Synthesis

    Authors: Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong

    Abstract: While GAN-based models have been successful in image stylization tasks, they often struggle with structure preservation while stylizing a wide range of input images. Recently, diffusion models have been adopted for image stylization but still lack the capability to maintain the original quality of input images. Building on this, we propose OSASIS: a novel one-shot stylization method that is robust…

    Submitted 1 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  43. arXiv:2402.15180  [pdf, other]

    cs.LG cs.CL cs.CR

    Break the Breakout: Reinventing LM Defense Against Jailbreak Attacks with Self-Refinement

    Authors: Heegyu Kim, Sehyun Yuk, Hyunsouk Cho

    Abstract: Caution: This paper includes offensive words that could potentially cause unpleasantness. Language models (LMs) are vulnerable to exploitation for adversarial misuse. Training LMs for safety alignment is extensive and makes it hard to respond to fast-developing attacks immediately, such as jailbreaks. We propose self-refine with formatting that achieves outstanding safety even in non-safety-aligne…

    Submitted 26 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: under review

  44. arXiv:2402.14395  [pdf, other]

    cs.CV

    Semantic Image Synthesis with Unconditional Generator

    Authors: Jungwoo Chae, Hyunin Cho, Sooyeon Go, Kyungmook Choi, Youngjung Uh

    Abstract: Semantic image synthesis (SIS) aims to generate realistic images that match given semantic masks. Despite recent advances allowing high-quality results and precise spatial control, they require a massive semantic segmentation dataset for training the models. Instead, we propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks. The proxy masks…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2023, Project Page: https://hhyunn2.github.io/SIS_UncondG/

  45. arXiv:2402.13211  [pdf, other]

    cs.CL

    Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

    Authors: Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have sug…

    Submitted 5 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  46. arXiv:2402.10475  [pdf, other]

    math.OC cs.LG

    Fundamental Benefit of Alternating Updates in Minimax Optimization

    Authors: Jaewook Lee, Hanseul Cho, Chulhee Yun

    Abstract: The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice…

    Submitted 15 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024 (Spotlight). 76 pages, 2 figures. Additional experiments (quadratic game, GAN) and proofs

  47. Making a prototype of Seoul historical sites chatbot using Langchain

    Authors: Jae Young Suh, Minsoo Kwak, Soo Yong Kim, Hyoungseo Cho

    Abstract: In this paper, we share a draft of the development of a conversational agent created to disseminate information about historical sites located in Seoul. The primary objective of the agent is to increase awareness, among visitors who are not familiar with Seoul, of the presence and precise locations of valuable cultural heritage sites. It aims to promote a basic understanding of…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 4 pages, 4 figures, draft

  48. arXiv:2402.04625  [pdf, other]

    cs.CV

    Noise Map Guidance: Inversion with Spatial Context for Real Image Editing

    Authors: Hansam Cho, Jonghyun Lee, Seoung Bum Kim, Tae-Hyun Oh, Yonghyun Jeong

    Abstract: Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images. However, their application to editing real images often encounters hurdles primarily due to the text condition deteriorating the reconstruction quality and subsequently affecting editing fidelity. Null-text Inversion (NTI) has made strides in this area, but it fails to c…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  49. arXiv:2402.03277  [pdf, other]

    cs.IR

    Event-based Product Carousel Recommendation with Query-Click Graph

    Authors: Luyi Ma, Nimesh Sinha, Parth Vajge, Jason HD Cho, Sushant Kumar, Kannan Achan

    Abstract: Many current recommender systems mainly focus on the product-to-product recommendations and user-to-product recommendations even during the time of events rather than modeling the typical recommendations for the target event (e.g., festivals, seasonal activities, or social activities) without addressing the multiple aspects of the shopping demands for the target event. Product recommendations for…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures, 2021 IEEE International Conference on Big Data (Big Data)

  50. arXiv:2401.17005  [pdf, other]

    cs.AR

    SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation

    Authors: Wontak Han, Hyunjun Cho, Donghyuk Kim, Joo-Young Kim

    Abstract: Text generation is a compelling sub-field of natural language processing, aiming to generate human-readable text from input words. In particular, the decoder-only generative models, such as generative pre-trained transformer (GPT), are widely used for text generation, with two major computational stages: summarization and generation. Unlike the summarization stage, which can process the input toke…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures