
Showing 1–50 of 1,206 results for author: Kim, Y

Searching in archive cs.
  1. arXiv:2407.13166  [pdf, other]

    cs.HC cs.IR

    Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction

    Authors: Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim

    Abstract: With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored h…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to LLM4Eval @ SIGIR 2024 - The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval

  2. arXiv:2407.12514  [pdf, other]

    cs.CL

    On Initializing Transformers with Pre-trained Embeddings

    Authors: Ha Young Kim, Niranjan Balasubramanian, Byungkon Kang

    Abstract: It has become common practice now to use random initialization schemes, rather than the pre-trained embeddings, when training transformer based models from scratch. Indeed, we find that pre-trained word embeddings from GloVe, and some sub-word embeddings extracted from language models such as T5 and mT5 fare much worse compared to random initialization. This is counter-intuitive given the well-kno…

    Submitted 17 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  3. arXiv:2407.11347  [pdf, other]

    cs.CV

    I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM

    Authors: Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim

    Abstract: We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines for casually captured scenarios. Casual video captures often suffer from motion blur and varying appearances, which degrade the final quality of coherent 3D visual representation. We propose integrating the physical imaging into the SLAM system, which employs linear HDR radiance maps to c…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  4. arXiv:2407.10960  [pdf, other]

    cs.LG cs.CL cs.DC

    Fast Matrix Multiplications for Lookup Table-Quantized LLMs

    Authors: Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim

    Abstract: The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. When coupled with custom kernels that fuse the dequantization and matmul operations, weight-only quantization can thus enable faster inference by reducing the amount of memory movement. Howe…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.10164  [pdf, other]

    cs.CV

    LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

    Authors: Sanmin Kim, Youngseok Kim, Sihwan Hwang, Hyeonjun Jeong, Dongsuk Kum

    Abstract: Recent advancements in camera-based 3D object detection have introduced cross-modal knowledge distillation to bridge the performance gap with LiDAR 3D detectors, leveraging the precise geometric information in LiDAR point clouds. However, existing cross-modal knowledge distillation methods tend to overlook the inherent imperfections of LiDAR, such as the ambiguity of measurements on distant or occ…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  6. arXiv:2407.09005  [pdf, other]

    cs.CV cs.AI eess.IV

    Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset

    Authors: Yongjin Kim, Jinbum Park, Sanha Kang, Hanguen Kim

    Abstract: The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light r…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 11 pages, 9 figures, whitepaper

  7. arXiv:2407.08872  [pdf, other]

    cs.CV

    Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets

    Authors: Linh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim, Du Yong Kim, Namkoo Ha, Moongu Jeon

    Abstract: This paper proposes an online visual multi-object tracking (MOT) algorithm that resolves object appearance-reappearance and occlusion. Our solution is based on the labeled random finite set (LRFS) filtering approach, which in principle, addresses disappearance, appearance, reappearance, and occlusion via a single Bayesian recursion. However, in practice, existing numerical approximations cause rea…

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.07517  [pdf, other]

    eess.IV cs.CV

    Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction

    Authors: Yumin Kim, Gayoon Choi, Seong Jae Hwang

    Abstract: Reducing scan time in Positron Emission Tomography (PET) imaging while maintaining high-quality images is crucial for minimizing patient discomfort and radiation exposure. Due to the limited size of datasets and distribution discrepancy across scanners in medical imaging, fine-tuning in a parameter-efficient and effective manner is on the rise. Motivated by the potential of Parameter-Efficient Fin…

    Submitted 10 July, 2024; originally announced July 2024.

  9. arXiv:2407.07413  [pdf, other]

    cs.CL

    KpopMT: Translation Dataset with Terminology for Kpop Fandom

    Authors: JiWoo Kim, Yunsu Kim, JinYeong Bak

    Abstract: While machines learn from existing corpora, humans have the unique capability to establish and accept new language systems. This makes human form unique language systems within social groups. Aligning with this, we focus on a gap remaining in addressing translation challenges within social groups, where in-group members utilize unique terminologies. We propose KpopMT dataset, which aims to fill th…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: accepted to LoresMT 2024

  10. arXiv:2407.07071  [pdf, other]

    cs.CL cs.AI cs.LG

    Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

    Authors: Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass

    Abstract: When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: The source code is available at https://github.com/voidism/Lookback-Lens

  11. arXiv:2407.05551  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Read, Watch and Scream! Sound Generation from Text and Video

    Authors: Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee

    Abstract: Multimodal generative models have shown impressive advances with the help of powerful diffusion models. Despite the progress, generating sound solely from text poses challenges in ensuring comprehensive scene depiction and temporal alignment. Meanwhile, video-to-sound generation limits the flexibility to prioritize sound synthesis for specific objects within the scene. To tackle these challenges,…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Project page: https://naver-ai.github.io/rewas

  12. arXiv:2407.04833  [pdf, other]

    cs.CV cs.AI

    3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition

    Authors: Younggun Kim, Beomsik Cho, Seonghoon Ryoo, Soomok Lee

    Abstract: Adapting deep learning networks for point cloud data recognition in self-driving vehicles faces challenges due to the variability in datasets and sensor technologies, emphasizing the need for adaptive techniques to maintain accuracy across different conditions. In this paper, we introduce the 3D Adaptive Structural Convolution Network (3D-ASCN), a cutting-edge framework for 3D point cloud recognit…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

    ACM Class: I.2.10; I.5.1

  13. arXiv:2407.04271  [pdf, other]

    cs.CV cs.AI cs.LG

    Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts

    Authors: Hyunsu Kim, Yegon Kim, Hongseok Yang, Juho Lee

    Abstract: Group Equivariant CNNs (G-CNNs) have shown promising efficacy in various tasks, owing to their ability to capture hierarchical features in an equivariant manner. However, their equivariance is fixed to the symmetry of the whole group, limiting adaptability to diverse partial symmetries in real-world datasets, such as limited rotation symmetry of handwritten digit images and limited color-shift sym…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: ICML2024

  14. arXiv:2407.00693  [pdf, other]

    cs.AI cs.CL cs.LG

    BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models

    Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

    Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneit…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: under review

  15. arXiv:2406.19276  [pdf, other]

    cs.CL

    VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation

    Authors: Yixiao Song, Yekyung Kim, Mohit Iyyer

    Abstract: Existing metrics for evaluating the factuality of long-form text, such as FACTSCORE (Min et al., 2023) and SAFE (Wei et al., 2024), decompose an input text into "atomic claims" and verify each against a knowledge base like Wikipedia. These metrics are not suitable for most generation tasks because they assume that every claim is verifiable (i.e., can plausibly be proven true or false). We address…

    Submitted 27 June, 2024; originally announced June 2024.

  16. arXiv:2406.19102  [pdf, other]

    cs.CL cs.AI cs.IR

    Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs

    Authors: Lokesh Mishra, Sohayl Dhibi, Yusik Kim, Cesar Berrospi Ramis, Shubham Gupta, Michele Dolfi, Peter Staar

    Abstract: Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in the table structure as…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted at the NLP4Climate workshop in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  17. arXiv:2406.18459  [pdf, other]

    cs.CV

    DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

    Authors: Younghyun Kim, Geunmin Hwang, Junyu Zhang, Eunbyung Park

    Abstract: Recent surge in large-scale generative models has spurred the development of vast fields in computer vision. In particular, text-to-image diffusion models have garnered widespread adoption across diverse domain due to their potential for high-fidelity image generation. Nonetheless, existing large-scale diffusion models are confined to generate images of up to 1K resolution, which is far from meeti…

    Submitted 11 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  18. arXiv:2406.17254  [pdf, other]

    cs.CV

    Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

    Authors: Youngmin Kim, Saejin Kim, Hoyeon Moon, Youngjae Yu, Junhyug Noh

    Abstract: Scalp diseases and alopecia affect millions of people around the world, underscoring the urgent need for early diagnosis and management of the disease. However, the development of a comprehensive AI-based diagnosis system encompassing these conditions remains an underexplored domain due to the challenges associated with data imbalance and the costly nature of labeling. To address these issues, we…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE Transactions on Medical Imaging (Under Review)

  19. arXiv:2406.17102  [pdf, other]

    cs.LG cs.CY

    Achieving Fairness Across Local and Global Models in Federated Learning

    Authors: Disha Makhija, Xing Han, Joydeep Ghosh, Yejin Kim

    Abstract: Achieving fairness across diverse clients in Federated Learning (FL) remains a significant challenge due to the heterogeneity of the data and the inaccessibility of sensitive attributes from clients' private datasets. This study addresses this issue by introducing \texttt{EquiFL}, a novel approach designed to enhance both local and global fairness in federated learning environments. \texttt{EquiFL…

    Submitted 24 June, 2024; originally announced June 2024.

  20. arXiv:2406.16275  [pdf, other]

    cs.CL

    Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

    Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

    Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 13 tables, under review

  21. arXiv:2406.15045  [pdf, other]

    cs.CL

    Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction

    Authors: Jinge Wu, Zhaolong Wu, Abul Hasan, Yunsoo Kim, Jason P. Y. Cheung, Teng Zhang, Honghan Wu

    Abstract: This study proposes an approach for error correction in clinical radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs internal and external retrieval mechanisms to extract relevant medical entities and relations from the report and external knowledge sources. A three-stage inference process is introduced, dec…

    Submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.12904  [pdf, other]

    cs.LG physics.comp-ph physics.optics

    Meent: Differentiable Electromagnetic Simulator for Machine Learning

    Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

    Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: under review

  23. arXiv:2406.12311  [pdf, other]

    cs.LG

    Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models

    Authors: Dongwon Jo, Taesu Kim, Yulhwa Kim, Jae-Joon Kim

    Abstract: Binarization, which converts weight parameters to binary values, has emerged as an effective strategy to reduce the size of large language models (LLMs). However, typical binarization techniques significantly diminish linguistic effectiveness of LLMs. To address this issue, we introduce a novel binarization technique called Mixture of Scales (BinaryMoS). Unlike conventional methods, BinaryMoS empl…

    Submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.11784  [pdf, other]

    cs.CL cs.AI

    MDCR: A Dataset for Multi-Document Conditional Reasoning

    Authors: Peter Baile Chen, Yi Zhang, Chunwei Liu, Sejal Gupta, Yoon Kim, Michael Cafarella

    Abstract: The same real-life questions posed to different individuals may lead to different answers based on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as major or degree required. ConditionalQA was proposed to evaluate models' capability of reading a document and answering eligibility questions, considering unmentioned cond…

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11313  [pdf, other]

    cs.CV

    Semi-Supervised Domain Adaptation Using Target-Oriented Domain Augmentation for 3D Object Detection

    Authors: Yecheol Kim, Junho Lee, Changsoo Park, Hyoung won Kim, Inho Lim, Christopher Chang, Jun Won Choi

    Abstract: 3D object detection is crucial for applications like autonomous driving and robotics. However, in real-world environments, variations in sensor data distribution due to sensor upgrades, weather changes, and geographic differences can adversely affect detection performance. Semi-Supervised Domain Adaptation (SSDA) aims to mitigate these challenges by transferring knowledge from a source domain, abu…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The code is available at: https://github.com/rasd3/TODA

  26. arXiv:2406.11210  [pdf, other]

    cs.CV

    Zero-Shot Scene Change Detection

    Authors: Kyusik Cho, Dong Yeop Kim, Euntai Kim

    Abstract: We present a novel, training-free approach to scene change detection. Our method leverages tracking models, which inherently perform change detection between consecutive frames of video by identifying common objects and detecting new or missing objects. Specifically, our method takes advantage of the change detection effect of the tracking model by inputting reference and query images instead of c…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Preprint. Under review

  27. arXiv:2406.10920  [pdf, other]

    math.OC cs.AI cs.LG math.NA

    Hamilton-Jacobi Based Policy-Iteration via Deep Operator Learning

    Authors: Jae Yong Lee, Yeoneung Kim

    Abstract: The framework of deep operator network (DeepONet) has been widely exploited thanks to its capability of solving high dimensional partial differential equations. In this paper, we incorporate DeepONet with a recently developed policy iteration scheme to numerically solve optimal control problems and the corresponding Hamilton--Jacobi--Bellman (HJB) equations. A notable feature of our approach is th…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

    MSC Class: 68T20; 68U07; 35F21; 49L12; 49L25

  28. arXiv:2406.10521  [pdf, other]

    cs.LG cs.AI

    MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

    Authors: Yaobin Ling, Xiaoqian Jiang, Yejin Kim

    Abstract: In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in healthcare domain. Generating synthetic (tabular) data can address this, but existing models typically require substantial amounts of data to train effectively, contradicting our objective to solve data scarcity. To…

    Submitted 29 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  29. arXiv:2406.09103  [pdf, other]

    cs.CL

    Chain-of-Though (CoT) prompting strategies for medical error detection and correction

    Authors: Zhaolong Wu, Abul Hasan, Jinge Wu, Yunsoo Kim, Jason P. Y. Cheung, Teng Zhang, Honghan Wu

    Abstract: This paper describes our submission to the MEDIQA-CORR 2024 shared task for automatically detecting and correcting medical errors in clinical notes. We report results for three methods of few-shot In-Context Learning (ICL) augmented with Chain-of-Thought (CoT) and reason prompts using a large language model (LLM). In the first method, we manually analyse a subset of train and validation dataset to…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: accepted as NAACL workshop

  30. arXiv:2406.08528  [pdf, other]

    cs.CV cs.LG

    Adaptive Teaching with Shared Classifier for Knowledge Distillation

    Authors: Jaeyeon Jang, Young-Ik Kim, Jisu Lim, Hyeonseong Lee

    Abstract: Knowledge distillation (KD) is a technique used to transfer knowledge from an overparameterized teacher network to a less-parameterized student network, thereby minimizing the incurred performance loss. KD methods can be categorized into offline and online approaches. Offline KD leverages a powerful pretrained teacher network, while online KD allows the teacher network to be adjusted dynamically t…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  31. arXiv:2406.08292  [pdf, other]

    cs.CV

    Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

    Authors: Dongsu Zhang, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler, Young Min Kim, Amlan Kar

    Abstract: We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). Contrary to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled and beyond spatial limits of LiDAR scans, taking a step towards generating realistic, high-resolution simulation-ready 3D street environments. We propose hierarchical Gener…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 as highlight

  32. arXiv:2406.06484  [pdf, ps, other]

    cs.LG cs.CL

    Parallelizing Linear Transformers with the Delta Rule over Sequence Length

    Authors: Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim

    Abstract: Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers especially on tasks that require in-context retrieval. While more expressive variants of linear transformers which replace the additive outer-product updat…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Preprint

  33. arXiv:2406.06331  [pdf, other]

    cs.CL cs.AI

    MedExQA: Medical Question Answering Benchmark with Multiple Explanations

    Authors: Yunsoo Kim, Jinge Wu, Yusuf Abdulle, Honghan Wu

    Abstract: This paper introduces MedExQA, a novel benchmark in medical question-answering, to evaluate large language models' (LLMs) understanding of medical knowledge through explanations. By constructing datasets across five distinct medical specialties that are underrepresented in current datasets and further incorporating multiple explanations for each question-answer pair, we address a major gap in curr…

    Submitted 3 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 BioNLP Workshop

  34. arXiv:2406.04625  [pdf, other]

    cs.CL cs.AI

    Key-Element-Informed sLLM Tuning for Document Summarization

    Authors: Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

    Abstract: Remarkable advances in large language models (LLMs) have enabled high-quality text summarization. However, this capability is currently accessible only through LLMs of substantial size or proprietary LLMs with usage fees. In response, smaller-scale LLMs (sLLMs) of easy accessibility and low costs have been extensively studied, yet they often suffer from missing key information and entities, i.e.,…

    Submitted 25 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  35. arXiv:2406.02989  [pdf, other]

    cs.RO cs.AI

    Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy

    Authors: Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo

    Abstract: For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves m…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Robotics and Automation Letters (RA-L), First two authors contributed equally

  36. arXiv:2406.02657  [pdf, other]

    cs.CL cs.AI cs.LG

    Block Transformer: Global-to-Local Language Modeling for Fast Inference

    Authors: Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun

    Abstract: This paper presents the Block Transformer architecture which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step. Thereby, this KV cache IO becomes a significant bottleneck in batch inferenc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures, 5 tables

  37. arXiv:2406.01920  [pdf, other]

    cs.CV cs.AI

    CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

    Authors: Junho Kim, Hyunjun Kim, Yeonju Kim, Yong Man Ro

    Abstract: Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, the issue of hallucinations has emerged as a significant challenge, producing erroneous responses that are unrelated to the visual contents. In this paper, we introduce a novel contrastive-based decoding method, COu…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://ivy-lvlm.github.io/CODE/

  38. Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

    Authors: Jungmin Yun, Mihyeon Kim, Youngbin Kim

    Abstract: Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning a…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2023 Findings

  39. arXiv:2406.00303  [pdf, other]

    cs.CL cs.AI

    Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning

    Authors: Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

    Abstract: The evaluation of summary quality encompasses diverse dimensions such as consistency, coherence, relevance, and fluency. However, existing summarization methods often target a specific dimension, facing challenges in generating well-balanced summaries across multiple dimensions. In this paper, we propose multi-objective reinforcement learning tailored to generate balanced summaries across all four…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  40. arXiv:2405.20649  [pdf, other]

    cs.CL cs.LG

    Reward-based Input Construction for Cross-document Relation Extraction

    Authors: Byeonghu Na, Suhyeon Jo, Yeongmin Kim, Il-Chul Moon

    Abstract: Relation extraction (RE) is a fundamental task in natural language processing, aiming to identify relations between target entities in text. While many RE methods are designed for a single sentence or document, cross-document RE has emerged to address relations across multiple long documents. Given the nature of long documents in cross-document RE, extracting document embeddings is challenging due…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL 2024 main conference

  41. arXiv:2405.20574  [pdf, other]

    cs.CL cs.AI

    Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark

    Authors: Chanjun Park, Hyeonwoo Kim, Dahyun Kim, Seonghwan Cho, Sanghoon Kim, Sukyung Lee, Yungi Kim, Hwalsuk Lee

    Abstract: This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated in the Korean LLM community. We perform data leakage analysis that shows the benefit of private test…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL 2024 Main

  42. arXiv:2405.20216  [pdf, other]

    cs.CV cs.AI cs.LG

    Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback

    Authors: Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee

    Abstract: The generation of high-quality human images through text-to-image (T2I) methods is a significant yet challenging task. Distinct from general image generation, human image synthesis must satisfy stringent criteria related to human pose, anatomy, and alignment with textual prompts, making it particularly difficult to achieve realistic results. Recent advancements in T2I generation based on diffusion…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 28 pages, 18 figures

  43. arXiv:2405.19961  [pdf, other]

    cs.LG

    Collective Variable Free Transition Path Sampling with Generative Flow Network

    Authors: Kiyoung Seong, Seonghyun Park, Seonghwan Kim, Woo Youn Kim, Sungsoo Ahn

    Abstract: Understanding transition paths between meta-stable states in molecular systems is fundamental for material design and drug discovery. However, sampling these paths via unbiased molecular dynamics simulations is computationally prohibitive due to the high energy barriers between the meta-stable states. Recent machine learning approaches are often restricted to simple systems or rely on collective v…

    Submitted 18 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, 2 tables

  44. arXiv:2405.19691  [pdf, other]

    cs.HC

    Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing

    Authors: Minsun Kim, SeonGyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Hyungseung Lim, Jieun Han, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong, Tak Yeon Lee

    Abstract: While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises sur…

    Submitted 30 May, 2024; originally announced May 2024.

  45. arXiv:2405.19380  [pdf, other]

    stat.ML cs.LG eess.SY

    Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

    Authors: Yeoneung Kim, Gihun Kim, Insoon Yang

    Abstract: We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby acc…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 61 pages, 6 figures

  46. arXiv:2405.19183  [pdf, other]

    cs.RO

    Conditional Latent ODEs for Motion Prediction in Autonomous Driving

    Authors: Khang Truong Giang, Yongjae Kim, Andrea Finazzi

    Abstract: This paper addresses imitation learning for motion prediction problem in autonomous driving, especially in multi-agent setting. Different from previous methods based on GAN, we present the conditional latent ordinary differential equation (cLODE) to leverage both the generative strength of conditional VAE and the continuous representation of neural ODE. Our network architecture is inspired from th…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Term Project for AI816

  47. arXiv:2405.18148  [pdf, other]

    cs.CV cs.AI

    Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation

    Authors: JuneHyoung Kwon, Eunju Lee, Yunsung Cho, YoungBin Kim

    Abstract: Weakly supervised semantic segmentation (WSSS) employing weak forms of labels has been actively studied to alleviate the annotation cost of acquiring pixel-level labels. However, classifiers trained on biased datasets tend to exploit shortcut features and make predictions based on spurious correlations between certain backgrounds and objects, leading to a poor generalization performance. In this p…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to WACV 2024

  48. arXiv:2405.17880  [pdf, other]

    cs.LG

    Diffusion Rejection Sampling

    Authors: Byeonghu Na, Yeongmin Kim, Minsang Park, Donghyeok Shin, Wanmo Kang, Il-Chul Moon

    Abstract: Recent advances in powerful pre-trained diffusion models encourage the development of methods to improve the sampling performance under well-trained diffusion models. This paper introduces Diffusion Rejection Sampling (DiffRS), which uses a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep. The proposed method can be viewed as a mechanism tha…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  49. arXiv:2405.17111  [pdf, other]

    cs.LG

    Diffusion Bridge AutoEncoders for Unsupervised Representation Learning

    Authors: Yeongmin Kim, Kwanghyeon Lee, Minsang Park, Byeonghu Na, Il-Chul Moon

    Abstract: Diffusion-based representation learning has achieved substantial attention due to its promising capabilities in latent representation and sample generation. Recent studies have employed an auxiliary encoder to identify a corresponding representation from a sample and to adjust the dimensionality of a latent variable z. Meanwhile, this auxiliary structure invokes information split problem because t…

    Submitted 27 May, 2024; originally announced May 2024.

  50. arXiv:2405.16861  [pdf, other]

    q-bio.BM cs.LG physics.bio-ph

    NCIDiff: Non-covalent Interaction-generative Diffusion Model for Improving Reliability of 3D Molecule Generation Inside Protein Pocket

    Authors: Joongwon Lee, Wonho Zhung, Woo Youn Kim

    Abstract: Advancements in deep generative modeling have changed the paradigm of drug discovery. Among such approaches, target-aware methods that exploit 3D structures of protein pockets were spotlighted for generating ligand molecules with their plausible binding modes. While docking scores superficially assess the quality of generated ligands, closer inspection of the binding structures reveals the inconsi…

    Submitted 27 May, 2024; originally announced May 2024.