
Showing 1–50 of 206 results for author: Kang, D

  1. arXiv:2407.11562

    cs.RO cs.AI cs.LG

    RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards

    Authors: Fatemeh Zargarbashi, Jin Cheng, Dongho Kang, Robert Sumner, Stelian Coros

    Abstract: This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle th…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 15 pages

  2. arXiv:2407.11406

    cs.CL

    Revisiting the Impact of Pursuing Modularity for Code Generation

    Authors: Deokyeong Kang, Ki Jung Seo, Taeuk Kim

    Abstract: Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impa…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  3. arXiv:2407.10193

    cs.CV

    GRAPE: Generalizable and Robust Multi-view Facial Capture

    Authors: Jing Li, Di Kang, Zhenyu He

    Abstract: Deep learning-based multi-view facial capture methods have shown impressive accuracy while being several orders of magnitude faster than a traditional mesh registration pipeline. However, the existing systems (e.g. TEMPEH) are strictly restricted to inference on the data captured by the same camera array used to capture their training data. In this study, we aim to improve the generalization abili…

    Submitted 14 July, 2024; originally announced July 2024.

  4. arXiv:2407.05713

    cs.CV cs.AI

    Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge

    Authors: Hyunjin Cho, Dong Un Kang, Se Young Chun

    Abstract: Short-term object interaction anticipation is an important task in egocentric video analysis, including precise predictions of future interactions and their timings as well as the categories and positions of the involved active objects. To alleviate the complexity of this task, our proposed method, SOIA-DOD, effectively decomposes it into 1) detecting active object and 2) classifying interaction an…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 4 pages

  5. arXiv:2407.04280

    cs.CL cs.SD eess.AS

    LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech

    Authors: Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim

    Abstract: Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis revea…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for INTERSPEECH 2024

  6. arXiv:2407.03103

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  7. arXiv:2407.02918

    cs.CV eess.IV

    Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction

    Authors: Jiaxin Guo, Jiangliu Wang, Di Kang, Wenzhen Dong, Wenting Wang, Yun-hui Liu

    Abstract: Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding a promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3D…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  8. arXiv:2406.18675

    cs.HC cs.AI cs.CL

    Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants

    Authors: Minhwa Lee, Zae Myung Kim, Vivek Khetan, Dongyeop Kang

    Abstract: Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific wri…

    Submitted 15 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to CHI 2024 In2Writing Workshop

  9. arXiv:2406.11280

    cs.CV

    i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment

    Authors: Daechul Ahn, Yura Choi, San Kim, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Aligning Video Large Multimodal Models (VLMMs) faces challenges such as modality misalignment and verbose responses. Although iterative approaches such as self-rewarding or iterative direct preference optimization (DPO) recently showed a significant improvement in language model alignment, particularly on reasoning tasks, self-aligned models applied to large video-language models often result in le…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Technical report

  10. arXiv:2406.06874

    cs.AI cs.HC cs.RO

    Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback

    Authors: Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong

    Abstract: Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into successive stages, such as supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL), each performing one specific learning task. Such a s…

    Submitted 19 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2406.01637

    cs.MA cs.AI

    Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

    Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang

    Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). I…

    Submitted 2 June, 2024; originally announced June 2024.

  12. arXiv:2405.17764

    cs.CL cs.AI math.ST

    On the Sequence Evaluation based on Stochastic Processes

    Authors: Tianhao Zhang, Zhexiao Lin, Zhecheng Sheng, Chen Jiang, Dongyeop Kang

    Abstract: Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, machine translation and so on. This paper presents a novel approach to model sequences through a stochastic process. We introduce a likelihood-bas…

    Submitted 15 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  13. arXiv:2405.00130

    eess.IV cs.CV cs.LG

    A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention

    Authors: Amarjeet Kumar, Hongxu Jiang, Muhammad Imran, Cyndi Valdes, Gabriela Leon, Dahyun Kang, Parvathi Nataraj, Yuyin Zhou, Michael D. Weiss, Wei Shao

    Abstract: Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is f…

    Submitted 30 April, 2024; originally announced May 2024.

  14. arXiv:2404.19026

    cs.CV

    MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

    Authors: Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

    Abstract: Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gau…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project page: https://conallwang.github.io/MeGA_Pages/

  15. arXiv:2404.11557

    cs.RO

    Spatio-Temporal Motion Retargeting for Quadruped Robots

    Authors: Taerim Yoon, Dongho Kang, Seungmin Kim, Minsung Ahn, Stelian Coros, Sungjoon Choi

    Abstract: This work introduces a motion retargeting approach for legged robots, which aims to create motion controllers that imitate the fine behavior of animals. Our approach, namely spatio-temporal motion retargeting (STMR), guides imitation learning procedures by transferring motion from source to target, effectively bridging the morphological disparities by ensuring the feasibility of imitation on the t…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 34 pages, 7 figures, videos/code available at https://terry97-guel.github.io/STMR-RL.github.io/

  16. arXiv:2404.09451

    cs.CV

    Contrastive Mean-Shift Learning for Generalized Category Discovery

    Authors: Sua Choi, Dahyun Kang, Minsu Cho

    Abstract: We address the problem of generalized category discovery (GCD) that aims to partition a partially labeled collection of images; only a small part of the collection is labeled and the total number of target classes is unknown. To address this generalized image clustering problem, we revisit the mean-shift algorithm, i.e., a classic, powerful technique for mode seeking, and incorporate it into a con…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  17. arXiv:2404.09127

    cs.CL

    Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation

    Authors: Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang

    Abstract: Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF). Unlike humans, whose decisions and confidences not only stem from intrinsic beliefs but can also be adjusted through daily observations, existing calibration methods for LLMs focus on estim…

    Submitted 10 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted at ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  18. arXiv:2404.08144

    cs.CR cs.AI

    LLM Agents can Autonomously Exploit One-day Vulnerabilities

    Authors: Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

    Abstract: LLMs have become increasingly powerful, both in their benign and malicious uses. With the increase in capabilities, researchers have been increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies on the ability of LLM agents to autonomously hack websites. However, these studies are limited to simple vulnerabili…

    Submitted 17 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  19. arXiv:2404.04544

    cs.CV cs.AI

    BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

    Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

    Abstract: Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Project page: https://janeyeon.github.io/beyond-scene

  20. arXiv:2404.04500

    cs.CR cs.AI cs.CY cs.LG

    Trustless Audits without Revealing Data or Models

    Authors: Suppakit Waiwitlikhit, Ion Stoica, Yi Sun, Tatsunori Hashimoto, Daniel Kang

    Abstract: There is an increasing conflict between business incentives to hide models and data as trade secrets, and the societal need for algorithmic transparency. For example, a rightsholder wishing to know whether their copyrighted works have been used during training must convince the model provider to allow a third party to audit the model and data. Finding a mutually agreeable third party is difficult,…

    Submitted 6 April, 2024; originally announced April 2024.

  21. arXiv:2404.01954

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  22. arXiv:2403.17611

    cs.CL cs.AI

    Denoising Table-Text Retrieval for Open-Domain Question Answering

    Authors: Deokhyung Kang, Baikjin Jung, Yunsu Kim, Gary Geunbae Lee

    Abstract: In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in table-text open-domain question answering have two common challenges: firstly, their retrievers can be affected by false-positive labels in training datasets; secondly, they may struggle to provide appropriate evidence for questions that require…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  23. arXiv:2403.15803

    eess.IV cs.CV

    Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations

    Authors: Ruige Zong, Tao Wang, Chunwang Li, Xinlin Zhang, Yuanbin Chen, Longxuan Zhao, Qixuan Li, Qinquan Gao, Dezhi Kang, Fuxin Lin, Tong Tong

    Abstract: Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions ha…

    Submitted 23 March, 2024; originally announced March 2024.

  24. arXiv:2403.10555

    cs.LG cs.AI cs.CV physics.ao-ph

    KARINA: An Efficient Deep Learning Model for Global Weather Forecast

    Authors: Minjong Cheon, Yo-Hwan Choi, Seon-Yu Kang, Yumi Choi, Jeong-Gil Lee, Daehyun Kang

    Abstract: Deep learning-based, data-driven models are gaining prevalence in climate research, particularly for global weather prediction. However, training the global weather data at high resolution requires massive computational resources. Therefore, we present a new model named KARINA to overcome the substantial computational demands typical of this field. This model achieves forecasting accuracy comparab…

    Submitted 13 March, 2024; originally announced March 2024.

  25. arXiv:2403.09611

    cs.CV cs.CL cs.LG

    MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

    Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, et al. (7 additional authors not shown)

    Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la…

    Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  26. arXiv:2403.08627

    stat.ML cs.CE cs.LG

    Multifidelity linear regression for scientific machine learning from scarce data

    Authors: Elizabeth Qian, Dayoung Kang, Vignesh Sella, Anirban Chaudhuri

    Abstract: Machine learning (ML) methods, which fit to data the parameters of a given parameterized model class, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems for which traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data on which to train ML models is expensive, and the…

    Submitted 1 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  27. arXiv:2403.08187

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

    Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

    Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 12 pages, 2 figures

    ACM Class: I.2.7

  28. arXiv:2403.04893

    cs.AI

    A Safe Harbor for AI Evaluation and Red Teaming

    Authors: Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

    Abstract: Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies used by prominent AI companies to deter model misuse have disincentives on good faith safety evaluations. This causes some researchers to fear that conducting such research or releasing their findings will result in account suspensio…

    Submitted 7 March, 2024; originally announced March 2024.

  29. arXiv:2403.02691

    cs.CL cs.CR

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    Authors: Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang

    Abstract: Recent work has embodied LLMs as agents, allowing them to access tools, perform actions, and interact with external content (e.g., emails or websites). However, external content introduces the risk of indirect prompt injection (IPI) attacks, where malicious instructions are embedded within the content processed by LLMs, aiming to manipulate these agents into executing detrimental actions against u…

    Submitted 25 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 28 pages, 5 figures, 9 tables

  30. arXiv:2402.14146

    cs.CL

    Reinforcement Learning with Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation

    Authors: Karin de Langis, Ryan Koo, Dongyeop Kang

    Abstract: Style is an integral component of text that expresses a diverse set of information, including interpersonal dynamics (e.g. formality) and the author's emotions or attitudes (e.g. disgust). Humans often employ multiple styles simultaneously. An open question is how large language models can be explicitly controlled so that they weave together target styles when generating text: for example, to prod…

    Submitted 21 February, 2024; originally announced February 2024.

  31. arXiv:2402.13211

    cs.CL

    Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

    Authors: Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have sug…

    Submitted 5 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  32. arXiv:2402.12509

    cs.RO

    Talk Through It: End User Directed Manipulation Learning

    Authors: Carl Winge, Adam Imdieke, Bahaa Aldeeb, Dongyeop Kang, Karthik Desingh

    Abstract: Training generalist robot agents is an immensely difficult feat due to the requirement to perform a huge range of tasks in many different environments. We propose selectively training robots based on end-user preferences instead. Given a factory model that lets an end user instruct a robot to perform lower-level actions (e.g. 'Move left'), we show that end users can collect demonstrations using la…

    Submitted 19 February, 2024; originally announced February 2024.

  33. arXiv:2402.12255

    cs.CL

    Shallow Synthesis of Knowledge in GPT-Generated Texts: A Case Study in Automatic Related Work Composition

    Authors: Anna Martin-Boyle, Aahan Tyagi, Marti A. Hearst, Dongyeop Kang

    Abstract: Numerous AI-assisted scholarly applications have been developed to aid different stages of the research process. We present an analysis of AI-assisted scholarly writing generated with ScholaCite, a tool we built that is designed for organizing literature and composing Related Work sections for academic papers. Our evaluation method focuses on the analysis of citation graphs to assess the structura…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages, 5 figures, submitted to ACL 2024

  34. arXiv:2402.11532

    cs.CL

    Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models

    Authors: Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang

    Abstract: Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model's generalization to different tasks, even for unseen tasks. However, most existing instruction datasets include only single instructions, and they struggle to follow complex instructions composed of multiple subtasks. In this work, we propose a novel concept of compositional instruct…

    Submitted 24 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  35. arXiv:2402.11058

    cs.CV cs.CL

    II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering

    Authors: Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim

    Abstract: Visual Question Answering (VQA) often involves diverse reasoning scenarios across Vision and Language (V&L). Most prior VQA studies, however, have merely focused on assessing the model's overall accuracy without evaluating it on different reasoning cases. Furthermore, some recent works observe that conventional Chain-of-Thought (CoT) prompting fails to generate effective reasoning for VQA, especia…

    Submitted 2 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 Findings

  36. arXiv:2402.10586

    cs.CL cs.AI

    Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs

    Authors: Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang

    Abstract: With the advent of large language models (LLM), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans, particularly uncovering the underlying discourse structures of texts beyond their surface structures. Introducing a novel metho…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 26 pages, accepted at ACL 2024 (Main)

  37. arXiv:2402.08185

    cs.AI cs.CV physics.ao-ph

    Advancing Data-driven Weather Forecasting: Time-Sliding Data Augmentation of ERA5

    Authors: Minjong Cheon, Daehyun Kang, Yo-Hwan Choi, Seon-Yu Kang

    Abstract: Modern deep learning techniques, which mimic traditional numerical weather prediction (NWP) models and are derived from global atmospheric reanalysis data, have caused a significant revolution within a few years. In this new paradigm, our research introduces a novel strategy that deviates from the common dependence on high-resolution data, which is often constrained by computational resources, and…

    Submitted 12 February, 2024; originally announced February 2024.

  38. arXiv:2402.06664

    cs.CR cs.AI

    LLM Agents can Autonomously Hack Websites

    Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang

    Abstract: In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves. As a result, these LLMs can now function autonomously as agents. With the rise in capabilities of these agents, recent work has speculated on how LLM agents would affect cybersecurity. However, not much is known abou…

    Submitted 15 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  39. arXiv:2402.03746

    cs.CV

    Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

    Authors: Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). The previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating LLM with visual encoders, and adding additional learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume an…

    Submitted 17 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  40. arXiv:2401.17807

    cs.CV cs.GR

    Advances in 3D Generation: A Survey

    Authors: Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, Ying Shan

    Abstract: Generating 3D models lies at the core of computer graphics and has been the focus of decades of research. With the emergence of advanced neural representations and generative models, the field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models. The rapid growth of this field makes it difficult to stay abreast of all recent devel…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 33 pages, 12 figures

  41. arXiv:2401.16553

    cs.CL cs.AI

    SelectLLM: Can LLMs Select Important Instructions to Annotate?

    Authors: Ritik Sachin Parkar, Jaehyung Kim, Jong Inn Park, Dongyeop Kang

    Abstract: Instruction tuning benefits from large and diverse datasets; however, creating such datasets involves a high cost of human labeling. While synthetic datasets generated by large language models (LLMs) have partly solved this issue, they often contain low-quality data. One effective solution is selectively annotating unlabeled instructions, especially given the relative ease of acquiring unlabeled i…

    Submitted 17 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: First Authors: Ritik Sachin Parkar and Jaehyung Kim | Second Author: Jong Inn Park | PI: Dongyeop Kang

  42. arXiv:2401.14828  [pdf, other]

    cs.CV

    TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

    Authors: Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

    Abstract: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIPEditor, that accepts both text and image prompts and a 3D b…

    Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted by SIGGRAPH 2024 & ACM Transactions on Graphics

  43. arXiv:2401.14698  [pdf, other]

    cs.CL cs.AI

    Under the Surface: Tracking the Artifactuality of LLM-Generated Data

    Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee, Zae Myung Kim, Shirley Anugrah Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, Dongyeop Kang

    Abstract: This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant c…

    Submitted 30 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Core Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee and Zae Myung Kim | Project lead: Debarati Das | PI: Dongyeop Kang

  44. arXiv:2312.16893  [pdf, other]

    cs.CL cs.AI

    BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

    Authors: Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang

    Abstract: Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their ca…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)

  45. arXiv:2312.08961  [pdf, other]

    cs.RO

    Contact-Implicit MPC: Controlling Diverse Quadruped Motions Without Pre-Planned Contact Modes or Trajectories

    Authors: Gijeong Kim, Dongyun Kang, Joon-Ha Kim, Seungwoo Hong, Hae-Won Park

    Abstract: This paper presents a contact-implicit model predictive control (MPC) framework for the real-time discovery of multi-contact motions, without predefined contact mode sequences or foothold positions. This approach utilizes the contact-implicit differential dynamic programming (DDP) framework, merging the hard contact model with a linear complementarity constraint. We propose the analytical gradient…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 22 pages, 19 figures, submitted to International Journal of Robotics Research (IJRR)

  46. arXiv:2312.07399  [pdf, other]

    cs.CL cs.AI

    Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

    Authors: Taeyoon Kwon, Kai Tzu-iunn Ong, Dongjin Kang, Seungjun Moon, Jeong Ryong Lee, Dosik Hwang, Yongsik Sim, Beomseok Sohn, Dongha Lee, Jinyoung Yeo

    Abstract: Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects mainly focus on clinical classification or reading comprehension, and under-explore clinical reasoning for disease diagnosis due to the expensive rationale annotation with clinicians. In this work, we present a "reasoning-aware" diagnosis framew…

    Submitted 10 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  47. arXiv:2311.18654  [pdf, other]

    cs.CV cs.AI

    Detailed Human-Centric Text Description-Driven Large Scene Synthesis

    Authors: Gwanghyun Kim, Dong Un Kang, Hoigi Seo, Hayeon Kim, Se Young Chun

    Abstract: Text-driven large scene image synthesis has made significant progress with diffusion models, but controlling it is challenging. While using additional spatial controls with corresponding texts has improved the controllability of large scene synthesis, it is still challenging to faithfully reflect detailed text descriptions without user-provided controls. Here, we propose DetText2Scene, a novel tex…

    Submitted 30 November, 2023; originally announced November 2023.

  48. arXiv:2311.09862  [pdf, other]

    cs.CL cs.SI

    Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

    Authors: Debarati Das, Ishaan Gupta, Jaideep Srivastava, Dongyeop Kang

    Abstract: Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints. This paper introduces a new approach to encoding a graph with diverse modalities, such as text, image, and motif, coupled with prompts to approximate a graph's global connectiv…

    Submitted 13 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  49. arXiv:2311.09799  [pdf, other]

    cs.CL

    How Far Can We Extract Diverse Perspectives from Large Language Models?

    Authors: Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang

    Abstract: Collecting diverse human opinions is costly and challenging. This leads to a recent trend in collaborative efforts between humans and Large Language Models (LLMs) for generating diverse data, offering potential scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate…

    Submitted 18 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  50. arXiv:2311.07215  [pdf, other]

    cs.CL cs.SE

    Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback

    Authors: Seungjun Moon, Hyungjoo Chae, Yongho Song, Taeyoon Kwon, Dongjin Kang, Kai Tzu-iunn Ong, Seung-won Hwang, Jinyoung Yeo

    Abstract: Code editing is an essential step towards reliable program synthesis to automatically correct critical errors generated from code LLMs. Recent studies have demonstrated that closed-source LLMs (i.e., ChatGPT and GPT-4) are capable of generating corrective feedback to edit erroneous inputs. However, it remains challenging for open-source code LLMs to generate feedback for code editing, since these…

    Submitted 23 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Work in progress