Showing 1–50 of 156 results for author: Jang, Y

Searching in archive cs.
  1. arXiv:2407.09541  [pdf, other]

    cs.CL cs.AI cs.CV

    MATE: Meet At The Embedding -- Connecting Images with Long Texts

    Authors: Young Kyun Jang, Junmo Kang, Yong Jae Lee, Donghyun Kim

    Abstract: While advancements in Vision Language Models (VLMs) have significantly improved the alignment of visual and textual data, these models primarily focus on aligning images with short descriptive captions. This focus limits their ability to handle complex text interactions, particularly with longer texts such as lengthy captions or documents, which have not been extensively explored yet. In this pape… ▽ More

    Submitted 26 June, 2024; originally announced July 2024.

  2. arXiv:2407.01158  [pdf, other]

    cs.CL

    Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

    Authors: Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee

    Abstract: Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoints on a specific subject, they frequently generate redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlin… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress. Resources are available at https://github.com/youngerous/qtree

  3. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.10809  [pdf, other]

    cs.CL cs.AI

    Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations

    Authors: Yoonna Jang, Suhyune Son, Jeongwoo Lee, Junyoung Son, Yuna Hur, Jungwoo Lim, Hyeonseok Moon, Kisu Yang, Heuiseok Lim

    Abstract: Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, e… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at EMNLP 2023

  5. arXiv:2406.10296  [pdf, other]

    cs.CL cs.AI cs.CY

    CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer

    Authors: Heeseok Jung, Jaesang Yoo, Yohaan Yoon, Yeonju Jang

    Abstract: Knowledge tracing (KT), wherein students' problem-solving histories are used to estimate their current levels of knowledge, has attracted significant interest from researchers. However, most existing KT models were developed with an ID-based paradigm, which exhibits limitations in cold-start performance. These limitations can be mitigated by leveraging the vast quantities of external knowledge pos… ▽ More

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.08176  [pdf, other]

    cs.CV cs.RO

    Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment

    Authors: Taekbeom Lee, Youngseok Jang, H. Jin Kim

    Abstract: Neural implicit representation has attracted attention in 3D reconstruction through various success cases. For further applications such as scene understanding or editing, several works have shown progress towards object compositional reconstruction. Despite their superior performance in observed regions, their performance is still limited in reconstructing objects that are partially observed. To… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: RA-L. 8 pages, 8 figures, 4 tables

  7. arXiv:2405.16012  [pdf, other]

    cs.LG

    Pessimistic Backward Policy for GFlowNets

    Authors: Hyosoon Jang, Yunhui Jang, Minsu Kim, Jinkyoo Park, Sungsoo Ahn

    Abstract: This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through the trajectory of state transitions. In this work, we observe that GFlowNets tend to under-exploit the high-reward objects due to training on an insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward valu… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2405.14726  [pdf, other]

    cs.CV

    Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval

    Authors: Young Kyun Jang, Donghyun Kim, Ser-nam Lim

    Abstract: "Learning to hash" is a practical solution for efficient retrieval, offering fast search speed and low storage cost. It is widely applied in various applications, such as image-text cross-modal search. In this paper, we explore the potential of enhancing the performance of learning to hash with the proliferation of powerful large pre-trained models, such as Vision-Language Pre-training (VLP) mod… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2405.14715  [pdf, other]

    cs.CV cs.AI

    Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

    Authors: Young Kyun Jang, Ser-nam Lim

    Abstract: Modern retrieval systems often struggle with upgrading to new and more powerful models due to the incompatibility of embeddings between the old and new models. This necessitates a costly process known as backfilling, which involves re-computing the embeddings for a large number of data samples. In vision, Backward-compatible Training (BT) has been proposed to ensure that the new model aligns with… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  10. arXiv:2405.11614  [pdf, other]

    cs.CV eess.IV

    Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation

    Authors: Sangyeop Yeo, Yoojin Jang, Jaejun Yoo

    Abstract: In this paper, we address the challenge of compressing generative adversarial networks (GANs) for deployment in resource-constrained environments by proposing two novel methodologies: Distribution Matching for Efficient compression (DiME) and Network Interactive Compression via Knowledge Exchange and Learning (NICKEL). DiME employs foundation models as embedding kernels for efficient distribution… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  11. arXiv:2405.10272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  12. arXiv:2405.02066  [pdf, other]

    cs.CV eess.IV

    WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

    Authors: Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim

    Abstract: The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representat… ▽ More

    Submitted 11 July, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  13. arXiv:2405.00571  [pdf, other]

    cs.CV cs.AI

    Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval

    Authors: Young Kyun Jang, Dat Huynh, Ashish Shah, Wen-Kai Chen, Ser-Nam Lim

    Abstract: Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  14. arXiv:2404.15516  [pdf, other]

    cs.CV cs.AI

    Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

    Authors: Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim

    Abstract: Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the o… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 15 pages

  15. arXiv:2404.11916  [pdf, other]

    cs.CL cs.AI

    SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up

    Authors: Nakyeong Yang, Junseok Kim, Jiwon Moon, Yunah Jang, Kyomin Jung

    Abstract: Prompt-tuning methods have shown performance comparable to parameter-efficient fine-tuning (PEFT) methods in various natural language understanding tasks. However, existing prompt tuning methods still utilize the entire model architecture; thus, they fail to accelerate inference speed in the application. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 6 pages

  16. arXiv:2404.05726  [pdf, other]

    cs.CV

    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

    Authors: Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

    Abstract: With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective… ▽ More

    Submitted 24 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024. Project Page https://boheumd.github.io/MA-LMM/

  17. arXiv:2403.14238  [pdf, other]

    cs.CL cs.AI

    Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection

    Authors: Kyungjae Lee, Dasol Hwang, Sunghyun Park, Youngsoo Jang, Moontae Lee

    Abstract: Despite the promise of RLHF in aligning LLMs with human preferences, it often leads to superficial alignment, prioritizing stylistic changes over improving downstream performance of LLMs. Underspecified preferences could obscure directions to align the models. Lacking exploration restricts identification of desirable outputs to improve the models. To overcome these challenges, we propose a novel f… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 22 pages, 5 figures, Submitted to ACL 2024

  18. arXiv:2403.05814  [pdf, other]

    cs.CL cs.AI

    MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs

    Authors: Yerin Hwang, Yongil Kim, Yunah Jang, Jeesoo Bang, Hyunkyung Bae, Kyomin Jung

    Abstract: Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions.… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 20 pages

  19. arXiv:2402.11057  [pdf, other]

    cs.CV

    Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos

    Authors: Shijia Feng, Michael Wray, Brian Sullivan, Youngkyoon Jang, Casimir Ludwig, Iain Gilchrist, Walterio Mayol-Cuevas

    Abstract: Determining when people are struggling from video enables a finer-grained understanding of actions and opens opportunities for building intelligent support visual interfaces. In this paper, we present a new dataset with three assembly activities and corresponding performance baselines for the determination of struggle from video. Three real-world problem-solving activities including assembling plu… ▽ More

    Submitted 28 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  20. arXiv:2402.05965  [pdf, other]

    cs.LG eess.SP

    Hybrid Neural Representations for Spherical Data

    Authors: Hyomin Kim, Yunhui Jang, Jaeho Lee, Sungsoo Ahn

    Abstract: In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as cosmic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of h… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 8 figures

  21. arXiv:2401.17343  [pdf, other]

    cs.CV cs.AI

    YTCommentQA: Video Question Answerability in Instructional Videos

    Authors: Saelyne Yang, Sunghyun Park, Yunseok Jang, Moontae Lee

    Abstract: Instructional videos provide detailed how-to guides for various tasks, with viewers often posing questions regarding the content. Addressing these questions is vital for comprehending the content, yet receiving immediate answers is difficult. While numerous computational models have been developed for Video Question Answering (Video QA) tasks, they are primarily trained on questions generated base… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: AAAI 2024

  22. arXiv:2401.16808  [pdf, other]

    cs.LG cs.AI

    Encoding Temporal Statistical-space Priors via Augmented Representation

    Authors: Insu Choi, Woosung Koh, Gimin Kang, Yuntae Jang, Woo Chang Kim

    Abstract: Modeling time series data remains a pervasive issue as the temporal dimension is inherent to numerous domains. Despite significant strides in time series forecasting, high noise-to-signal ratio, non-normality, non-stationarity, and lack of data continue to challenge practitioners. In response, we leverage a simple representation augmentation technique to overcome these challenges. Our augmented rep… ▽ More

    Submitted 3 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: pre-print

  23. arXiv:2401.10032  [pdf, other]

    eess.AS cs.AI eess.SP

    FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

    Authors: Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated co… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  24. arXiv:2401.08637  [pdf, other]

    cs.DC cs.LG

    Synergy: Towards On-Body AI via Tiny AI Accelerator Collaboration on Wearables

    Authors: Taesik Gong, Si Young Jang, Utku Günay Acer, Fahim Kawsar, Chulhong Min

    Abstract: The advent of tiny artificial intelligence (AI) accelerators enables AI to run at the extreme edge, offering reduced latency, lower power cost, and improved privacy. When integrated into wearable devices, these accelerators open exciting opportunities, allowing various AI apps to run directly on the body. We present Synergy that provides AI apps with best-effort performance via system-driven holis… ▽ More

    Submitted 2 July, 2024; v1 submitted 11 December, 2023; originally announced January 2024.

  25. arXiv:2312.15985  [pdf, ps, other]

    cs.LG cs.IT

    Discrete Messages Improve Communication Efficiency among Isolated Intelligent Agents

    Authors: Hang Chen, Yuchuan Jang, Weijie Zhou, Cristian Meo, Ziwei Chen, Dianbo Liu

    Abstract: Individuals, despite having varied life experiences and learning processes, can communicate effectively through languages. This study aims to explore the efficiency of language as a communication medium. We put forth two specific hypotheses: First, discrete messages are more effective than continuous ones when agents have diverse personal experiences. Second, communications using multiple discrete… ▽ More

    Submitted 28 December, 2023; v1 submitted 26 December, 2023; originally announced December 2023.

  26. arXiv:2312.03777  [pdf, other]

    cs.CV

    On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

    Authors: Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang, Ser-Nam Lim

    Abstract: Recent advances in instruction tuning have led to the development of State-of-the-Art Large Multimodal Models (LMMs). Given the novelty of these models, the impact of visual adversarial attacks on LMMs has not been thoroughly examined. We conduct a comprehensive study of the robustness of various LMMs against different adversarial attacks, evaluated across tasks including image classification, ima… ▽ More

    Submitted 8 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  27. arXiv:2312.02230  [pdf, other]

    cs.LG cs.AI

    A Simple and Scalable Representation for Graph Generation

    Authors: Yunhui Jang, Seul Lee, Sungsoo Ahn

    Abstract: Recently, there has been a surge of interest in employing neural networks for graph generation, a fundamental statistical learning problem with critical applications like molecule design and community analysis. However, most approaches encounter significant limitations when generating large-scale graphs. This is due to their requirement to output the full adjacency matrices whose size grows quadra… ▽ More

    Submitted 26 March, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: International Conference on Learning Representations (ICLR) 2024

  28. arXiv:2311.13326  [pdf, other]

    cs.LG cs.AI q-fin.PM

    Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series

    Authors: Woosung Koh, Insu Choi, Yuntae Jang, Gimin Kang, Woo Chang Kim

    Abstract: Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum… ▽ More

    Submitted 12 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: AAAI 2024 AI4TS Workshop Oral

  29. arXiv:2311.09820  [pdf, other]

    cs.IR

    IterCQR: Iterative Conversational Query Reformulation with Retrieval Guidance

    Authors: Yunah Jang, Kang-il Lee, Hyunkyung Bae, Hwanhee Lee, Kyomin Jung

    Abstract: Conversational search aims to retrieve passages containing essential information to answer queries in a multi-turn conversation. In conversational search, reformulating context-dependent conversational queries into stand-alone forms is imperative to effectively utilize off-the-shelf retrievers. Previous methodologies for conversational query reformulation frequently depend on human-annotated rewri… ▽ More

    Submitted 8 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  30. arXiv:2311.08439  [pdf, other]

    eess.IV cs.CV cs.LG

    A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography

    Authors: Jaeik Jeon, Jiyeon Kim, Yeonggul Jang, Yeonyee E. Yoon, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

    Abstract: Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to proc… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  31. arXiv:2310.19581  [pdf, other]

    eess.AS cs.CV cs.SD

    Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

    Authors: Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The objective of this work is to extract a target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for… ▽ More

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Project page with demo: https://mm.kaist.ac.kr/projects/avdiffuss/

  32. arXiv:2310.16757  [pdf, ps, other]

    cs.AR

    All-rounder: A flexible DNN accelerator with diverse data format support

    Authors: Seock-Hwan Noh, Seungpyo Lee, Banseok Shin, Sehun Park, Yongjoo Jang, Jaeha Kung

    Abstract: Recognizing the explosive increase in the use of DNN-based applications, several industrial companies developed a custom ASIC (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and constructed a hyperscale cloud infrastructure with it. The ASIC performs operations of the inference or training process of DNN models which are requested by users. Since the DNN models have different data formats and typ… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

  33. arXiv:2310.08897  [pdf, other]

    eess.IV cs.CV cs.LG

    Self supervised convolutional kernel based handcrafted feature harmonization: Enhanced left ventricle hypertension disease phenotyping on echocardiography

    Authors: Jina Lee, Youngtaek Hong, Dawun Jeong, Yeonggul Jang, Jaeik Jeon, Sihyeon Jeong, Taekgeun Jung, Yeonyee E. Yoon, Inki Moon, Seung-Ah Lee, Hyuk-Jae Chang

    Abstract: Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization in those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricul… ▽ More

    Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 11 pages, 7 figures

  34. arXiv:2310.03952  [pdf, other]

    cs.CV

    ILSH: The Imperial Light-Stage Head Dataset for Human Head View Synthesis

    Authors: Jiali Zheng, Youngkyoon Jang, Athanasios Papaioannou, Christos Kampouris, Rolandos Alexandros Potamias, Foivos Paraperas Papantoniou, Efstathios Galanakis, Ales Leonardis, Stefanos Zafeiriou

    Abstract: This paper introduces the Imperial Light-Stage Head (ILSH) dataset, a novel light-stage-captured human head dataset designed to support view synthesis academic challenges for human heads. The ILSH dataset is intended to facilitate diverse approaches, such as scene-specific or generic neural rendering, multiple-view geometry, 3D vision, and computer graphics, to further advance the development of p… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: ICCV 2023 Workshop, 9 pages, 6 figures

  35. arXiv:2309.12306  [pdf, other]

    cs.CV cs.SD eess.AS

    TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

    Authors: Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, You Jin Kim, Youngjoon Jang, Joon Son Chung

    Abstract: The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full se… ▽ More

    Submitted 21 September, 2023; originally announced September 2023.

  36. arXiv:2309.12304  [pdf, other]

    cs.CV

    SlowFast Network for Continuous Sign Language Recognition

    Authors: Junseok Ahn, Youngjoon Jang, Joon Son Chung

    Abstract: The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two-pathway SlowFast network, where each pathway operates at distinct temporal resolutions to separately capture spatial (hand shapes, facial expressions) and dynamic (movements) information. In addition, we introduce two distinct… ▽ More

    Submitted 21 September, 2023; originally announced September 2023.

  37. arXiv:2309.10339  [pdf, other]

    cs.CL

    KoBigBird-large: Transformation of Transformer for Korean Language Understanding

    Authors: Kisu Yang, Yoonna Jang, Taewoo Lee, Jinwoo Seong, Hyungjin Lee, Hwanseok Jang, Heuiseok Lim

    Abstract: This work presents KoBigBird-large, a large size of Korean BigBird that achieves state-of-the-art performance and allows long sequence processing for Korean language understanding. Without further pretraining, we only transform the architecture and extend the positional encoding with our proposed Tapered Absolute Positional Encoding Representations (TAPER). In experiments, KoBigBird-large shows st… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted at IJCNLP-AACL 2023

  38. arXiv:2309.02740  [pdf, other]

    cs.CL cs.AI

    Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training

    Authors: Brian Cho, Youngbin Jang, Jaewoong Yoon

    Abstract: Neural based approaches to automatic evaluation of subjective responses have shown superior performance and efficiency compared to traditional rule-based and feature engineering oriented solutions. However, it remains unclear whether the suggested neural solutions are sufficient replacements of human raters as we find recent works do not properly account for rubric items that are essential for aut… ▽ More

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 13 pages

    ACM Class: I.2.7

  39. arXiv:2308.16483  [pdf, other]

    eess.SP cs.HC cs.LG

    Improving Out-of-Distribution Detection in Echocardiographic View Classification through Enhancing Semantic Features

    Authors: Jaeik Jeon, Seongmin Ha, Yeonggul Jang, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

    Abstract: In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD) are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obviou… ▽ More

    Submitted 23 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

  40. arXiv:2306.08013  [pdf, other]

    cs.LG cs.AI cs.CV

    TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models

    Authors: Pum Jun Kim, Yoojin Jang, Jisu Kim, Jaejun Yoo

    Abstract: We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Frechet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation… ▽ More

    Submitted 24 January, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  41. arXiv:2306.02728  [pdf, other]

    cs.CV

    Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval

    Authors: Minjoon Jung, Youwon Jang, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang

    Abstract: Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query. This task is prone to suffering from the weak visual-textual alignment problem innate in video datasets. Due to the ambiguity, a query does not fully cover the relevant details of the corresponding moment, or the moment may contain misaligned and irrelevant frames, potentially limiting furth… ▽ More

    Submitted 19 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Our code is available at https://github.com/minjoong507/BM-DETR

  42. arXiv:2305.19125  [pdf, other]

    cs.LG cs.AI cs.SI

    Graph Generation with $K^2$-trees

    Authors: Yunhui Jang, Dongwoo Kim, Sungsoo Ahn

    Abstract: Generating graphs from a target distribution is a significant challenge across many domains, including drug discovery and social network analysis. In this work, we introduce a novel graph generation method leveraging $K^2$-tree representation, originally designed for lossless graph compression. The $K^2$-tree representation encompasses inherent hierarchy while enabling compact graph generation.… ▽ More

    Submitted 26 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: International Conference on Learning Representations (ICLR) 2024

  43. arXiv:2305.14541  [pdf, other]

    cs.IT

    Adversarial Channels with O(1)-Bit Partial Feedback

    Authors: Eric Ruzomberka, Yongkyu Jang, David J. Love, H. Vincent Poor

    Abstract: We consider point-to-point communication over $q$-ary adversarial channels with partial noiseless feedback. In this setting, a sender Alice transmits $n$ symbols from a $q$-ary alphabet over a noisy forward channel to a receiver Bob, while Bob sends feedback to Alice over a noiseless reverse channel. In the forward channel, an adversary can inject both symbol errors and erasures up to an error fra…

    Submitted 23 May, 2023; originally announced May 2023.

  44. arXiv:2305.13902  [pdf, other]

    cs.RO

    Design and Operation of Autonomous Wheelchair Towing Robot

    Authors: Hyunwoo Kang, Jaeho Shin, Jaewook Shin, Youngseok Jang, Seung Jae Lee

    Abstract: In this study, a new concept of a wheelchair-towing robot for the facile electrification of manual wheelchairs is introduced. The development of this concept includes the design of towing robot hardware and an autonomous driving algorithm to ensure the safe transportation of patients to their intended destinations inside the hospital. We developed a novel docking mechanism to facilitate easy docki…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Submitted to Intelligent Service Robotics

  45. arXiv:2305.10975  [pdf, other]

    eess.IV cs.AI cs.CV

    Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation

    Authors: Syed Samiul Alam, Samiul Based Shuvo, Shams Nafisa Ali, Fardeen Ahmed, Arbil Chakma, Yeong Min Jang

    Abstract: Ocular Toxoplasmosis (OT) is a common eye infection caused by T. gondii that can cause vision problems. Diagnosis is typically done through a clinical examination and imaging, but these methods can be complicated and costly, requiring trained personnel. To address this issue, we have created a benchmark study that evaluates the effectiveness of existing pre-trained networks using transfer learnin…

    Submitted 18 May, 2023; originally announced May 2023.

  46. arXiv:2304.09507  [pdf, other]

    eess.IV cs.CV

    Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network

    Authors: Yeong Il Jang, Keuntek Lee, Gu Yong Park, Seyun Kim, Nam Ik Cho

    Abstract: There have been many image denoisers using deep neural networks, which outperform conventional model-based methods by large margins. Recently, self-supervised methods have attracted attention because constructing a large real noise dataset for supervised training is an enormous burden. The most representative self-supervised denoisers are based on blind-spot networks, which exclude the receptive f…

    Submitted 28 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to ICCV 2023

  47. arXiv:2304.04027  [pdf, other]

    eess.IV cs.CV cs.LG

    NeBLa: Neural Beer-Lambert for 3D Reconstruction of Oral Structures from Panoramic Radiographs

    Authors: Sihwa Park, Seongjun Kim, Doeyoung Kwon, Yohan Jang, In-Seok Song, Seung Jun Baek

    Abstract: Panoramic radiography (Panoramic X-ray, PX) is a widely used imaging modality for dental examination. However, PX only provides a flattened 2D image, lacking in a 3D view of the oral structure. In this paper, we propose NeBLa (Neural Beer-Lambert) to estimate 3D oral structures from real-world PX. NeBLa tackles full 3D reconstruction for varying subjects (patients) where each reconstruction is bas…

    Submitted 6 February, 2024; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: 18 pages, 16 figures, Accepted to AAAI 2024

  48. arXiv:2304.03275  [pdf, other]

    cs.CV

    That's What I Said: Fully-Controllable Talking Face Generation

    Authors: Youngjoon Jang, Kyeongha Rho, Jong-Bin Woo, Hyeongkeun Lee, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this paper is to synthesise talking faces with controllable facial motions. To achieve this goal, we propose two key ideas. The first is to establish a canonical space where every face has the same motion patterns but different identities. The second is to navigate a multimodal motion space that only represents motion-related features while eliminating identity information. To disentan…

    Submitted 18 September, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

  49. arXiv:2303.13733  [pdf, other]

    cs.SE

    SmartMark: Software Watermarking Scheme for Smart Contracts

    Authors: Taeyoung Kim, Yunhee Jang, Chanjong Lee, Hyungjoon Koo, Hyoungshick Kim

    Abstract: Smart contracts are self-executing programs on a blockchain to ensure immutable and transparent agreements without the involvement of intermediaries. Despite the growing popularity of smart contracts for many blockchain platforms like Ethereum, smart contract developers cannot prevent competitors from copying their smart contracts due to the absence of available technical means. However, applying…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: This paper is accepted for publication in ICSE 2023

  50. arXiv:2303.11771  [pdf, other]

    cs.CV

    Self-Sufficient Framework for Continuous Sign Language Recognition

    Authors: Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Myungchul Kim, Dong-Jin Kim, In So Kweon, Joon Son Chung

    Abstract: The goal of this work is to develop a self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. These include the need for complex multi-scale features such as hands, face, and mouth for understanding, and the absence of frame-level annotations. To this end, we propose (1) Divide and Focus Convolution (DFConv) which extracts both ma…

    Submitted 21 March, 2023; originally announced March 2023.