
Showing 1–10 of 10 results for author: Kim, C W

Searching in archive cs.
  1. arXiv:2406.12246  [pdf, other]

    cs.LG cs.CL cs.CV

    TroL: Traversal of Layers for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro

    Abstract: Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with scaling them up directly, these models enable LLVMs to showcase powerful vision language (VL) performances by covering diverse tasks via natural language instructions. However, existing open-source LLVMs that perform comparabl…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/ByungKwanLee/TroL

  2. arXiv:2406.07867  [pdf, other]

    cs.CV cs.AI cs.HC

    Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

    Authors: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro

    Abstract: In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without relying on intermediate text. To this end, we newly introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corp…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  3. arXiv:2405.15574  [pdf, other]

    cs.CV

    Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yong Man Ro

    Abstract: The rapid development of large language and vision models (LLVMs) has been driven by advances in visual instruction tuning. Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models in order to narrow the performance gap with powerful closed-source LLVMs. These advancements are attributed to m…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/ByungKwanLee/Meteor

  4. arXiv:2403.07508  [pdf, other]

    cs.CV

    MoAI: Mixture of All Intelligence for Large Language and Vision Models

    Authors: Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

    Abstract: The rise of large language models (LLMs) and instruction tuning has led to the current trend of instruction-tuned large language and vision models (LLVMs). This trend involves either meticulously curating numerous instruction tuning datasets tailored to specific objectives or enlarging LLVMs to manage vast amounts of vision language (VL) data. However, current LLVMs have disregarded the detailed a…

    Submitted 17 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: ECCV 2024. Code available: https://github.com/ByungKwanLee/MoAI

  5. arXiv:2403.04212  [pdf, other]

    cs.CL

    Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation

    Authors: Seunghee Han, Se Jin Park, Chae Won Kim, Yong Man Ro

    Abstract: Providing emotional support through dialogue systems is becoming increasingly important in today's world, as it can support both mental health and social interactions in many conversation scenarios. Previous works have shown that using persona is effective for generating empathetic and supportive responses. They have often relied on pre-provided persona rather than inferring them during conversati…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024

  6. arXiv:2402.11248  [pdf, other]

    cs.CV

    CoLLaVO: Crayon Large Language and Vision mOdel

    Authors: Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro

    Abstract: The remarkable success of Large Language Models (LLMs) and instruction tuning drives the evolution of Vision Language Models (VLMs) towards a versatile general-purpose model. Yet, it remains unexplored whether current VLMs genuinely possess quality object-level image understanding capabilities determined from 'what objects are in the image?' or 'which object corresponds to a specified bounding box…

    Submitted 2 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings. Code available: https://github.com/ByungKwanLee/CoLLaVO

  7. arXiv:2307.12409  [pdf, other]

    cs.LG math.OC

    A Machine Learning Approach to Two-Stage Adaptive Robust Optimization

    Authors: Dimitris Bertsimas, Cheol Woo Kim

    Abstract: We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multi…

    Submitted 7 December, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

  8. arXiv:2307.12405  [pdf, other]

    cs.LG

    Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach

    Authors: Dimitris Bertsimas, Cheol Woo Kim

    Abstract: We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an op…

    Submitted 23 July, 2023; originally announced July 2023.

  9. arXiv:2303.08670  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

    Authors: Minsu Kim, Chae Won Kim, Yong Man Ro

    Abstract: Forced alignment refers to a technology that time-aligns a given transcription with a corresponding speech. However, as the forced alignment technologies have developed using speech audio, they might fail in alignment when the input speech audio is noise-corrupted or is not accessible. We focus on that there is another component that the speech can be inferred from, the speech video (i.e., talking…

    Submitted 26 February, 2023; originally announced March 2023.

    Comments: Accepted at AAAI 2023

  10. arXiv:1802.05412  [pdf, other]

    cs.CR cs.CL

    NtMalDetect: A Machine Learning Approach to Malware Detection Using Native API System Calls

    Authors: Chan Woo Kim

    Abstract: As computing systems become increasingly advanced and as users increasingly engage themselves in technology, security has never been a greater concern. In malware detection, static analysis, the method of analyzing potentially malicious files, has been the prominent approach. This approach, however, quickly falls short as malicious programs become more advanced and adopt the capabilities of obfusc…

    Submitted 19 May, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: 8 pages, Intel International Science and Engineering Fair Project - SOFT006T