Showing 1–11 of 11 results for author: Jang, Y K

Searching in archive cs.

  1. arXiv:2407.09541

    cs.CL cs.AI cs.CV

    MATE: Meet At The Embedding -- Connecting Images with Long Texts

    Authors: Young Kyun Jang, Junmo Kang, Yong Jae Lee, Donghyun Kim

    Abstract: While advancements in Vision Language Models (VLMs) have significantly improved the alignment of visual and textual data, these models primarily focus on aligning images with short descriptive captions. This focus limits their ability to handle complex text interactions, particularly with longer texts such as lengthy captions or documents, which have not been extensively explored yet. In this pape…

    Submitted 26 June, 2024; originally announced July 2024.

  2. arXiv:2405.14726

    cs.CV

    Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval

    Authors: Young Kyun Jang, Donghyun Kim, Ser-nam Lim

    Abstract: "Learning to hash" is a practical solution for efficient retrieval, offering fast search speed and low storage cost. It is widely applied in various applications, such as image-text cross-modal search. In this paper, we explore the potential of enhancing the performance of learning to hash with the proliferation of powerful large pre-trained models, such as Vision-Language Pre-training (VLP) mod…

    Submitted 23 May, 2024; originally announced May 2024.

  3. arXiv:2405.14715

    cs.CV cs.AI

    Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

    Authors: Young Kyun Jang, Ser-nam Lim

    Abstract: Modern retrieval systems often struggle with upgrading to new and more powerful models due to the incompatibility of embeddings between the old and new models. This necessitates a costly process known as backfilling, which involves re-computing the embeddings for a large number of data samples. In vision, Backward-compatible Training (BT) has been proposed to ensure that the new model aligns with…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2405.00571

    cs.CV cs.AI

    Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval

    Authors: Young Kyun Jang, Dat Huynh, Ashish Shah, Wen-Kai Chen, Ser-Nam Lim

    Abstract: Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies…

    Submitted 1 May, 2024; originally announced May 2024.

  5. arXiv:2404.15516

    cs.CV cs.AI

    Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

    Authors: Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim

    Abstract: Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the o…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 15 pages

  6. arXiv:2404.05726

    cs.CV

    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

    Authors: Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

    Abstract: With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective…

    Submitted 24 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024. Project Page https://boheumd.github.io/MA-LMM/

  7. arXiv:2312.03777

    cs.CV

    On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

    Authors: Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang, Ser-Nam Lim

    Abstract: Recent advances in instruction tuning have led to the development of State-of-the-Art Large Multimodal Models (LMMs). Given the novelty of these models, the impact of visual adversarial attacks on LMMs has not been thoroughly examined. We conduct a comprehensive study of the robustness of various LMMs against different adversarial attacks, evaluated across tasks including image classification, ima…

    Submitted 8 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  8. arXiv:2112.08816

    cs.CV cs.IR

    Deep Hash Distillation for Image Retrieval

    Authors: Young Kyun Jang, Geonmo Gu, Byungsoo Ko, Isaac Kang, Nam Ik Cho

    Abstract: In hash-based image retrieval systems, degraded or transformed inputs usually generate different codes from the original, deteriorating the retrieval accuracy. To mitigate this issue, data augmentation can be applied during training. However, even if augmented samples of an image are similar in real feature space, the quantization can scatter them far away in Hamming space. This results in represe…

    Submitted 13 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: ECCV 2022

  9. arXiv:2109.02244

    cs.CV

    Self-supervised Product Quantization for Deep Unsupervised Image Retrieval

    Authors: Young Kyun Jang, Nam Ik Cho

    Abstract: Supervised deep learning-based hash and vector quantization are enabling fast and large-scale image retrieval systems. By fully exploiting label annotations, they are achieving outstanding retrieval performances compared to the conventional methods. However, it is painstaking to assign labels precisely for a vast amount of training data, and also, the annotation process is error-prone. To tackle t…

    Submitted 12 January, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: ICCV 2021

  10. arXiv:2107.05025

    cs.CV cs.IR

    Similarity Guided Deep Face Image Retrieval

    Authors: Young Kyun Jang, Nam Ik Cho

    Abstract: Face image retrieval, which searches for images of the same identity from the query input face image, is drawing more attention as the size of the image database increases rapidly. In order to conduct fast and accurate retrieval, compact hash code-based methods have been proposed, and recently, deep face image hashing methods with supervised classification training have shown outstanding perform…

    Submitted 11 July, 2021; originally announced July 2021.

    Comments: 10 pages, 9 figures

  11. arXiv:2002.11281

    cs.CV

    Generalized Product Quantization Network for Semi-supervised Image Retrieval

    Authors: Young Kyun Jang, Nam Ik Cho

    Abstract: Image retrieval methods that employ hashing or vector quantization have achieved great success by taking advantage of deep learning. However, these approaches do not meet expectations unless expensive label information is sufficient. To resolve this issue, we propose the first quantization-based semi-supervised image retrieval scheme: Generalized Product Quantization (GPQ) network. We design a nov…

    Submitted 11 June, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: 10 pages, 10 figures, Computer Vision and Pattern Recognition (CVPR) 2020 accepted paper