
Showing 1–6 of 6 results for author: Yoon, S H

Searching in archive cs.
  1. arXiv:2310.18341  [pdf]

    cs.CL cs.AI

    CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images

    Authors: Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon

    Abstract: Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists. Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain…

    Submitted 14 January, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

  2. arXiv:2212.12170  [pdf, other]

    cs.HC cs.GR

    Sense of Embodiment Inducement for People with Reduced Lower-body Mobility and Sensations with Partial-Visuomotor Stimulation

    Authors: Hyuckjin Jang, Taehei Kim, Seo Young Oh, Jeongmi Lee, Sunghee Lee, Sang Ho Yoon

    Abstract: To induce the Sense of Embodiment (SoE) on the virtual 3D avatar during a Virtual Reality (VR) walking scenario, VR interfaces have employed the visuotactile or visuomotor approaches. However, people with reduced lower-body mobility and sensation (PRLMS) who are incapable of feeling or moving their legs would find this task extremely challenging. Here, we propose an upper-body motion tracking-base…

    Submitted 23 December, 2022; originally announced December 2022.

    Journal ref: ACM SIGGRAPH 2022 Emerging Technologies

  3. arXiv:2211.11381  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    LISA: Localized Image Stylization with Audio via Implicit Neural Representation

    Authors: Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

    Abstract: We present a novel framework, Localized Image Stylization with Audio (LISA), which performs audio-driven localized image stylization. Sound often provides information about the specific context of the scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a pa…

    Submitted 21 November, 2022; originally announced November 2022.

  4. arXiv:2208.14114  [pdf, other]

    cs.CV

    Robust Sound-Guided Image Manipulation

    Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

    Abstract: Recent successes suggest that an image can be manipulated by a text prompt, e.g., a landscape scene on a sunny day is manipulated into the same scene on a rainy day driven by a text input "raining". These approaches often utilize a StyleCLIP-based image generator, which leverages multi-modal (text and image) embedding space. However, we observe that such text inputs are often bottlenecked in provi…

    Submitted 24 April, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: text overlap with arXiv:2112.00007

  5. arXiv:2204.09273  [pdf, other]

    cs.CV cs.AI

    Sound-Guided Semantic Video Generation

    Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

    Abstract: The recent success of StyleGAN demonstrates that a pre-trained StyleGAN latent space is useful for realistic video generation. However, the generated motion in the video is usually not semantically meaningful due to the difficulty of determining the direction and magnitude in the StyleGAN latent space. In this paper, we propose a framework to generate realistic videos by leveraging multimodal (sound…

    Submitted 21 October, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  6. arXiv:2112.00007  [pdf, other]

    cs.GR cs.CV cs.LG cs.SD eess.AS

    Sound-Guided Semantic Image Manipulation

    Authors: Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

    Abstract: The recent success of generative models shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with sources other than text, such as sound, is not easy due to the dynamic characteristics of the sources. In particular, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a fra…

    Submitted 30 November, 2021; originally announced December 2021.