Showing 1–29 of 29 results for author: Yoon, J

Searching in archive eess.
  1. arXiv:2405.01591  [pdf, other]

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Sujin Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  2. arXiv:2401.15938  [pdf, other]

    cs.CV eess.SY

    Motion-induced error reduction for high-speed dynamic digital fringe projection system

    Authors: Sanghoon Jeon, Hyo-Geon Lee, Jae-Sung Lee, Bo-Min Kang, Byung-Wook Jeon, Jun Young Yoon, Jae-Sang Hyun

    Abstract: In phase-shifting profilometry (PSP), any motion during the acquisition of fringe patterns can introduce errors because it assumes both the object and measurement system are stationary. Therefore, we propose a method to pixel-wise reduce the errors when the measurement system is in motion due to a motorized linear stage. The proposed method introduces motion-induced error reduction algorithm, whic…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 9 pages, 7 figures

  3. arXiv:2401.13921  [pdf, other]

    eess.AS cs.SD

    Intelli-Z: Toward Intelligible Zero-Shot TTS

    Authors: Sunghee Jung, Won Jang, Jaesam Yoon, Bongwan Kim

    Abstract: Although numerous recent studies have suggested new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional efforts to ensure clear pronunciation and speech quality due to its inherent requirement of replacing a core parameter (speaker embedding or acoustic prompt) with a new…

    Submitted 24 January, 2024; originally announced January 2024.

  4. arXiv:2310.08598  [pdf, other]

    eess.IV cs.AI cs.CV

    Domain Generalization for Medical Image Analysis: A Survey

    Authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk

    Abstract: Medical image analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, deploying DL models for MedIA in real-world situations remains challenging due to their failure to generalize across the distributional gap bet…

    Submitted 15 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  5. arXiv:2308.07947  [pdf]

    physics.med-ph eess.IV physics.optics

    Targeted Multispectral Filter Array Design for Endoscopic Cancer Detection in the Gastrointestinal Tract

    Authors: Michaela Taylor-Williams, Ran Tao, Travis W Sawyer, Dale J Waterhouse, Jonghee Yoon, Sarah E Bohndiek

    Abstract: Colour differences between healthy and diseased tissue in the gastrointestinal tract are detected visually by clinicians during white light endoscopy (WLE); however, the earliest signs of disease are often just a slightly different shade of pink compared to healthy tissue. Here, we propose to target alternative colours for imaging to improve contrast using custom multispectral filter arrays (MSFAs…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 29 pages

  6. arXiv:2306.10058  [pdf, other]

    cs.LG cs.CL eess.AS

    EM-Network: Oracle Guided Self-distillation for Sequence Learning

    Authors: Ji Won Yoon, Sunghwan Ahn, Hyeonseung Lee, Minchan Kim, Seok Min Kim, Nam Soo Kim

    Abstract: We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance, which is derived from the target sequence. Since the oracle guidance compactly represents the target-side context that can assist the sequence model in solving the t…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  7. arXiv:2306.08463  [pdf, other]

    eess.AS

    MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization

    Authors: Ji Won Yoon, Seok Min Kim, Nam Soo Kim

    Abstract: Self-supervised learning (SSL) has shown significant progress in speech processing tasks. However, despite the intrinsic randomness in the Transformer structure, such as dropout variants and layer-drop, improving the model-level consistency remains under-explored in the speech SSL literature. To address this, we propose a new pre-training method that uses consistency regularization to improve Data…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: INTERSPEECH 2023

  8. arXiv:2211.15075  [pdf, other]

    eess.AS cs.SD

    Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition

    Authors: Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

    Abstract: Recently, the advance in deep learning has brought a considerable improvement in the end-to-end speech recognition field, simplifying the traditional pipeline while producing promising results. Among the end-to-end models, the connectionist temporal classification (CTC)-based model has attracted research interest due to its non-autoregressive nature. However, such CTC models require a heavy comput…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted by 2022 SLT Workshop

  9. arXiv:2210.05524  [pdf, other]

    cs.RO eess.SY

    A Learning-Based Estimation and Control Framework for Contact-Intensive Tight-Tolerance Tasks

    Authors: Bukun Son, Hyelim Choi, Jaemin Yoon, Dongjun Lee

    Abstract: We present a two-stage framework that integrates a learning-based estimator and a controller, designed to address contact-intensive tasks. The estimator leverages a Bayesian particle filter with a mixture density network (MDN) structure, effectively handling multi-modal issues arising from contact information. The controller combines a self-supervised and reinforcement learning (RL) approach, stra…

    Submitted 1 August, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  10. arXiv:2208.12332  [pdf, other]

    cs.CV cs.LG eess.IV

    2nd Place Solutions for UG2+ Challenge 2022 -- D$^{3}$Net for Mitigating Atmospheric Turbulence from Images

    Authors: Sunder Ali Khowaja, Ik Hyun Lee, Jiseok Yoon

    Abstract: This technical report briefly introduces to the D$^{3}$Net proposed by our team "TUK-IKLAB" for Atmospheric Turbulence Mitigation in $UG2^{+}$ Challenge at CVPR 2022. In the light of test and validation results on textual images to improve text recognition performance and hot-air balloon images for image enhancement, we can say that the proposed method achieves state-of-the-art performance. Furthe…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: 4 pages, 4 figures

  11. arXiv:2207.13223  [pdf, other]

    cs.LG eess.IV

    XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning

    Authors: Ahmad Wisnu Mulyadi, Wonsik Jung, Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

    Abstract: Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-le…

    Submitted 26 July, 2022; originally announced July 2022.

  12. arXiv:2206.09074  [pdf, other]

    cs.LG eess.SP

    Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact

    Authors: Arnab Dey, Mononito Goswami, Joo Heung Yoon, Gilles Clermont, Michael Pinsky, Marilyn Hravnak, Artur Dubrawski

    Abstract: A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact.…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted at American Medical Informatics Association (AMIA) Annual Symposium 2022. 10 pages, 4 figures and 2 tables

  13. arXiv:2204.06328  [pdf, other]

    cs.CL cs.SD eess.AS

    HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

    Authors: Ji Won Yoon, Beom Jun Woo, Nam Soo Kim

    Abstract: Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR). However, these models usually require an expensive computational cost to achieve outstanding performance, slowing down the inference speed. To improve the model efficiency, we introduce an early exit scheme for ASR, namely HuBERT-E…

    Submitted 19 June, 2024; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted by INTERSPEECH 2024

  14. arXiv:2111.03664  [pdf, other]

    cs.LG eess.AS eess.IV

    Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

    Authors: Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

    Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ the teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teach…

    Submitted 11 August, 2023; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  15. arXiv:2106.07889  [pdf, other]

    eess.AS cs.SD

    UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

    Authors: Won Jang, Dan Lim, Jaesam Yoon, Bongwan Kim, Juntae Kim

    Abstract: Most neural vocoders employ band-limited mel-spectrograms to generate waveforms. If full-band spectral features are used as the input, the vocoder can be provided with as much acoustic information as possible. However, in some models employing full-band mel-spectrograms, an over-smoothing problem occurs as part of which non-sharp spectrograms are generated. To address this problem, we propose Univ…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  16. arXiv:2105.00240  [pdf, other]

    eess.IV cs.CV cs.LG

    Simultaneous super-resolution and motion artifact removal in diffusion-weighted MRI using unsupervised deep learning

    Authors: Hyungjin Chung, Jaehyun Kim, Jeong Hee Yoon, Jeong Min Lee, Jong Chul Ye

    Abstract: Diffusion-weighted MRI is nowadays performed routinely due to its prognostic ability, yet the quality of the scans are often unsatisfactory which can subsequently hamper the clinical utility. To overcome the limitations, here we propose a fully unsupervised quality enhancement scheme, which boosts the resolution and removes the motion artifact simultaneously. This process is done by first training…

    Submitted 1 May, 2021; originally announced May 2021.

  17. arXiv:2011.09631  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

    Authors: Won Jang, Dan Lim, Jaesam Yoon

    Abstract: We propose Universal MelGAN, a vocoder that synthesizes high-fidelity speech in multiple domains. To preserve sound quality when the MelGAN-based structure is trained with a dataset of hundreds of speakers, we added multi-resolution spectrogram discriminators to sharpen the spectral resolution of the generated waveforms. This enables the model to generate realistic waveforms of multi-speakers, by…

    Submitted 3 March, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

  18. TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition

    Authors: Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim

    Abstract: In recent years, there has been a great deal of research in developing end-to-end speech recognition models, which enable simplifying the traditional pipeline and achieving promising results. Despite their remarkable performance improvements, end-to-end models typically require expensive computational cost to show successful performance. To reduce this computational burden, knowledge distillation…

    Submitted 16 September, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  19. arXiv:2005.08213  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

    Authors: Won Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim

    Abstract: Speech is one of the most effective means of communication and is full of information that helps the transmission of utterer's thoughts. However, mainly due to the cumbersome processing of acoustic features, phoneme or word posterior probability has frequently been discarded in understanding the natural language. Thus, some recent spoken language understanding (SLU) modules have utilized end-to-en…

    Submitted 8 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020 Camera-ready

  20. arXiv:2005.07799  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

    Authors: Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bongwan Kim, Jaesam Yoon

    Abstract: We propose Jointly trained Duration Informed Transformer (JDI-T), a feed-forward Transformer with a duration predictor jointly trained without explicit alignments in order to generate an acoustic feature sequence from an input text. In this work, inspired by the recent success of the duration informed networks such as FastSpeech and DurIAN, we further simplify its sequential, two-stage training pi…

    Submitted 4 October, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: Accepted for publication in Interspeech 2020

  21. arXiv:1911.04824  [pdf, other]

    cs.IR cs.SD eess.AS

    How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

    Authors: Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jay Ho Jeon, Jason Yoon

    Abstract: Automatic tagging of music is an important research topic in Music Information Retrieval and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram rep…

    Submitted 28 June, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: The 28th European Signal Processing Conference (EUSIPCO)

  22. arXiv:1909.13692  [pdf]

    eess.IV cs.LG eess.SP stat.ML

    Nonlinear Dipole Inversion (NDI) enables Quantitative Susceptibility Mapping (QSM) without parameter tuning

    Authors: Daniel Polak, Itthi Chatnuntawech, Jaeyeon Yoon, Siddharth Srinivasan Iyer, Jongho Lee, Peter Bachert, Elfar Adalsteinsson, Kawin Setsompop, Berkin Bilgic

    Abstract: We propose Nonlinear Dipole Inversion (NDI) for high-quality Quantitative Susceptibility Mapping (QSM) without regularization tuning, while matching the image quality of state-of-the-art reconstruction techniques. In addition to avoiding over-smoothing that these techniques often suffer from, we also obviate the need for parameter selection. NDI is flexible enough to allow for reconstruction from…

    Submitted 30 September, 2019; originally announced September 2019.

  23. arXiv:1909.09263  [pdf, other]

    cs.CV cs.LG eess.IV

    Propagated Perturbation of Adversarial Attack for well-known CNNs: Empirical Study and its Explanation

    Authors: Jihyeun Yoon, Kyungyul Kim, Jongseong Jang

    Abstract: Deep Neural Network based classifiers are known to be vulnerable to perturbations of inputs constructed by an adversarial attack to force misclassification. Most studies have focused on how to make vulnerable noise by gradient based attack methods or to defense model from adversarial attack. The use of the denoiser model is one of a well-known solution to reduce the adversarial noise although clas…

    Submitted 23 September, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

    Journal ref: ICCV 2019 Workshop on Interpreting and Explaining Visual Artificial Intelligence Models

  24. arXiv:1909.07716  [pdf]

    eess.IV

    Exploring linearity of deep neural network trained QSM: QSMnet+

    Authors: Woojin Jung, Jaeyeon Yoon, Joon Yul Choi, Jae Myung Kim, Yoonho Nam, Eung Yeop Kim, Jongho Lee

    Abstract: Recently, deep neural network-powered quantitative susceptibility mapping (QSM), QSMnet, successfully performed ill conditioned dipole inversion in QSM and generated high-quality susceptibility maps. In this paper, the network, which was trained by healthy volunteer data, is evaluated for hemorrhagic lesions that have substantially higher susceptibility than healthy tissues in order to test linear…

    Submitted 14 October, 2019; v1 submitted 17 September, 2019; originally announced September 2019.

    Comments: 22 pages

  25. arXiv:1906.05797  [pdf, other]

    cs.CV cs.GR eess.IV

    The Replica Dataset: A Digital Replica of Indoor Spaces

    Authors: Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, et al. (5 additional authors not shown)

    Abstract: We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale. Each scene consists of a dense mesh, high-resolution high-dynamic-range (HDR) textures, per-primitive semantic class and instance information, and planar mirror and glass reflectors. The goal of Replica is to enable machine learning (ML) research that relies on visually, geometr…

    Submitted 13 June, 2019; originally announced June 2019.

  26. arXiv:1904.02644  [pdf, other]

    physics.optics eess.IV

    Characterising optical fibre transmission matrices using metasurface reflector stacks for lensless imaging without distal access

    Authors: George S. D. Gordon, Milana Gataric, Alberto Gil C. P. Ramos, Ralf Mouthaan, Calum Williams, Jonghee Yoon, Timothy D. Wilkinson, Sarah E. Bohndiek

    Abstract: The ability to form images through hair-thin optical fibres promises to open up new applications from biomedical imaging to industrial inspection. Unfortunately, deployment has been limited because small changes in mechanical deformation (e.g. bending) and temperature can completely scramble optical information, distorting images. Since such changes are dynamic, correcting them requires measuremen…

    Submitted 5 April, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

    Comments: Main text: 38 pages, 9 Figures, Appendices: 26 pages, 6 Figures. Corrected author affiliation

    Journal ref: Phys. Rev. X 9, 041050 (2019)

  27. arXiv:1810.04325  [pdf, other]

    eess.SP

    Analysis of Maximal Topologies Achieving Optimal DoF and DoF $\frac{1}{n}$ in Topological Interference Management

    Authors: Jong-Yoon Yoon, Jong-Seon No

    Abstract: Topological interference management (TIM) can obtain degrees of freedom (DoF) gains with no channel state information at the transmitters (CSIT) except topological information of network in the interference channel. It was shown that TIM achieves the optimal symmetric DoF when internal conflict does not exist among messages. However, it is difficult to assure whether a specific topology can achiev…

    Submitted 9 October, 2018; originally announced October 2018.

  28. arXiv:1808.02401  [pdf, other]

    cs.IT eess.SP

    Building Encoder and Decoder with Deep Neural Networks: On the Way to Reality

    Authors: Minhoe Kim, Woonsup Lee, Jungmin Yoon, Ohyun Jo

    Abstract: Deep learning has been a groundbreaking technology in various fields as well as in communications systems. In spite of the notable advancements of deep neural network (DNN) based technologies in recent years, the high computational complexity has been a major obstacle to apply DNN in practical communications systems which require real-time operation. In this sense, challenges regarding practical i…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: This work has been submitted to the IEEE for possible publication

  29. Quantitative Susceptibility Mapping using Deep Neural Network: QSMnet

    Authors: Jaeyeon Yoon, Enhao Gong, Itthi Chatnuntawech, Berkin Bilgic, Jingu Lee, Woojin Jung, Jingyu Ko, Hosan Jung, Kawin Setsompop, Greg Zaharchuk, Eung Yeop Kim, John Pauly, Jongho Lee

    Abstract: Deep neural networks have demonstrated promising potential for the field of medical image reconstruction. In this work, an MRI reconstruction algorithm, which is referred to as quantitative susceptibility mapping (QSM), has been developed using a deep neural network in order to perform dipole deconvolution, which restores magnetic susceptibility source from an MRI field map. Previous approaches of…

    Submitted 15 June, 2018; v1 submitted 15 March, 2018; originally announced March 2018.

    Comments: This work was accepted in NeuroImage on 8 June, 2018 and will be published soon. The PubMed link is https://www.ncbi.nlm.nih.gov/pubmed/29894829