Showing 1–14 of 14 results for author: Min, D

Searching in archive eess. Search in all archives.
  1. arXiv:2407.18892  [pdf, other]

    cs.RO cs.AI eess.SY

    SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces

    Authors: Seunghyeop Nam, Tuan Anh Nguyen, Eunmi Choi, Dugki Min

    Abstract: This paper introduces SHANGUS, an advanced framework combining Deep Reinforcement Learning (DRL) with heuristic optimization to improve frontier-based exploration efficiency in unknown environments, particularly for intelligent vehicles in autonomous air services, search and rescue operations, and space exploration robotics. SHANGUS harnesses DRL's adaptability and heuristic prioritization, marked…

    Submitted 26 July, 2024; originally announced July 2024.

  2. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report

  3. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  4. arXiv:2306.11427  [pdf]

    eess.AS

    Auditory Neural Response Inspired Sound Event Detection Based on Spectro-temporal Receptive Field

    Authors: Deokki Min, Hyeonuk Nam, Yong-Hwa Park

    Abstract: Sound event detection (SED) is one of tasks to automate function by human auditory system which listens and understands auditory scenes. Therefore, we were inspired to make SED recognize sound events in the way human auditory system does. Spectro-temporal receptive field (STRF), an approach to describe the relationship between perceived sound at ear and transformed neural response in the auditory…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Submitted to DCASE 2023 Workshop

  5. arXiv:2306.11277  [pdf, other]

    cs.SD eess.AS

    Frequency & Channel Attention for Computationally Efficient Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Yong-Hwa Park

    Abstract: We explore on various attention methods on frequency and channel dimensions for sound event detection (SED) in order to enhance performance with minimal increase in computational cost while leveraging domain knowledge to address the frequency dimension of audio data. We have introduced frequency dynamic convolution (FDY conv) in a previous work to release the translational equivariance issue assoc…

    Submitted 28 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to DCASE 2023 workshop

  6. Adaptive Endpointing with Deep Contextual Multi-armed Bandits

    Authors: Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

    Abstract: Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal…

    Submitted 23 March, 2023; originally announced March 2023.

    Journal ref: Proc. IEEE ICASSP, June 2023

  7. arXiv:2211.09383  [pdf, other]

    eess.AS cs.AI cs.SD

    Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

    Authors: Minki Kang, Dongchan Min, Sung Ju Hwang

    Abstract: There has been a significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to the advancement in neural generative modeling. However, existing methods on any-speaker adaptive TTS have achieved unsatisfactory performance, due to their suboptimal accuracy in mimicking the target speakers' styles. In this work, we present Grad-StyleSpeech, which is an any-speaker adapt…

    Submitted 13 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

  8. arXiv:2208.10922  [pdf, other]

    cs.CV cs.LG eess.AS eess.IV

    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

    Authors: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

    Abstract: We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given…

    Submitted 15 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

  9. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by small size of SELD dataset, due to difficulty in obtaining sufficient amount of realistic multi-channel audio data recordings with accurate label. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  10. arXiv:2206.11645  [pdf, ps, other]

    eess.AS

    Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

    Abstract: While many deep learning methods on other domains have been applied to sound event detection (SED), differences between original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) for input, thorough comprehension on these two dimensions is essential for application of methods from other domains on SED. Prev…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task4

  11. arXiv:2106.03153  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

    Authors: Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

    Abstract: With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech with only a few audio samples from the given speaker, that are also short in length. However, existing methods either require to fine-tune the model or achieve low adaptation quality witho…

    Submitted 16 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by ICML 2021

  12. arXiv:2011.10897   

    cs.AI eess.SY

    Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

    Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

    Abstract: Typical reinforcement learning (RL) methods show limited applicability for real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First…

    Submitted 19 May, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: We request withdrawal of this article due to a definition error on methodology and problem definition (Section 3-4; pages 2-5)

  13. arXiv:2006.16659  [pdf, other]

    eess.SY cs.LG

    Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid

    Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

    Abstract: A microgrid is an innovative system that integrates distributed energy resources to supply electricity demand within electrical boundaries. This study proposes an approach for deriving a desirable microgrid operation policy that enables sophisticated controls in the microgrid system using the proposed novel credit assignment technique, delayed-Q update. The technique employs novel features such as…

    Submitted 20 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

  14. arXiv:1610.00089  [pdf]

    eess.SY

    System Identification of NN-based Model Reference Control of RUAV during Hover

    Authors: Bhaskar Prasad Rimal, Idris E. Putro, Agus Budiyono, Dugki Min, Eunmi Choi

    Abstract: UAV control system is a huge and complex system, and to design and test a UAV control system is time-cost and money-cost. This paper considered the simulation of identification of a nonlinear system dynamics using artificial neural networks approach. This experiment develops a neural network model of the plant that we want to control. In the control design stage, experiment uses the neural network…

    Submitted 1 October, 2016; originally announced October 2016.

    Comments: 26 pages, Book Chapter, Artificial Neural Networks- Industrial and Control Engineering Applications, INTECH, April, 2011