
Showing 1–50 of 51 results for author: Jung, S

Searching in archive eess.
  1. arXiv:2408.04266 [pdf, other]

    cs.RO eess.SY

    BPMP-Tracker: A Versatile Aerial Target Tracker Using Bernstein Polynomial Motion Primitives

    Authors: Yunwoo Lee, Jungwon Park, Boseong Jeon, Seungwoo Jung, H. Jin Kim

    Abstract: This letter presents a versatile trajectory planning pipeline for aerial tracking. The proposed tracker is capable of handling various chasing settings such as complex unstructured environments, crowded dynamic obstacles and multiple-target following. Within the entire pipeline, we focus on developing a predictor for future target motion and a chasing trajectory planner. For rapid computation, we e…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  2. arXiv:2406.16994 [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  3. Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans

    Authors: Kwang-Hyun Uhm, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Altho…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: This article has been accepted for publication in Computers in Biology and Medicine

    Journal ref: Computers in Biology and Medicine, 108746, 2024

  4. arXiv:2404.03991 [pdf, other]

    eess.IV cs.CV cs.LG

    Towards Efficient and Accurate CT Segmentation via Edge-Preserving Probabilistic Downsampling

    Authors: Shahzad Ali, Yu Rim Lee, Soo Young Park, Won Young Tak, Soon Ki Jung

    Abstract: Downsampling images and labels, often necessitated by limited resources or to expedite network training, leads to the loss of small objects and thin boundaries. This undermines the segmentation network's capacity to interpret images accurately and predict detailed labels, resulting in diminished performance compared to processing at original resolutions. This situation exemplifies the trade-off be…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 5 pages (4 figures, 1 table); This work has been submitted to the IEEE Signal Processing Letters. Copyright may be transferred without notice, after which this version may no longer be accessible

  5. arXiv:2403.14154 [pdf, other]

    eess.SY

    LR-FHSS Transceiver for Direct-to-Satellite IoT Communications: Design, Implementation, and Verification

    Authors: Sooyeob Jung, Seongah Jeong, Jinkyu Kang, Gyeongrae Im, Sangjae Lee, Mi-Kyung Oh, Joon Gyu Ryu, Joonhyuk Kang

    Abstract: This paper proposes a long range-frequency hopping spread spectrum (LR-FHSS) transceiver design for the Direct-to-Satellite Internet of Things (DtS-IoT) communication system. The DtS-IoT system has recently attracted attention as a promising nonterrestrial network (NTN) solution to provide high-traffic and low-latency data transfer services to IoT devices in global coverage. In particular, this st…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 17 pages, 23 figures

  6. arXiv:2403.05093 [pdf, other]

    cs.CV eess.IV

    Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile

    Authors: Seokjun Lee, Seung-Won Jung, Hyunseok Seo

    Abstract: Currently, image generation and synthesis have progressed remarkably with generative models. Despite photo-realistic results, intrinsic discrepancies are still observed in the frequency domain. The spectral discrepancy appeared not only in generative adversarial networks but also in diffusion models. In this study, we propose a framework to effectively mitigate the disparity in the frequency domain of the…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  7. arXiv:2401.13921 [pdf, other]

    eess.AS cs.SD

    Intelli-Z: Toward Intelligible Zero-Shot TTS

    Authors: Sunghee Jung, Won Jang, Jaesam Yoon, Bongwan Kim

    Abstract: Although numerous recent studies have suggested new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional efforts to ensure clear pronunciation and speech quality due to its inherent requirement of replacing a core parameter (speaker embedding or acoustic prompt) with a new…

    Submitted 24 January, 2024; originally announced January 2024.

  8. arXiv:2401.13146 [pdf, other]

    eess.AS cs.CL cs.SD

    Locality enhanced dynamic biasing and sampling strategies for contextual ASR

    Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) still faces challenges when recognizing time-variant rare phrases. Contextual biasing (CB) modules bias the ASR model towards such contextually relevant phrases. During training, a list of biasing phrases is selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  9. arXiv:2401.12085 [pdf, other]

    eess.AS cs.SD

    Consistency Based Unsupervised Self-training For ASR Personalisation

    Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, caused by differences in users' speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  10. arXiv:2312.05548 [pdf, other]

    eess.IV cs.CV cs.LG

    A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data

    Authors: Kwang-Hyun Uhm, Seung-Won Jung, Moon Hyung Choi, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, there have been some studies to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effectiv…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

    Journal ref: JBHI, 2022

  11. arXiv:2312.05528 [pdf, other]

    eess.IV cs.CV

    Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge

    Authors: Kwang-Hyun Uhm, Hyunjun Cho, Zhixin Xu, Seohoon Lim, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed, and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there exists inter-observer variability due to subtle differences in the imaging features of kidney and kidney tumors. In this paper…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: MICCAI 2023, KITS 2023 challenge 2nd place

  12. arXiv:2312.01638 [pdf, other]

    eess.IV cs.CV

    J-Net: Improved U-Net for Terahertz Image Super-Resolution

    Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

    Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot…

    Submitted 4 December, 2023; originally announced December 2023.

  13. arXiv:2311.15683 [pdf]

    eess.AS cs.SD eess.SP

    Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency

    Authors: Chenyu Tang, Muzi Xu, Wentian Yi, Zibo Zhang, Edoardo Occhipinti, Chaoqun Dong, Dafydd Ravenscroft, Sung-Min Jung, Sanghyo Lee, Shuo Gao, Jong Min Kim, Luigi G. Occhipinti

    Abstract: Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 42…

    Submitted 7 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 5 figures in the article; 11 figures and 4 tables in supplementary information

    Journal ref: npj Flexible Electronics (2024)

  14. arXiv:2307.13343 [pdf, other]

    eess.AS cs.CR cs.SD

    On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

    Authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung

    Abstract: Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task: Automatic Speech Recognition…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Proceedings of INTERSPEECH 2023

  15. arXiv:2306.09382 [pdf, ps, other]

    cs.SD cs.LG cs.MM eess.AS

    Sound Demixing Challenge 2023 Music Demixing Track Technical Report: TFC-TDF-UNet v3

    Authors: Minseok Kim, Jun Hyung Lee, Soonyoung Jung

    Abstract: In this report, we present our award-winning solutions for the Music Demixing Track of Sound Demixing Challenge 2023. First, we propose TFC-TDF-UNet v3, a time-efficient music source separation model that achieves state-of-the-art results on the MUSDB benchmark. We then give full details regarding our solutions for each Leaderboard, including a loss masking approach for noise-robust training. Code…

    Submitted 21 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 5 pages, 4 tables

  16. arXiv:2306.04137 [pdf, other]

    cs.MA eess.SY

    Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility

    Authors: Chanyoung Park, Gyu Seon Kim, Soohyun Park, Soyi Jung, Joongheon Kim

    Abstract: The development of urban air mobility (UAM) is progressing rapidly, and efficient transportation management systems are increasingly needed due to the multifaceted environmental uncertainties. Thus, this paper proposes a novel air transportation service management algorithm based on multi-agent deep reinforcement learning (MADRL) to address the challenges of multi-UAM cooperatio…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 15 pages, 14 figures

  17. arXiv:2305.13779 [pdf, other]

    cs.AR eess.SP

    Transceiver Design and Performance Analysis for LR-FHSS-based Direct-to-Satellite IoT

    Authors: Sooyeob Jung, Seongah Jeong, Jinkyu Kang, Joon Gyu Ryu, Joonhyuk Kang

    Abstract: This paper presents a novel transceiver design aimed at enabling Direct-to-Satellite Internet of Things (DtS-IoT) systems based on long range-frequency hopping spread spectrum (LR-FHSS). Our focus lies in developing an accurate transmission method through the analysis of the frame structure and key parameters outlined in Long Range Wide-Area Network (LoRaWAN) [1]. To address the Doppler effect in…

    Submitted 25 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 5 pages, 6 figures

    Report number: CL2023-1147

  18. Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography

    Authors: Yucheng Lu, Zhixin Xu, Moon Hyung Choi, Jimin Kim, Seung-Won Jung

    Abstract: Computed tomography (CT) has been used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effe…

    Submitted 28 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

    Journal ref: IEEE Transactions on Medical Imaging (2024)

  19. arXiv:2304.05920 [pdf, other]

    eess.SP physics.optics

    Learning to exploit z-Spatial Diversity for Coherent Nonlinear Optical Fiber Communication

    Authors: Sebastian Jung, Tim Uhlemann, Alexander Span, Maximilian Bauhofer, Stephan ten Brink

    Abstract: Higher-order solitons inherently possess a spatial periodicity along the propagation axis. The pulse expands and compresses in both the frequency and time domains. This property is exploited for a bandwidth-limited receiver by sampling the optical signal at two different distances. Numerical simulations show that when pure solitons are transmitted and the second (i.e., further propagated) signal is als…

    Submitted 12 April, 2023; originally announced April 2023.

  20. arXiv:2302.14273 [pdf, other]

    cs.RO eess.SY

    QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking

    Authors: Yunwoo Lee, Jungwon Park, Seungwoo Jung, Boseong Jeon, Dahyun Oh, H. Jin Kim

    Abstract: Maintaining the visibility of the targets is one of the major objectives of aerial tracking applications. This paper proposes QP Chaser, a trajectory planning pipeline that can enhance the visibility of single- and dual-target in both static and dynamic environments. As the name suggests, the proposed planner generates a target-visible trajectory via quadratic programming problems. First, the pred…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: 15 pages, 13 figures

  21. arXiv:2301.03815 [pdf, other]

    eess.SY

    Marine IoT Systems with Space-Air-Sea Integrated Networks: Hybrid LEO and UAV Edge Computing

    Authors: Sooyeob Jung, Seongah Jeong, Jinkyu Kang, Joonhyuk Kang

    Abstract: Marine Internet of Things (IoT) systems have grown substantially with the development of non-terrestrial networks (NTN) via aerial and space vehicles in the upcoming sixth-generation (6G), thereby assisting environment protection, military reconnaissance, and sea transportation. Due to unpredictable climate changes and the extreme channel conditions of maritime networks, however, it is challenging…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: 12 pages, 8 figures, 3 tables, submission in IEEE IoT Journal

    Report number: IoT-27450-2022

  22. arXiv:2301.00124 [pdf, other]

    eess.SY cs.RO

    Situation-Aware Deep Reinforcement Learning for Autonomous Nonlinear Mobility Control in Cyber-Physical Loitering Munition Systems

    Authors: Hyunsoo Lee, Soohyun Park, Won Joon Yun, Soyi Jung, Joongheon Kim

    Abstract: With the rapid development of drone technologies, drones are widely used in many applications including military domains. In this paper, a novel situation-aware DRL-based autonomous nonlinear drone mobility control algorithm is proposed for cyber-physical loitering munition applications. On the battlefield, the design of DRL-based autonomous control algorithm is not straightforward because real-world…

    Submitted 31 December, 2022; originally announced January 2023.

  23. arXiv:2211.03502 [pdf, other]

    eess.SP cs.CV

    Neural Architectural Nonlinear Pre-Processing for mmWave Radar-based Human Gesture Perception

    Authors: Hankyul Baek, Yoo Jeong Ha, Minjae Yoo, Soyi Jung, Joongheon Kim

    Abstract: In modern on-driving computing environments, many sensors are used for context-aware applications. This paper utilizes two deep learning models, U-Net and EfficientNet, which consist of a convolutional neural network (CNN), to detect hand gestures and remove noise in the Range Doppler Map image that was measured through a millimeter-wave (mmWave) radar. To improve the performance of classification…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 4 pages, 7 figures

  24. arXiv:2208.07639 [pdf, other]

    eess.IV

    RAWtoBit: A Fully End-to-end Camera ISP Network

    Authors: Wooseok Jeong, Seung-Won Jung

    Abstract: Image compression is an essential and final processing unit in the camera image signal processing (ISP) pipeline. While many studies have been conducted to replace the conventional ISP pipeline with a single end-to-end optimized deep learning model, image compression is barely considered as a part of the model. In this paper, we investigate the design of a fully end-to-end optimized camera ISP incorp…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted at ECCV 2022

  25. Lightweight Encoder-Decoder Architecture for Foot Ulcer Segmentation

    Authors: Shahzad Ali, Arif Mahmood, Soon Ki Jung

    Abstract: Continuous monitoring of foot ulcer healing is needed to ensure the efficacy of a given treatment and to avoid any possibility of deterioration. Foot ulcer segmentation is an essential step in wound diagnosis. We developed a model that is similar in spirit to the well-established encoder-decoder and residual convolution neural networks. Our model includes a residual connection along with a channel…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Published version of this article is available at https://link.springer.com/chapter/10.1007/978-3-031-06381-7_17

    Journal ref: Frontiers of Computer Vision. IW-FCV 2022. Communications in Computer and Information Science, vol 1578. Springer, Cham (2022)

  26. arXiv:2204.00491 [pdf, other]

    cs.CV eess.IV

    FrequencyLowCut Pooling -- Plug & Play against Catastrophic Overfitting

    Authors: Julia Grabinski, Steffen Jung, Janis Keuper, Margret Keuper

    Abstract: Over the last years, Convolutional Neural Networks (CNNs) have been the dominating neural architecture in a wide range of computer vision tasks. From an image and signal processing point of view, this success might be a bit surprising as the inherent spatial pyramid design of most CNNs apparently violates basic signal processing laws, i.e., the Sampling Theorem, in their down-sampling operations. Ho…

    Submitted 20 September, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: accepted at ECCV 2022

  27. arXiv:2203.16852 [pdf, other]

    eess.AS cs.LG cs.SD

    JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech

    Authors: Dan Lim, Sunghee Jung, Eesung Kim

    Abstract: In neural text-to-speech (TTS), two-stage systems, or cascades of separately learned models, have shown synthesis quality close to human speech. For example, FastSpeech2 transforms an input text to a mel-spectrogram and then HiFi-GAN generates a raw waveform from a mel-spectrogram; these are called an acoustic feature generator and a neural vocoder, respectively. However, their training pipeline i…

    Submitted 1 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted to INTERSPEECH 2022

  28. arXiv:2202.10456 [pdf, other]

    cs.LG cs.CR cs.CV eess.IV

    Feasibility Study of Multi-Site Split Learning for Privacy-Preserving Medical Systems under Data Imbalance Constraints in COVID-19, X-Ray, and Cholesterol Dataset

    Authors: Yoo Jeong Ha, Gusang Lee, Minjae Yoo, Soyi Jung, Seehwan Yoo, Joongheon Kim

    Abstract: It seems as though progressively more people are in the race to upload content, data, and information online; and hospitals haven't neglected this trend either. Hospitals are now at the forefront for multi-site medical data sharing to provide groundbreaking advancements in the way health records are shared and patients are diagnosed. Sharing of medical data is essential in modern medical research.…

    Submitted 20 February, 2022; originally announced February 2022.

  29. arXiv:2201.05843 [pdf, other]

    eess.SY cs.AI cs.LG cs.RO

    Cooperative Multi-Agent Deep Reinforcement Learning for Reliable Surveillance via Autonomous Multi-UAV Control

    Authors: Won Joon Yun, Soohyun Park, Joongheon Kim, MyungJae Shin, Soyi Jung, David A. Mohaisen, Jae-Hyun Kim

    Abstract: CCTV-based surveillance using unmanned aerial vehicles (UAVs) is considered a key technology for security in smart city environments. This paper creates a case where the UAVs with CCTV cameras fly over the city area for flexible and reliable surveillance services. UAVs should be deployed to cover a large area while minimizing overlapping and shadow areas for a reliable surveillance system. However,…

    Submitted 15 January, 2022; originally announced January 2022.

    Comments: 10 pages, 6 figures, Accepted for publication in IEEE Transactions on Industrial Informatics (TII)

  30. arXiv:2111.13321 [pdf, other]

    eess.AS cs.LG cs.SD

    Learning source-aware representations of music in a discrete latent space

    Authors: Jinsung Kim, Yeong-Seok Jeong, Woosung Choi, Jaehwa Chung, Soonyoung Jung

    Abstract: In recent years, neural-network-based methods have been proposed that can generate representations from music, but these representations are not human-readable and are hardly analyzable or editable by a human. To address this issue, we propose a novel method to learn source-aware latent representations of music through a Vector-Quantized Variational Auto-Encoder (VQ-VAE). We train our VQ-VAE to encode an input mi…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: MDX Workshop @ ISMIR 2021, 7 pages, 2 figures

  31. arXiv:2111.12516 [pdf, other]

    eess.AS cs.LG cs.SD

    LightSAFT: Lightweight Latent Source Aware Frequency Transform for Source Separation

    Authors: Yeong-Seok Jeong, Jinsung Kim, Woosung Choi, Jaehwa Chung, Soonyoung Jung

    Abstract: Conditioned source separations have attracted significant attention because of their flexibility, applicability and extensionality. Their performance was usually inferior to the existing approaches, such as the single source separation model. However, a recently proposed method called LaSAFT-Net has shown that conditioned models can show comparable performance against existing single-source separa…

    Submitted 26 January, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: MDX Workshop @ ISMIR 2021, 6 pages, 1 figure

  32. arXiv:2111.12203 [pdf, other]

    eess.AS cs.SD

    KUIELab-MDX-Net: A Two-Stream Neural Network for Music Demixing

    Authors: Minseok Kim, Woosung Choi, Jaehwa Chung, Daewon Lee, Soonyoung Jung

    Abstract: Recently, many methods based on deep learning have been proposed for music source separation. Some state-of-the-art methods have shown that stacking many layers with many skip connections improves the SDR performance. Although such a deep and complex architecture shows outstanding performance, it usually requires numerous computing resources and time for training and evaluation. This paper proposes…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: MDX Workshop @ ISMIR 2021, 7 pages, 3 figures

  33. arXiv:2110.08796 [pdf, other]

    eess.SY

    Stable Marriage Matching for Traffic-Aware Space-Air-Ground Integrated Networks: A Gale-Shapley Algorithmic Approach

    Authors: Hyunsoo Lee, Haemin Lee, Soyi Jung, Joongheon Kim

    Abstract: In keeping with the rapid development of communication technology, a new communication structure is required in a next-generation communication system. In particular, research using High Altitude Platform (HAP) or Unmanned Aerial Vehicle (UAV) in existing terrestrial networks is active. In this paper, we propose matching HAP and UAV using the Gale-Shapley algorithm in a relay communication situatio…

    Submitted 17 October, 2021; originally announced October 2021.

  34. arXiv:2108.10147 [pdf, other]

    cs.LG cs.AI eess.IV

    Spatio-Temporal Split Learning for Privacy-Preserving Medical Platforms: Case Studies with COVID-19 CT, X-Ray, and Cholesterol Data

    Authors: Yoo Jeong Ha, Minjae Yoo, Gusang Lee, Soyi Jung, Sae Won Choi, Joongheon Kim, Seehwan Yoo

    Abstract: Machine learning requires a large volume of sample data, especially when it is used in high-accuracy medical applications. However, patient records are among the most sensitive private information and are not usually shared among institutes. This paper presents spatio-temporal split learning, a distributed deep neural network framework, which is a turning point in allowing collaboration among pri…

    Submitted 20 August, 2021; originally announced August 2021.

  35. arXiv:2108.00626 [pdf, ps, other]

    eess.SY

    Quantum Scheduling for Millimeter-Wave Observation Satellite Constellation

    Authors: Joongheon Kim, Yunseok Kwak, Soyi Jung, Jae-Hyun Kim

    Abstract: In beyond 5G and 6G network scenarios, the use of satellites has been actively discussed for extending target monitoring areas, even for extreme circumstances, where the monitoring functionalities can be realized due to the usage of millimeter-wave wireless links. This paper designs an efficient scheduling algorithm which minimizes overlapping monitoring areas among observation satellite constella…

    Submitted 2 August, 2021; originally announced August 2021.

  36. arXiv:2107.11790 [pdf, other]

    eess.SP

    Distributed and Autonomous Aerial Data Collection in Smart City Surveillance Applications

    Authors: Haemin Lee, Soyi Jung, Joongheon Kim

    Abstract: The massive growth of Smart City and Internet of Things applications enables safety and security. The data produced by surveillance cameras on aerial devices such as unmanned aerial vehicles (UAVs) need to be transferred to ground stations for secure data analysis. When the scale of the network is relatively large compared to the wireless communication coverage of a device, it is not al…

    Submitted 25 July, 2021; originally announced July 2021.

  37. Progressive Joint Low-light Enhancement and Noise Removal for Raw Images

    Authors: Yucheng Lu, Seung-Won Jung

    Abstract: Low-light imaging on mobile devices is typically challenging due to insufficient incident light coming through the relatively small aperture, resulting in a low signal-to-noise ratio. Most of the previous works on low-light image processing focus either only on a single task such as illumination adjustment, color enhancement, or noise removal; or on a joint illumination adjustment and denoising ta…

    Submitted 2 September, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

  38. arXiv:2104.13553 [pdf, other]

    eess.AS cs.LG cs.SD

    AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries

    Authors: Woosung Choi, Minseok Kim, Marco A. Martínez Ramírez, Jaehwa Chung, Soonyoung Jung

    Abstract: This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description. Audio Manipulation on a Specific Source (AMSS) is challenging because a sound object (i.e., a waveform sample or frequency bin) is 'transparent'; it usually carries…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: 10 pages, 8 figures, 3 tables, under review for ACMMM 21

  39. arXiv:2010.11631 [pdf, other]

    cs.SD cs.LG eess.AS

    LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation

    Authors: Woosung Choi, Minseok Kim, Jaehwa Chung, Soonyoung Jung

    Abstract: Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to fit the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns.…

    Submitted 14 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 3 figures, 2 tables. accepted to ICASSP 2021

  40. arXiv:2009.05210 [pdf]

    eess.SP

    A 6.3-Nanowatt-per-Channel 96-Channel Neural Spike Processor for a Movement-Intention-Decoding Brain-Computer-Interface Implant

    Authors: Zhewei Jiang, Jiangyi Li, Pavan K. Chundi, Sung Justin Kim, Minhao Yang, Joonseong Kang, Seungchul Jung, Sang Joon Kim, Mingoo Seok

    Abstract: This paper presents microwatt end-to-end neural signal processing hardware for deployment-stage real-time upper-limb movement intent decoding. This module features intercellular spike detection, sorting, and decoding operations for a 96-channel prosthetic implant. We design the algorithms for those operations to achieve minimal computation complexity while matching or advancing the accuracy of sta…

    Submitted 10 September, 2020; originally announced September 2020.

  41. arXiv:2008.06208 [pdf]

    eess.AS cs.CL cs.SD

    Adaptable Multi-Domain Language Model for Transformer ASR

    Authors: Taewoo Lee, Min-Joong Lee, Tae Gyoon Kang, Seokyeoung Jung, Minseok Kwon, Yeona Hong, Jungin Lee, Kyoung-Gu Woo, Ho-Gyeong Kim, Jiseung Jeong, Jihyun Lee, Hosik Lee, Young Sang Choi

    Abstract: We propose an adapter-based, multi-domain, Transformer-based language model (LM) for Transformer ASR. The model consists of a large common LM and small adapters. The model can perform multi-domain adaptation using only the small adapters and their related layers. The proposed model can reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model. The proposed LM c…

    Submitted 10 February, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: This paper is accepted for presentation at IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE ICASSP), 2021

  42. arXiv:2006.06942  [pdf]

    eess.AS

    Domain-adversarial training of multi-speaker TTS

    Authors: Sunghee Jung, Hoirin Kim

    Abstract: Multi-speaker TTS has to learn both speaker embedding and text embedding to generate speech with the desired linguistic content in the desired voice. However, it is unclear which characteristics of speech result from the speaker and which from the linguistic content. In this paper, the text embedding is forced to unlearn speaker-dependent characteristics by using a gradient reversal layer to an auxiliary speaker classif…

    Submitted 12 June, 2020; originally announced June 2020.

  43. arXiv:2006.06940  [pdf]

    eess.AS cs.SD

    Neural voice cloning with a few low-quality samples

    Authors: Sunghee Jung, Hoirin Kim

    Abstract: In this paper, we explore the possibility of speech synthesis from low-quality found data using only a limited number of samples of the target speaker. Unlike previous works, which try to train the entire text-to-speech system on found data, we try to extract only the speaker embedding from the target speaker's found data. Also, the two speaker-mimicking approaches, which are adaptation and speaker-encoder-…

    Submitted 12 June, 2020; originally announced June 2020.

  44. arXiv:2006.06937  [pdf]

    eess.AS

    Non-parallel voice conversion based on source-to-target direct mapping

    Authors: Sunghee Jung, Youngjoo Suh, Yeunju Choi, Hoirin Kim

    Abstract: Recent works utilizing phonetic posteriorgrams (PPGs) for non-parallel voice conversion have significantly increased the usability of voice conversion, since the source and target DBs are no longer required to have matching contents. In this approach, the PPGs are used as the linguistic bridge between source and target speaker features. However, this PPG-based non-parallel voice conversion has some l…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: Submitted to Interspeech 2019

  45. arXiv:2005.10456  [pdf]

    eess.AS cs.SD

    Pitchtron: Towards audiobook generation from ordinary people's voices

    Authors: Sunghee Jung, Hoirin Kim

    Abstract: In this paper, we explore prosody transfer for audiobook generation under a fairly realistic condition in which the training DB consists of plain audio, mostly from multiple ordinary people, while the reference audio given during inference comes from professionals and is richer in prosody than the training DB. Specifically, we explore transferring Korean dialects and emotive speech even though the training set is mostly composed of sta…

    Submitted 21 May, 2020; originally announced May 2020.

  46. In defence of metric learning for speaker recognition

    Authors: Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han

    Abstract: The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper…

    Submitted 24 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: The code can be found at https://github.com/clovaai/voxceleb_trainer

  47. arXiv:2001.00577  [pdf, other]

    eess.AS cs.LG cs.SD

    Attention based on-device streaming speech recognition with large speech corpus

    Authors: Kwangyoun Kim, Kyungmin Lee, Dhananjaya Gowda, Junmo Park, Sungsoo Kim, Sichen Jin, Young-Yoon Lee, Jinsu Yeo, Daehyun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim

    Abstract: In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with a large (> 10K hours) corpus. We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross-entropy (CE) losses, minimum word error rate (MWER) training, layer…

    Submitted 1 January, 2020; originally announced January 2020.

    Comments: Accepted and presented at the ASRU 2019 conference

  48. arXiv:1912.02591  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD stat.ML

    Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation

    Authors: Woosung Choi, Minseok Kim, Jaehwa Chung, Daewon Lee, Soonyoung Jung

    Abstract: Singing Voice Separation (SVS) tries to separate the singing voice from a given mixed musical signal. Recently, many U-Net-based models have been proposed for the SVS task, but no existing work has evaluated and compared the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We…

    Submitted 8 October, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 8 pages, 4 tables, 6 figures. Accepted to ISMIR 2020

  49. arXiv:1905.11172  [pdf, other]

    eess.IV cs.CV

    GRDN: Grouped Residual Dense Network for Real Image Denoising and GAN-based Real-world Noise Modeling

    Authors: Dong-Wook Kim, Jae Ryun Chung, Seung-Won Jung

    Abstract: Recent research on image denoising has progressed with the development of deep learning architectures, especially convolutional neural networks. However, real-world image denoising is still very challenging because it is not possible to obtain ideal pairs of ground-truth images and real-world noisy images. Owing to the recent release of benchmark datasets, the interest of the image denoising commu…

    Submitted 27 May, 2019; originally announced May 2019.

    Comments: To appear in a CVPR 2019 workshop. Winners of the NTIRE 2019 Image Denoising Challenge, Track 2: sRGB

  50. arXiv:1701.06811  [pdf, other]

    eess.SY

    Socio-technical Smart Grid Optimization via Decentralized Charge Control of Electric Vehicles

    Authors: Evangelos Pournaras, Seoho Jung, Srivatsan Yadhunathan, Huiting Zhang, Xingliang Fang

    Abstract: The penetration of electric vehicles is becoming a catalyst for the sustainability of Smart Cities. However, unregulated battery charging remains a challenge, causing high energy costs, power peaks or even blackouts. This paper studies this challenge from a socio-technical perspective: social dynamics such as the participation in demand-response programs, the discomfort experienced by alternative sugge…

    Submitted 21 May, 2019; v1 submitted 24 January, 2017; originally announced January 2017.