Showing 1–50 of 61 results for author: Han, H

Searching in archive eess.
  1. arXiv:2405.04752  [pdf, other]

    eess.AS cs.SD

    HILCodec: High Fidelity and Lightweight Neural Audio Codec

    Authors: Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

    Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consist…

    Submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2405.03129  [pdf, other]

    eess.SP cs.IT

    Active Sensing for Multiuser Beam Tracking with Reconfigurable Intelligent Surface

    Authors: Han Han, Tao Jiang, Wei Yu

    Abstract: This paper studies a beam tracking problem in which an access point (AP), in collaboration with a reconfigurable intelligent surface (RIS), dynamically adjusts its downlink beamformers and the reflection pattern at the RIS in order to maintain reliable communications with multiple mobile user equipments (UEs). Specifically, the mobile UEs send uplink pilots to the AP periodically during the channe…

    Submitted 31 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  3. arXiv:2404.17585  [pdf, other]

    cs.HC cs.AI cs.LG eess.SP

    NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG

    Authors: Cheol-Hui Lee, Hakseung Kim, Hyun-jee Han, Min-Kyung Jung, Byung C. Yoon, Dong-Joo Kim

    Abstract: The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need fo…

    Submitted 13 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures

  4. arXiv:2403.14402  [pdf, other]

    cs.SD cs.CL eess.AS

    XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

    Authors: HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

    Abstract: Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noise. However, audio-visual (AV) data is only available in limited amounts and for fewer languages than audio-only resources. To address this gap, we present XLAVS-R, a cross-lingual audio-v…

    Submitted 12 August, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ACL2024

  5. arXiv:2402.09797  [pdf, other]

    cs.SD cs.HC eess.AS

    A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings

    Authors: Hyewon Han, Naveen Kumar

    Abstract: In this work, we propose a novel cross-talk rejection framework for a multi-channel multi-talker setup for a live multiparty interactive show. Our far-field audio setup is required to be hands-free during live interaction and comprises four adjacent talkers with directional microphones in the same space. Such setups often introduce heavy cross-talk between channels, resulting in reduced automatic…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted for presentation at the Hands-free Speech Communication and Microphone Arrays (HSCMA 2024)

  6. arXiv:2402.00744  [pdf, other]

    cs.SD cs.CL eess.AS

    BATON: Aligning Text-to-Audio Model with Human Preference Feedback

    Authors: Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

    Abstract: With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and limited model understanding ability. To alleviate this issue, we formulate the BATON, a framework designed to enhance the alignment betw…

    Submitted 1 February, 2024; originally announced February 2024.

  7. arXiv:2401.01685  [pdf]

    eess.IV cs.CV

    Modality Exchange Network for Retinogeniculate Visual Pathway Segmentation

    Authors: Hua Han, Cheng Li, Lei Xie, Yuanjing Feng, Alou Diakite, Shanshan Wang

    Abstract: Accurate segmentation of the retinogeniculate visual pathway (RGVP) aids in the diagnosis and treatment of visual disorders by identifying disruptions or abnormalities within the pathway. However, the complex anatomical structure and connectivity of RGVP make it challenging to achieve accurate segmentation. In this study, we propose a novel Modality Exchange Network (ME-Net) that effectively utili…

    Submitted 3 January, 2024; originally announced January 2024.

  8. arXiv:2401.01654  [pdf, other]

    eess.IV cs.LG

    LESEN: Label-Efficient deep learning for Multi-parametric MRI-based Visual Pathway Segmentation

    Authors: Alou Diakite, Cheng Li, Lei Xie, Yuanjing Feng, Hua Han, Shanshan Wang

    Abstract: Recent research has shown the potential of deep learning in multi-parametric MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for training is laborious and time-consuming. Therefore, it is crucial to develop effective algorithms in situations with limited labeled samples. In this work, we propose a label-efficient deep learning method with self-ensembling (LESEN). LESEN…

    Submitted 3 January, 2024; originally announced January 2024.

  9. arXiv:2312.10472  [pdf, other]

    cs.LG cs.AI eess.SY

    Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System

    Authors: Ruining Zhang, Haoran Han, Maolong Lv, Qisong Yang, Jian Cheng

    Abstract: Extensive utilization of deep reinforcement learning (DRL) policy networks in diverse continuous control tasks has raised questions regarding performance degradation in expansive state spaces where the input state norm is larger than that in the training environment. This paper aims to uncover the underlying factors contributing to such performance deterioration when dealing with expanded state sp…

    Submitted 31 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

  10. arXiv:2312.09446  [pdf, other]

    eess.SP cs.AI cs.CV

    A Distributed Inference System for Detecting Task-wise Single Trial Event-Related Potential in Stream of Satellite Images

    Authors: Sung-Jin Kim, Heon-Gyu Kwak, Hyeon-Taek Han, Dae-Hyeok Lee, Ji-Hoon Jeong, Seong-Whan Lee

    Abstract: Brain-computer interfaces (BCIs) have garnered significant attention for their potential in various applications, with event-related potential (ERP) playing a considerable role in BCI systems. This paper introduces a novel Distributed Inference System tailored for detecting task-wise single-trial ERPs in a stream of satellite images. Unlike traditional methodologies that employ a single model…

    Submitted 10 November, 2023; originally announced December 2023.

  11. arXiv:2312.06065  [pdf, other]

    eess.AS cs.SD

    EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

    Authors: Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

    Abstract: In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel framework utilizing demultiplexed speaker embeddings. In this work, we focus on disentangling speaker-relevant information in the latent space and then transform each separated latent variable into its corresponding speech activity…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  12. Learning to Solve Inverse Problems for Perceptual Sound Matching

    Authors: Han Han, Vincent Lostanlen, Mathieu Lagrange

    Abstract: Perceptual sound matching (PSM) aims to find the input parameters to a synthesizer so as to best imitate an audio target. Deep learning for PSM optimizes a neural network to analyze and reconstruct prerecorded samples. In this context, our article addresses the problem of designing a suitable loss function when the training set is generated by a differentiable synthesizer. Our main contribution is…

    Submitted 6 May, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  13. arXiv:2307.13821  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS math.FA

    Fitting Auditory Filterbanks with Multiresolution Neural Networks

    Authors: Vincent Lostanlen, Daniel Haider, Han Han, Mathieu Lagrange, Peter Balazs, Martin Ehler

    Abstract: Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optima…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: 4 pages, 4 figures, 1 table, conference

    Journal ref: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2023)

  14. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  15. arXiv:2306.01411  [pdf, other]

    eess.AS cs.SD

    HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

    Authors: Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

    Abstract: This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then restore missing signal components, our model performs these tasks in parallel using two heterogeneous decoder networks. Based on the U-Net style enco…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  16. arXiv:2305.19051  [pdf, other]

    eess.AS cs.AI cs.SD

    Towards single integrated spoofing-aware speaker verification embeddings

    Authors: Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

    Abstract: This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two aspects. First, rejecting non-target speakers' input as well as target speakers' spoofed inputs should be addressed. Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outpe…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023. Code and models are available in https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline

  17. arXiv:2301.10183  [pdf, other]

    cs.SD cs.LG eess.AS

    Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

    Authors: Cyrus Vahidi, Han Han, Changhong Wang, Mathieu Lagrange, György Fazekas, Vincent Lostanlen

    Abstract: Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in deep learning. Currently, autoe…

    Submitted 24 January, 2023; originally announced January 2023.

  18. arXiv:2301.02886  [pdf, other]

    cs.SD cs.LG eess.AS

    Perceptual-Neural-Physical Sound Matching

    Authors: Han Han, Vincent Lostanlen, Mathieu Lagrange

    Abstract: Sound matching algorithms seek to approximate a target waveform by parametric audio synthesis. Deep neural networks have achieved promising results in matching sustained harmonic tones. However, the task is more challenging when targets are nonstationary and inharmonic, e.g., percussion. We attribute this problem to the inadequacy of the loss function. On one hand, mean square error in the parametric…

    Submitted 13 March, 2023; v1 submitted 7 January, 2023; originally announced January 2023.

  19. arXiv:2212.13544  [pdf, other]

    cs.DC eess.SP

    Enhancing Federated Learning with spectrum allocation optimization and device selection

    Authors: Tinghao Zhang, Kwok-Yan Lam, Jun Zhao, Feng Li, Huimei Han, Norziana Jamil

    Abstract: Machine learning (ML) is a widely accepted means for supporting customized services for mobile devices and applications. Federated Learning (FL), which is a promising approach to implement machine learning while addressing data privacy concerns, typically involves a large number of wireless mobile devices to collect model training data. Under such circumstances, FL is expected to meet stringent tr…

    Submitted 27 December, 2022; originally announced December 2022.

    Comments: This paper is accepted by IEEE/ACM Transactions on Networking

  20. arXiv:2211.08783  [pdf]

    eess.IV cs.CV cs.LG

    Uncertainty-Aware Multi-Parametric Magnetic Resonance Image Information Fusion for 3D Object Segmentation

    Authors: Cheng Li, Yousuf Babiker M. Osman, Weijian Huang, Zhenzhen Xue, Hua Han, Hairong Zheng, Shanshan Wang

    Abstract: Multi-parametric magnetic resonance (MR) imaging is an indispensable tool in the clinic. Consequently, automatic volume-of-interest segmentation based on multi-parametric MR imaging is crucial for computer-aided disease diagnosis, treatment planning, and prognosis monitoring. Despite the extensive studies conducted in deep learning-based medical image analysis, further investigations are still req…

    Submitted 16 November, 2022; originally announced November 2022.

  21. arXiv:2210.02732  [pdf, other]

    eess.AS

    Fully Unsupervised Training of Few-shot Keyword Spotting

    Authors: Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset containing massive target keywords is known to be essential to generalize to arbitrary target keywords with only a few enrollment samples. To alleviate the expensive data collection with labeling, in this paper, we propose a novel FS-KWS system trained only on synthetic data. The proposed system is based on metric le…

    Submitted 6 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE SLT 2022

  22. arXiv:2209.14900  [pdf, other]

    cs.LG eess.SP

    Joint Optimization of Energy Consumption and Completion Time in Federated Learning

    Authors: Xinyu Zhou, Jun Zhao, Huimei Han, Claude Guet

    Abstract: Federated Learning (FL) is an intriguing distributed machine learning approach due to its privacy-preserving characteristics. To balance the trade-off between energy and execution latency, and thus accommodate different demands and application scenarios, we formulate an optimization problem to minimize a weighted sum of total energy consumption and completion time through two weight parameters. Th…

    Submitted 10 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: This paper appears in the Proceedings of IEEE International Conference on Distributed Computing Systems (ICDCS) 2022. Please feel free to contact us for questions or remarks

  23. arXiv:2209.13871  [pdf, ps, other]

    eess.SP cs.SI

    Resource Allocation and Resolution Control in the Metaverse with Mobile Augmented Reality

    Authors: Peiyuan Si, Jun Zhao, Huimei Han, Kwok-Yan Lam, Yang Liu

    Abstract: With the development of blockchain and communication techniques, the Metaverse is considered as a promising next-generation Internet paradigm, which enables the connection between reality and the virtual world. The key to rendering a virtual world is to provide users with immersive experiences and virtual avatars, which is based on virtual reality (VR) technology and high data transmission rate. H…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: A full paper published in IEEE Global Communications Conference (GLOBECOM) 2022

  24. arXiv:2208.08012  [pdf, other]

    eess.AS cs.SD

    Disentangled Speaker Representation Learning via Mutual Information Minimization

    Authors: Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

    Abstract: The domain mismatch problem caused by speaker-unrelated features has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework to unravel speaker-relevant features from speaker-unrelated features via mutual information (MI) minimization. To achieve our goal of minimizing MI between speaker-related and speaker-unrelated features, we adopt a contrastive lo…

    Submitted 12 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted by APSIPA ASC 2022. Camera-ready. 8 pages, 4 figures, and 1 table

  25. arXiv:2206.15400  [pdf, other]

    eess.AS cs.AI cs.LG

    Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

    Authors: Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

    Abstract: In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences. Unlike previous approaches requiring speech keyword enrollment, our method compares input queries with an enrolled text keyword sequence. To place the audio and text representations within a common latent space, we adopt an attenti…

    Submitted 1 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022

  26. arXiv:2204.08269  [pdf, other]

    cs.SD cs.LG eess.AS

    Differentiable Time-Frequency Scattering on GPU

    Authors: John Muradeli, Cyrus Vahidi, Changhong Wang, Han Han, Vincent Lostanlen, Mathieu Lagrange, George Fazekas

    Abstract: Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet…

    Submitted 19 July, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: 8 pages, 6 figures. Submitted to the International Conference on Digital Audio Effects (DAFX) 2022

  27. arXiv:2204.01005  [pdf, other]

    eess.AS cs.AI

    Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

    Authors: Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim

    Abstract: The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional lay…

    Submitted 12 October, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted by IEEE SLT 2022. 7 pages, 4 figures, 1 table. Code is available at https://github.com/msh9184/ska-tdnn.git

  28. arXiv:2203.07373  [pdf, other]

    eess.IV cs.AI cs.CV

    SATr: Slice Attention with Transformer for Universal Lesion Detection

    Authors: Han Li, Long Chen, Hu Han, S. Kevin Zhou

    Abstract: Universal Lesion Detection (ULD) in computed tomography plays an essential role in computer-aided diagnosis. Promising ULD results have been reported by multi-slice-input detection approaches which model 3D context from multiple adjacent CT slices, but such methods still experience difficulty in obtaining a global representation among different slices and within each individual slice since they on…

    Submitted 12 March, 2022; originally announced March 2022.

    Comments: 11 pages, 3 figures

  29. arXiv:2203.06967  [pdf, other]

    eess.IV cs.CV

    Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots

    Authors: Zejin Wang, Jiazheng Liu, Guoqing Li, Hua Han

    Abstract: Real noisy-clean pairs on a large scale are costly and difficult to obtain. Meanwhile, supervised denoisers trained on synthetic data perform poorly in practice. Self-supervised denoisers, which learn only from single noisy images, solve the data collection problem. However, self-supervised denoising methods, especially blindspot-driven ones, suffer sizable information loss during input or network…

    Submitted 7 May, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR2022

  30. arXiv:2202.11918  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

    Authors: Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

    Abstract: Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at specific frequencies, which ensures they do not significantly affect the quality of the enhanced speech. In this paper, we propose an effective…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  31. arXiv:2112.08929  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

    Authors: Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, Nam Soo Kim

    Abstract: In this paper, we propose self-supervised speaker representation learning strategies, which comprise a bootstrap equilibrium speaker representation learning in the front-end and an uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via the bootstrap training scheme with the uniformity regularization term. In t…

    Submitted 24 December, 2021; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by IEEE Access

  32. arXiv:2111.00428  [pdf, other]

    eess.SP

    Reconfigurable Intelligent Surface-induced Randomness for mmWave Key Generation

    Authors: Shubo Yang, Han Han, Yihong Liu, Weisi Guo, Zhibo Pang, Lei Zhang

    Abstract: Secret key generation in physical layer security exploits the unpredictable random nature of wireless channels. The millimeter-wave (mmWave) channels have limited multipath and channel randomness in static environments. In this paper, for mmWave secret key generation of physical layer security, we use a reconfigurable intelligent surface (RIS) to induce randomness directly in wireless environments…

    Submitted 8 August, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Add contents, including continuous group phase shifts and secret key rate analysis

  33. arXiv:2106.15345  [pdf, other]

    cs.CV cs.LG eess.IV

    Where is the disease? Semi-supervised pseudo-normality synthesis from an abnormal image

    Authors: Yuanqi Du, Quan Quan, Hu Han, S. Kevin Zhou

    Abstract: Pseudo-normality synthesis, which computationally generates a pseudo-normal image from an abnormal one (e.g., with lesions), is critical in many perspectives, from lesion detection, data augmentation to clinical surgery suggestion. However, it is challenging to generate high-quality pseudo-normal images in the absence of the lesion information. Thus, expensive lesion segmentation data have been in…

    Submitted 24 June, 2021; originally announced June 2021.

  34. arXiv:2106.06455  [pdf, ps, other]

    eess.SY

    Certifying the LTL Formula p Until q in Hybrid Systems

    Authors: Hyejin Han, Mohamed Maghenem, Ricardo G. Sanfelice

    Abstract: In this paper, we propose sufficient conditions to guarantee that a linear temporal logic (LTL) formula of the form p Until q, denoted by $p \mathcal{U} q$, is satisfied for a hybrid system. Roughly speaking, the formula $p \mathcal{U} q$ is satisfied means that the solutions, initially satisfying proposition p, keep satisfying this proposition until proposition q is satisfied. To certify such a f…

    Submitted 17 August, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: 21 pages. The technical report accompanying "Certifying the LTL Formula p Until q in Hybrid Systems" submitted to IEEE Transactions on Automatic Control, 2021

  35. arXiv:2103.00829  [pdf, ps, other]

    cs.IT eess.SP

    6G Downlink Transmission via Rate Splitting Space Division Multiple Access Based on Grouped Code Index Modulation

    Authors: Wenchao Zhai, Yishan Wu, Jun Zhao, Huimei Han

    Abstract: A novel rate splitting space division multiple access (SDMA) scheme based on grouped code index modulation (GrCIM) is proposed for the sixth generation (6G) downlink transmission. The proposed RSMA-GrCIM scheme transmits information to multiple user equipments (UEs) through the space division multiple access (SDMA) technique, and exploits code index modulation for rate splitting. Since the CIM sch…

    Submitted 1 March, 2021; originally announced March 2021.

  36. arXiv:2101.06421  [pdf, ps, other]

    eess.SP

    Smart City Enabled by 5G/6G Networks: An Intelligent Hybrid Random Access Scheme

    Authors: Huimei Han, Wenchao Zhai, Jun Zhao

    Abstract: The Internet of Things (IoT) is the enabler for smart city to achieve the vision of the "Internet of Everything" by intelligently connecting devices without human intervention. The explosive growth of IoT devices makes the amount of business data generated by machine-type communications (MTC) account for a great proportion in all communication services. The fifth-generation (5G) specification f…

    Submitted 5 May, 2022; v1 submitted 16 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.13537

  37. arXiv:2012.13539  [pdf, ps, other]

    cs.IT eess.SP

    A GCICA Grant-Free Random Access Scheme for M2M Communications in Crowded Massive MIMO Systems

    Authors: Huimei Han, Lushun Fang, Weidang Lu, Wenchao Zhai, Ying Li, Jun Zhao

    Abstract: A grant-free random access scheme with a high success rate is proposed to support massive access for machine-to-machine communications in massive multiple-input multiple-output systems. This scheme allows active user equipments (UEs) to transmit their modulated uplink messages along with super pilots consisting of multiple sub-pilots to a base station (BS). Then, the BS performs channel state informati…

    Submitted 25 December, 2020; originally announced December 2020.

  38. arXiv:2012.13537  [pdf, ps, other]

    eess.SP cs.NI

    An LSTM-Aided Hybrid Random Access Scheme for 6G Machine Type Communication Networks

    Authors: Wenchao Zhai, Huimei Han, Lei Liu, Jun Zhao

    Abstract: In this paper, an LSTM-aided hybrid random access scheme (LSTMH-RA) is proposed to support diverse quality of service (QoS) requirements in 6G machine-type communication (MTC) networks, where massive MTC (mMTC) devices and ultra-reliable low latency communications (URLLC) devices coexist. In the proposed LSTMH-RA scheme, mMTC devices access the network via a timing advance (TA)-aided four-step pro…

    Submitted 29 July, 2022; v1 submitted 25 December, 2020; originally announced December 2020.

  39. arXiv:2010.11433  [pdf, other]

    eess.AS cs.SD

    Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. Also, to preserve speaker discriminability, a contrastive similarity loss function is used together. Experimental results showed that the pr…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, 4 tables

  40. arXiv:2010.11408  [pdf, ps, other]

    eess.AS cs.SD

    Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in INTERSPEECH 2020

  41. Disentangled speaker and nuisance attribute embedding for robust speaker verification

    Authors: Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states)…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Access

  42. arXiv:2008.01698  [pdf, other]

    eess.AS cs.SD

    MIRNet: Learning multiple identities representations in overlapped speech

    Authors: Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

    Abstract: Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are multiple concurrent speakers in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speak…

    Submitted 6 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted in Interspeech 2020

  43. arXiv:2007.10299  [pdf, other]

    cs.SD cs.LG eess.AS

    wav2shape: Hearing the Shape of a Drum Machine

    Authors: Han Han, Vincent Lostanlen

    Abstract: Disentangling and recovering physical attributes, such as shape and material, from a few waveform examples is a challenging inverse problem in audio signal processing, with numerous applications in musical acoustics as well as structural engineering. We propose to address this problem via a combination of time-frequency analysis and supervised machine learning. We start by synthesizing a dataset…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: 11 pages, 7 figures. To appear in the Proceedings of Forum Acusticum, Lyon (France), December 2020

  44. arXiv:2007.09383  [pdf, other]

    cs.CV cs.LG eess.IV

    Bounding Maps for Universal Lesion Detection

    Authors: Han Li, Hu Han, S. Kevin Zhou

    Abstract: Universal Lesion Detection (ULD) in computed tomography plays an essential role in computer-aided diagnosis systems. Many detection approaches achieve excellent results for ULD using possible bounding boxes (or anchors) as proposals. However, empirical evidence shows that using anchor-based proposals leads to a high false-positive (FP) rate. In this paper, we propose a box-to-map method to represe…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: 11 pages, 4 figures

  45. arXiv:2007.06370  [pdf, other]

    eess.SP

    A novel random access scheme for M2M communication in crowded asynchronous massive MIMO systems

    Authors: Huimei Han, Wenchao Zhai, Zhefu Wu, Ying Li, Jun Zhao, Mingda Chen

    Abstract: A new random access scheme is proposed to solve the intra-cell pilot collision for M2M communication in crowded asynchronous massive multiple-input multiple-output (MIMO) systems. The proposed scheme utilizes the proposed estimation of signal parameters via rotational invariance technique enhanced (ESPRIT-E) method to estimate the effective timing offsets, and then active UEs obtain their timing e…

    Submitted 13 July, 2020; originally announced July 2020.

  46. arXiv:2004.14774  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV stat.ML

    IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report

    Authors: Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet, et al. (11 additional authors not shown)

    Abstract: This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, w…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and Automation Magazine. arXiv admin note: text overlap with arXiv:1911.06487

  47. Design of QAM-FBMC Waveforms Considering MMSE Receiver

    Authors: Hyungsik Han, Namshik Kim, Hyuncheol Park

    Abstract: Due to its high spectral confinement characteristics and spectral efficiency, QAM-FBMC is considered a candidate waveform to replace CP-OFDM. QAM-FBMC has inevitable non-orthogonality both in time and frequency, and the system and filter must be well-designed to minimize the interferences. However, existing QAM-FBMC studies utilize a matched filter as the receiver filter, which is not suitable for…

    Submitted 8 January, 2020; originally announced January 2020.

    Comments: 5 pages, 10 figures, Accepted to IEEE Communications Letters

  48. arXiv:1912.03468  [pdf, ps, other]

    eess.SP

    Reconfigurable Intelligent Surface Aided Power Control for Physical-Layer Broadcasting

    Authors: Huimei Han, Jun Zhao, Zehui Xiong, Dusit Niyato, Wenchao Zhai, Marco Di Renzo, Quoc-Viet Pham, Weidang Lu

    Abstract: Reconfigurable intelligent surface (RIS), a recently introduced technology for future wireless communication systems, enhances the spectral and energy efficiency by intelligently adjusting the propagation conditions between a base station (BS) and mobile equipments (MEs). An RIS consists of many low-cost passive reflecting elements to improve the quality of the received signal. In this paper, we s…

    Submitted 5 March, 2022; v1 submitted 7 December, 2019; originally announced December 2019.

  49. arXiv:1911.04283  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

    Authors: Sathish Indurthi, Houjeung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim

    Abstract: End-to-end Speech Translation (ST) models have several advantages such as lower latency, smaller model size, and less error compounding over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models. However, collecting large amounts of parallel data for ST task is more difficult compared to the ASR and MT tasks. Previous studies have proposed…

    Submitted 27 April, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: ICASSP 2020

  50. arXiv:1910.14383  [pdf, ps, other]

    cs.IT eess.SP

    Intelligent Reflecting Surface Aided Network: Power Control for Physical-Layer Broadcasting

    Authors: Huimei Han, Jun Zhao, Dusit Niyato, Marco Di Renzo, Quoc-Viet Pham

    Abstract: As a recently proposed idea for future wireless systems, intelligent reflecting surface (IRS) can assist communications between entities which do not have high-quality direct channels in between. Specifically, an IRS comprises many low-cost passive elements, each of which reflects the incident signal by incurring a phase change so that the reflected signals add coherently at the receiver. In this…

    Submitted 27 January, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: This paper appears in the Proceedings of IEEE International Conference on Communications (ICC) 2020