
Showing 1–50 of 194 results for author: Li, K

Searching in archive eess.
  1. arXiv:2408.08673  [pdf, other]

    cs.SD cs.AI eess.AS

    MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

    Authors: Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin

    Abstract: Sound event detection (SED) methods that leverage a large pre-trained Transformer encoder network have shown promising performance in recent DCASE challenges. However, they still rely on an RNN-based context network to model temporal dependencies, largely due to the scarcity of labeled data. In this work, we propose a pure Transformer-based SED model with masked-reconstruction based pre-training,…

    Submitted 19 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: Received by Interspeech 2024

  2. arXiv:2408.02085  [pdf, other]

    cs.CV cs.AI cs.CL eess.SP

    Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

    Authors: Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu, Ke Li, Xing Sun

    Abstract: Instruction tuning plays a critical role in aligning large language models (LLMs) with human preference. Despite the vast amount of open instruction datasets, naively training an LLM on all existing instructions may be neither optimal nor practical. To pinpoint the most beneficial datapoints, data assessment and selection methods have been proposed in the fields of natural language processing (NLP) and…

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: review, survey, 28 pages, 2 figures, 4 tables

  3. arXiv:2408.00323  [pdf, other]

    eess.SY

    A Novel Edge Laplacian-based Approach for Adaptive Formation Control of Uncertain Multi-agent Systems with Unified Relative Error Performance

    Authors: Kun Li, Kai Zhao, Yongduan Song, Lihua Xie

    Abstract: For most existing prescribed performance formation control methods, performance requirements are not directly imposed on the relative states between agents but on the consensus error, which lacks a clear physical interpretation of their solution. In this paper, we propose a novel adaptive prescribed performance formation control strategy, capable of guaranteeing prescribed performance on the relat…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 9 pages, 3 figures, submitted to IEEE

  4. arXiv:2407.18447  [pdf, other]

    eess.AS

    Matlab-based Epoch Extraction for Speaker Differentiation

    Authors: Kunlun Li, Daniel Ferro, Xu Zhao, Abdul Jabbar Syed, Anil K Vuppala, Azeemuddin Syed

    Abstract: Epoch extraction has become increasingly popular in recent years for speech analysis research because accurately detecting the location of the Epoch is crucial for analyzing speech signals. The Epoch, occurring at the instant of excitation in the vocal tract system, particularly during glottal closure, plays a significant role in differentiating speakers in multi-speaker conversations. However, th…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 8 pages, 11 figures. This paper is currently under review by the 9th ACM/IEEE Symposium on Edge Computing (SEC 2024)

    MSC Class: 94A12 (Primary); 68T10 (Secondary)

    ACM Class: I.5.4; H.5.5

  5. arXiv:2407.16664  [pdf, other]

    cs.CL eess.AS

    Towards scalable efficient on-device ASR with transfer learning

    Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

    Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition…

    Submitted 23 July, 2024; originally announced July 2024.

  6. arXiv:2407.14904  [pdf, other]

    eess.IV cs.AI cs.CL cs.CV

    Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning

    Authors: Chen Shen, Chunfeng Lian, Wanqing Zhang, Fan Wang, Jianhua Zhang, Shuanliang Fan, Xin Wei, Gongji Wang, Kehan Li, Hongshu Mu, Hao Wu, Xinggong Liang, Jianhua Ma, Zhenyuan Wang

    Abstract: Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi u…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 28 pages, 6 figures, under review

  7. arXiv:2407.02930  [pdf, other]

    eess.SP

    Timely Requesting for Time-Critical Content Users in Decentralized F-RANs

    Authors: Xingran Chen, Kai Li, Kun Yang

    Abstract: With the rising demand for high-rate and timely communications, fog radio access networks (F-RANs) offer a promising solution. This work investigates age of information (AoI) performance in F-RANs, consisting of multiple content users (CUs), enhanced remote radio heads (eRRHs), and content providers (CPs). Time-critical CUs need rapid content updates from CPs but cannot communicate directly with t…

    Submitted 3 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  8. arXiv:2407.01801  [pdf, other]

    eess.SP

    Joint State and Parameter Estimation Using the Partial Errors-in-Variables Principle

    Authors: Peng Liu, Kailai Li, Gustaf Hendeby, Fredrik Gustafsson

    Abstract: This letter proposes a new method for joint state and parameter estimation in uncertain dynamical systems. We exploit the partial errors-in-variables (PEIV) principle and formulate a regression problem in the sense of weighted total least squares, where the uncertainty in the parameter prior is explicitly considered. Based thereon, the PEIV regression can be solved iteratively through the Kalman s…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 5 pages

  9. arXiv:2406.14474  [pdf, ps, other]

    eess.SY

    Spatio-temporal Patterns between ENSO and Weather-related Power Outages in the Continental United States

    Authors: Long Huo, Xin Chen, Kaiwen Li, Fengying Cai, Jürgen Kurths

    Abstract: El Niño-Southern Oscillation (ENSO) has significant impacts on the frequency of extreme weather events, and its socio-economic implications prevail on a global scale. However, a fundamental gap still exists in understanding the relationship between the ENSO and weather-related power outages in the continental United States. Through a 24-year (2000–2023) composite and statistical analysis, our st…

    Submitted 20 June, 2024; originally announced June 2024.

  10. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  11. arXiv:2406.10514  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,…

    Submitted 15 June, 2024; originally announced June 2024.

  12. arXiv:2406.08782   

    eess.IV cs.CV

    Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

    Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

    Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid…

    Submitted 1 August, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: There are some errors in professional theory

  13. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages, submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  14. arXiv:2405.00056  [pdf, other]

    eess.SY cs.GT

    Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation

    Authors: Yousef Emami, Hao Gao, Kai Li, Luis Almeida, Eduardo Tovar, Zhu Han

    Abstract: Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize the age of information (AoI) of sensory data, where balancing the trade-off between the UAVs…

    Submitted 2 May, 2024; v1 submitted 24 April, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2312.09953

    MSC Class: 00

    ACM Class: C.2

  15. arXiv:2404.02063  [pdf, other]

    cs.SD cs.AI eess.AS

    SPMamba: State-space model is all you need in speech separation

    Authors: Kai Li, Guo Chen

    Abstract: In speech separation, both CNN- and Transformer-based models have demonstrated robust separation capabilities, garnering significant attention within the research community. However, CNN-based methods have limited modelling capability for long-sequence audio, leading to suboptimal separation performance. Conversely, Transformer-based methods are limited in practical applications due to their high…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical Report. Work in progress. Code is available at https://github.com/JusperLee/SPMamba

  16. arXiv:2403.12115  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep learning automates Cobb angle measurement compared with multi-expert observers

    Authors: Keyu Li, Hanxue Gu, Roy Colglazier, Robert Lark, Elizabeth Hubbard, Robert French, Denise Smith, Jikai Zhang, Erin McCrum, Anthony Catanzano, Joseph Cao, Leah Waldman, Maciej A. Mazurowski, Benjamin Alman

    Abstract: Scoliosis, a prevalent condition characterized by abnormal spinal curvature leading to deformity, requires precise assessment methods for effective diagnosis and management. The Cobb angle is a widely used scoliosis quantification method that measures the degree of curvature between the tilted vertebrae. Yet, manual measurement of Cobb angles is time-consuming and labor-intensive, fraught with signi…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 17 pages, 5 figures

  17. arXiv:2403.08200  [pdf, ps, other]

    eess.SY eess.SP

    Prototyping and Experimental Results for Environment-Aware Millimeter Wave Beam Alignment via Channel Knowledge Map

    Authors: Zhuoyin Dai, Di Wu, Zhenjun Dong, Kun Li, Dingyang Ding, Sihan Wang, Yong Zeng

    Abstract: Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environment-aware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, te…

    Submitted 12 March, 2024; originally announced March 2024.

  18. arXiv:2403.07271  [pdf, other]

    math.OC cs.AI cs.LG eess.SP

    Anderson acceleration for iteratively reweighted $\ell_1$ algorithm

    Authors: Kexin Li

    Abstract: The iteratively reweighted L1 (IRL1) algorithm is a common method for solving sparse optimization problems with nonconvex and nonsmooth regularization. The development of its acceleration algorithm, often employing Nesterov acceleration, has sparked significant interest. Nevertheless, the convergence and complexity analysis of these acceleration algorithms consistently poses substantial challenges.…

    Submitted 11 March, 2024; originally announced March 2024.

  19. arXiv:2403.06066  [pdf]

    eess.IV cs.CV cs.LG

    CausalCellSegmenter: Causal Inference inspired Diversified Aggregation Convolution for Pathology Image Segmentation

    Authors: Dawei Fan, Yifan Gao, Jiaming Yu, Yanping Chen, Wencheng Li, Chuancong Lin, Kaibin Li, Changcai Yang, Riqing Chen, Lifang Wei

    Abstract: Deep learning models have shown promising performance for cell nucleus segmentation in the field of pathology image analysis. However, training a robust model from multiple domains remains a great challenge for cell nucleus segmentation. Additionally, background noise, heavy overlap between cell nuclei, and blurred edges often lead to poor performance. To address these ch…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures, 2 tables, MICCAI

  20. arXiv:2403.05256  [pdf, other]

    eess.IV cs.CV cs.LG

    DuDoUniNeXt: Dual-domain unified hybrid model for single and multi-contrast undersampled MRI reconstruction

    Authors: Ziqi Gao, Yue Zhang, Xinwen Liu, Kaiyan Li, S. Kevin Zhou

    Abstract: Multi-contrast (MC) Magnetic Resonance Imaging (MRI) reconstruction aims to incorporate a reference image of auxiliary modality to guide the reconstruction process of the target modality. Known MC reconstruction methods perform well with a fully sampled reference image, but usually exhibit inferior performance, compared to single-contrast (SC) methods, when the reference image is missing or of low…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures, 2 tables

  21. arXiv:2402.16129  [pdf, other]

    eess.SP

    Localization in Reconfigurable Intelligent Surface Aided mmWave Systems: A Multiple Measurement Vector Based Channel Estimation Method

    Authors: Kunlun Li, Jiguang He, Mohammed El-Hajjar, Lie-Liang Yang

    Abstract: The sparsity of millimeter wave (mmWave) channels in the angular and temporal domains is beneficial to channel estimation, while the associated channel parameters can be utilized for localization. However, line-of-sight (LoS) blockage poses a significant challenge to localization in mmWave systems, potentially leading to substantial positioning errors. A promising solution is to employ reconfi…

    Submitted 25 February, 2024; originally announced February 2024.

  22. arXiv:2402.09430  [pdf, other]

    eess.SP cs.AI cs.CV cs.MM

    WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

    Authors: Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann

    Abstract: WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user se…

    Submitted 12 March, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

    Comments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFi

  23. arXiv:2402.04448  [pdf, other]

    eess.SY

    Failure Analysis in Next-Generation Critical Cellular Communication Infrastructures

    Authors: Siguo Bi, Xin Yuan, Shuyan Hu, Kai Li, Wei Ni, Ekram Hossain, Xin Wang

    Abstract: The advent of communication technologies marks a transformative phase in critical infrastructure construction, where the meticulous analysis of failures becomes paramount in achieving the fundamental objectives of continuity, security, and availability. This survey enriches the discourse on failures, failure analysis, and countermeasures in the context of the next-generation critical communication…

    Submitted 6 February, 2024; originally announced February 2024.

  24. arXiv:2402.03897  [pdf, other]

    eess.SY

    Robust Data-EnablEd Predictive Leading Cruise Control via Reachability Analysis

    Authors: Shuai Li, Chaoyi Chen, Haotian Zheng, Jiawei Wang, Qing Xu, Keqiang Li

    Abstract: Data-driven predictive control promises model-free wave-dampening strategies for Connected and Autonomous Vehicles (CAVs) in mixed traffic flow. However, its performance relies on data quality, which suffers from unknown noise and disturbances. This paper introduces a Robust Data-EnablEd Predictive Leading Cruise Control (RDeeP-LCC) method based on reachability analysis, aiming to achieve safe and…

    Submitted 14 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures

  25. arXiv:2402.03497  [pdf, other]

    eess.SP

    An Analytic Solution for Kernel Adaptive Filtering

    Authors: Benjamin Colburn, Luis G. Sanchez Giraldo, Kan Li, Jose C. Principe

    Abstract: Conventional kernel adaptive filtering (KAF) uses a prescribed, positive definite, nonlinear function to define the Reproducing Kernel Hilbert Space (RKHS), where the optimal solution for mean square error estimation is approximated using search techniques. Instead, this paper proposes to embed the full statistics of the input data in the kernel definition, obtaining the first analytical solution…

    Submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2402.03390  [pdf, other]

    eess.IV cs.AI cs.CV cs.NI

    PixelGen: Rethinking Embedded Camera Systems

    Authors: Kunjun Li, Manoj Gulati, Steven Waskito, Dhairya Shah, Shantanu Chakrabarty, Ambuj Varshney

    Abstract: Embedded camera systems are ubiquitous, representing the most widely deployed example of a wireless embedded system. They capture a representation of the world - the surroundings illuminated by visible or infrared light. Despite their widespread usage, the architecture of embedded camera systems has remained unchanged, which leads to limitations. They visualize only a tiny portion of the world. Ad…

    Submitted 4 February, 2024; originally announced February 2024.

  27. TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

    Authors: Samuel Pegg, Kai Li, Xiaolin Hu

    Abstract: Audio-visual speech separation has gained significant traction in recent years due to its potential applications in various fields such as speech recognition, diarization, scene analysis and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more paramete…

    Submitted 25 January, 2024; originally announced January 2024.

    Journal ref: 2023 13th International Conference on Information Science and Technology (ICIST), Cairo, Egypt, 2023, pp. 243-252

  28. arXiv:2401.03150  [pdf, other]

    eess.IV

    O-PRESS: Boosting OCT axial resolution with Prior guidance, Recurrence, and Equivariant Self-Supervision

    Authors: Kaiyan Li, Jingyuan Yang, Wenxuan Liang, Xingde Li, Chenxi Zhang, Lulu Chen, Chan Wu, Xiao Zhang, Zhiyan Xu, Yuelin Wang, Lihui Meng, Yue Zhang, Youxin Chen, S. Kevin Zhou

    Abstract: Optical coherence tomography (OCT) is a noninvasive technology that enables real-time imaging of tissue microanatomies. The axial resolution of OCT is intrinsically constrained by the spectral bandwidth of the employed light source while maintaining a fixed center wavelength for a specific application. Physically extending this bandwidth faces strong limitations and requires a substantial cost. We…

    Submitted 6 January, 2024; originally announced January 2024.

  29. arXiv:2312.06337  [pdf, other]

    cs.SD cs.CL eess.AS

    Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations

    Authors: Tao Meng, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li

    Abstract: The main task of Multimodal Emotion Recognition in Conversations (MERC) is to identify the emotions in modalities, e.g., text, audio, image and video, which is a significant development direction for realizing machine intelligence. However, much of the data in MERC naturally exhibits an imbalanced distribution of emotion categories, and researchers ignore the negative impact of imbalanced data on emotion…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 16 pages, 9 figures

  30. arXiv:2312.03787  [pdf, other]

    eess.SY

    Detection and Mitigation of Position Spoofing Attacks on Cooperative UAV Swarm Formations

    Authors: Siguo Bi, Kai Li, Shuyan Hu, Wei Ni, Cong Wang, Xin Wang

    Abstract: Detecting spoofing attacks on the positions of unmanned aerial vehicles (UAVs) within a swarm is challenging. Traditional methods relying solely on individually reported positions and pairwise distance measurements are ineffective in identifying the misbehavior of malicious UAVs. This paper presents a novel systematic structure designed to detect and mitigate spoofing attacks in UAV swarms. We for…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: accepted by IEEE TIFS in Dec. 2023

  31. arXiv:2312.03464  [pdf, other]

    cs.LG cs.SD eess.AS

    Subnetwork-to-go: Elastic Neural Network with Dynamic Training and Customizable Inference

    Authors: Kai Li, Yi Luo

    Abstract: Deploying neural networks to different devices or platforms is in general challenging, especially when the model size is large or model complexity is high. Although there exist ways for model pruning or distillation, it is typically required to perform a full round of model training or finetuning procedure in order to obtain a smaller model that satisfies the model size or complexity constraints.…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures

  32. arXiv:2311.12083  [pdf, other]

    cs.CV eess.IV

    PanBench: Towards High-Resolution and High-Performance Pansharpening

    Authors: Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Pin Tao

    Abstract: Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information. These pansharpened images enhance precision in land cover classification, change detection, and environmental monitoring within remote sensing data analysis. Whil…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  33. arXiv:2309.17189  [pdf, other]

    cs.SD cs.CV eess.AS

    RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation

    Authors: Samuel Pegg, Kai Li, Xiaolin Hu

    Abstract: Audio-visual speech separation methods aim to integrate different modalities to generate high-quality separated speech, thereby enhancing the performance of downstream tasks such as speech recognition. Most existing state-of-the-art (SOTA) models operate in the time domain. However, their overly simplistic approach to modeling acoustic features often necessitates larger and more computationally in…

    Submitted 21 March, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by The Twelfth International Conference on Learning Representations (ICLR) 2024, see https://openreview.net/forum?id=PEuDO2EiDr

  34. arXiv:2309.14474  [pdf]

    eess.IV cs.CV

    Gastro-Intestinal Tract Segmentation Using an Explainable 3D Unet

    Authors: Kai Li, Jonathan Chan

    Abstract: In treating gastrointestinal cancer using radiotherapy, the role of the radiation oncologist is to administer high doses of radiation, through x-ray beams, toward the tumor while avoiding the stomach and intestines. With the advent of precise radiation treatment technology such as the MR-Linac, oncologists can visualize the daily positions of the tumors and intestines, which may vary day to day. B…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 5 pages, 8 figures, 13th Joint Symposium on Computational Intelligence (JSCI13)

  35. arXiv:2309.13018  [pdf, other]

    eess.AS cs.CL cs.SD

    Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

    Authors: Jiamin Xie, Ke Li, Jinxi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

    Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it requires several rounds of pruning and re-training to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in…

    Submitted 11 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  36. arXiv:2309.09734  [pdf, other]

    eess.SY

    Learning Optimal Robust Control of Connected Vehicles in Mixed Traffic Flow

    Authors: Jie Li, Jiawei Wang, Shengbo Eben Li, Keqiang Li

    Abstract: Connected and automated vehicles (CAVs) technologies promise to attenuate undesired traffic disturbances. However, in mixed traffic where human-driven vehicles (HDVs) also exist, the nonlinear human-driving behavior has brought critical challenges for effective CAV control. This paper employs the policy iteration method to learn the optimal robust controller for nonlinear mixed traffic systems. Pr…

    Submitted 18 September, 2023; originally announced September 2023.

  37. arXiv:2309.03686  [pdf, other]

    eess.IV cs.CV

    MS-UNet-v2: Adaptive Denoising Method and Training Strategy for Medical Image Segmentation with Small Training Data

    Authors: Haoyuan Chen, Yufei Han, Pin Xu, Yanyi Li, Kuan Li, Jianping Yin

    Abstract: Models based on U-like structures have improved the performance of medical image segmentation. However, the single-layer decoder structure of U-Net is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. Things get worse if the training set is not sufficiently large, which is common in medical image processing t…

    Submitted 7 September, 2023; originally announced September 2023.

  38. arXiv:2309.01994  [pdf, other]

    eess.SY

    Cloud Control of Connected Vehicle under Bi-directional Time-varying delay: An Application of Predictor-observer Structured Controller

    Authors: Ji-An Pan, Qing Xu, Keqiang Li, Chunying Yang, Jianqiang Wang

    Abstract: This article is devoted to addressing the cloud control of connected vehicles, specifically focusing on analyzing the effect of bi-directional communication-induced delays. To mitigate the adverse effects of such delays, a novel predictor-observer structured controller is proposed which compensates for both measurable output delays and unmeasurable, yet bounded, input delays simultaneously. The stu…

    Submitted 9 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

  39. arXiv:2309.01625  [pdf, other]

    eess.SY

    Information Flow Topology in Mixed Traffic: A Comparative Study between "Looking Ahead" and "Looking Behind"

    Authors: Shuai Li, Haotian Zheng, Jiawei Wang, Chaoyi Chen, Qing Xu, Jianqiang Wang, Keqiang Li

    Abstract: The emergence of connected and automated vehicles (CAVs) promises smoother traffic flow. In mixed traffic where human-driven vehicles (HDVs) also exist, existing research mostly focuses on "looking ahead" (i.e., the CAVs receive information from preceding vehicles) strategies for CAVs, while recent work reveals that "looking behind" (i.e., the CAVs receive information from their rear vehicles) str…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: This paper has been accepted by 26th IEEE International Conference on Intelligent Transportation Systems ITSC 2023

  40. arXiv:2308.13421  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Exploiting Diverse Feature for Multimodal Sentiment Analysis

    Authors: Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang

    Abstract: In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation aims to predict the continuous arousal and valence values of a participant based on their audio-visual, language, and physiological signal modalities data. Considering that different people have personal characteristics, the main cha…

    Submitted 25 August, 2023; originally announced August 2023.

  41. arXiv:2308.09223  [pdf, other]

    eess.IV cs.CV cs.LG

    DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction

    Authors: Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris N. Metaxas

    Abstract: Accurate 3D cardiac reconstruction from cine magnetic resonance imaging (cMRI) is crucial for improved cardiovascular disease diagnosis and understanding of the heart's motion. However, current cardiac MRI-based reconstruction technology used in clinical settings is 2D with limited through-plane resolution, resulting in low-quality reconstructed cardiac volumes. To better reconstruct 3D cardiac vo…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted at MICCAI 2023

  42. arXiv:2308.08143  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation

    Authors: Kai Li, Runxuan Yang, Fuchun Sun, Xiaolin Hu

    Abstract: Recent research has made significant progress in designing fusion modules for audio-visual speech separation. However, they predominantly focus on multi-modal fusion at a single temporal scale of auditory and visual features without employing selective attention mechanisms, which is in sharp contrast with the brain. To address this issue, we propose a novel model called Intra- and Inter-Attention…

    Submitted 2 February, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 18 pages, 6 figures

  43. arXiv:2308.06981  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Cinematic Demixing Track

    Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

    Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted for Transactions of the International Society for Music Information Retrieval

  44. arXiv:2308.04743  [pdf]

    eess.SY cs.RO math.DS

    Missile guidance law design based on free-time convergent error dynamics

    Authors: Yuanhe Liu, Nianhao Xie, Kebo Li, Yangang Liang

    Abstract: The design of guidance law can be considered a kind of finite-time error-tracking problem. A unified free-time convergent guidance law design approach based on the error dynamics and the free-time convergence method is proposed in this paper. Firstly, the desired free-time convergent error dynamics approach is proposed, and its convergent time can be set freely, which is independent of the initial…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 13 pages, 6 figures, accepted by Journal of Systems Engineering and Electronics

  45. arXiv:2308.04417  [pdf, other]

    cs.CV cs.LG eess.IV

    DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images

    Authors: Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao

    Abstract: Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image qual…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 13 pages, 7 figures

  46. arXiv:2307.11795  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Prompting Large Language Models with Speech Recognition Abilities

    Authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings,…

    Submitted 21 July, 2023; originally announced July 2023.

  47. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  48. arXiv:2307.07829  [pdf, other]

    eess.IV cs.CV

    HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance

    Authors: Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Xiu Li, Yaowei Wang

    Abstract: Unpaired Medical Image Enhancement (UMIE) aims to transform a low-quality (LQ) medical image into a high-quality (HQ) one without relying on paired images for training. While most existing approaches are based on Pix2Pix/CycleGAN and are effective to some extent, they fail to explicitly use HQ information to guide the enhancement process, which can lead to undesired artifacts and structural distor…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: 14 pages, 10 figures

  49. arXiv:2307.00828  [pdf, other]

    eess.SY cs.LG math.OC

    Model-Assisted Probabilistic Safe Adaptive Control With Meta-Bayesian Learning

    Authors: Shengbo Wang, Ke Li, Yin Yang, Yuting Cao, Tingwen Huang, Shiping Wen

    Abstract: Breaking safety constraints in control systems can lead to potential risks, resulting in unexpected costs or catastrophic damage. Nevertheless, uncertainty is ubiquitous, even among similar tasks. In this paper, we develop a novel adaptive safe control framework that integrates meta learning, Bayesian models, and control barrier function (CBF) method. Specifically, with the help of CBF method, we…

    Submitted 13 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  50. arXiv:2307.00637  [pdf, other]

    eess.SY

    On Embedding B-Splines in Recursive State Estimation

    Authors: Kailai Li

    Abstract: We present a principled study on establishing a novel probabilistic framework for state estimation. B-splines are embedded in the state-space modeling as a continuous-time intermediate between the states of recurrent control points and asynchronous sensor measurements. Based thereon, the spline-embedded recursive estimation scheme is established w.r.t. common sensor fusion tasks, and the correspon…

    Submitted 2 July, 2023; originally announced July 2023.