
Showing 1–50 of 123 results for author: Zhou, L

Searching in archive eess.
  1. arXiv:2408.14270  [pdf, other]

    eess.IV cs.CV

    Reliable Multi-modal Medical Image-to-image Translation Independent of Pixel-wise Aligned Data

    Authors: Langrui Zhou, Guang Li

    Abstract: The current mainstream multi-modal medical image-to-image translation methods face a contradiction. Supervised methods with outstanding performance rely on pixel-wise aligned training data to constrain the model optimization. However, obtaining pixel-wise aligned multi-modal medical image datasets is challenging. Unsupervised methods can be trained without paired data, but their reliability cannot…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted as a research article by Medical Physics

  2. arXiv:2408.13705  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

    Authors: Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu

    Abstract: The success of speech-image retrieval relies on establishing an effective alignment between speech and image. Existing methods often model cross-modal interaction through simple cosine similarity of the global feature of each modality, which fall short in capturing fine-grained details within modalities. To address this issue, we introduce an effective framework and a novel learning task named cro…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.13119

  3. arXiv:2408.13201  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    EAViT: External Attention Vision Transformer for Audio Classification

    Authors: Aquib Iqbal, Abid Hasan Zim, Md Asaduzzaman Tonmoy, Limengnan Zhou, Asad Malik, Minoru Kuribayashi

    Abstract: This paper presents the External Attention Vision Transformer (EAViT) model, a novel approach designed to enhance audio classification accuracy. As digital audio resources proliferate, the demand for precise and efficient audio classification systems has intensified, driven by the need for improved recommendation systems and user personalization in various applications, including music streaming p…

    Submitted 23 August, 2024; originally announced August 2024.

  4. arXiv:2408.13119  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Coarse-to-fine Alignment Makes Better Speech-image Retrieval

    Authors: Lifeng Zhou, Yuke Li

    Abstract: In this paper, we propose a novel framework for speech-image retrieval. We utilize speech-image contrastive (SIC) learning tasks to align speech and image representations at a coarse level and speech-image matching (SIM) learning tasks to further refine the fine-grained cross-modal alignment. SIC and SIM learning tasks are jointly trained in a unified manner. To optimize the learning process, we u…

    Submitted 14 August, 2024; originally announced August 2024.

  5. arXiv:2408.09067  [pdf, ps, other]

    eess.SP

    FAS vs. ARIS: Which Is More Important for FAS-ARIS Communication Systems?

    Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Chongwen Huang, Chau Yuen

    Abstract: In this paper, we investigate the question of which technology, fluid antenna systems (FAS) or active reconfigurable intelligent surfaces (ARIS), plays a more crucial role in FAS-ARIS wireless communication systems. To address this, we develop a comprehensive system model and explore the problem from an optimization perspective. We introduce an alternating optimization (AO) algorithm incorporating…

    Submitted 16 August, 2024; originally announced August 2024.

  6. Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI

    Authors: Lei Zhou, Yuzhong Zhang, Jiadong Zhang, Xuejun Qian, Chen Gong, Kun Sun, Zhongxiang Ding, Xing Wang, Zhenhui Li, Zaiyi Liu, Dinggang Shen

    Abstract: Automated breast tumor segmentation on the basis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has shown great promise in clinical practice, particularly for identifying the presence of breast disease. However, accurate segmentation of breast tumors is a challenging task, often necessitating the development of complex networks. To strike an optimal trade-off between computati…

    Submitted 11 August, 2024; originally announced August 2024.

    Journal ref: IEEE Transactions on Medical Imaging, 2024

  7. arXiv:2407.14651  [pdf, other]

    eess.IV cs.AI cs.CV

    Improving Representation of High-frequency Components for Medical Foundation Models

    Authors: Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Xin Gao

    Abstract: Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models have been shown to exhibit significant limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomic…

    Submitted 25 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  8. arXiv:2407.11307  [pdf, ps, other]

    eess.SP

    Fluid Antenna-Assisted Simultaneous Wireless Information and Power Transfer Systems

    Authors: Liaoshi Zhou, Junteng Yao, Tuo Wu, Ming Jin, Chau Yuen, Fumiyuki Adachi

    Abstract: This paper examines a fluid antenna (FA)-assisted simultaneous wireless information and power transfer (SWIPT) system. Unlike traditional SWIPT systems with fixed-position antennas (FPAs), our FA-assisted system enables dynamic reconfiguration of the radio propagation environment by adjusting the positions of FAs. This capability enhances both energy harvesting and communication performance. The s…

    Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  9. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel language modeling approach based on continuous-valued tokens for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.05738  [pdf, ps, other]

    eess.SP eess.SY

    Collaborative Secret and Covert Communications for Multi-User Multi-Antenna Uplink UAV Systems: Design and Optimization

    Authors: Jinpeng Xu, Lin Bai, Xin Xie, Lin Zhou

    Abstract: Motivated by the diverse security requirements of multiple users in UAV systems, we propose a collaborative secret and covert transmission method for communications from multi-antenna ground users to an unmanned aerial vehicle (UAV). Specifically, based on power-domain non-orthogonal multiple access (NOMA), two ground users with distinct security requirements, named Bob and Carlo, superimpose their signals and…

    Submitted 8 July, 2024; originally announced July 2024.

  11. arXiv:2407.02816  [pdf, other]

    cs.IT eess.SP math.ST

    Large and Small Deviations for Statistical Sequence Matching

    Authors: Lin Zhou, Qianyun Wang, Jingjing Wang, Lin Bai, Alfred O. Hero

    Abstract: We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for the generalized likelihood ratio test (GLRT). We first consider the case where the number of matched pairs of sequences between the databases is known. In this case, the task is to accurately find the matched pairs of sequ…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Extended version of ISIT paper

  12. arXiv:2406.08523  [pdf, other]

    eess.IV

    A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography

    Authors: Weicheng Yan, Qiude Zhang, Yun Wu, Zhaohui Liu, Liang Zhou, Mingyue Ding, Ming Yuchi, Wu Qiu

    Abstract: Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.07855  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLMs) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, while sampling-based decoding strategies bring astonishing diversity to generation, they also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  14. arXiv:2406.05370  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    Authors: Sanyuan Chen, Shujie Liu, Long Zhou, Yanqing Liu, Xu Tan, Jinyu Li, Sheng Zhao, Yao Qian, Furu Wei

    Abstract: This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, the new iteration introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in…

    Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Demo posted

  15. arXiv:2405.17809  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  16. arXiv:2405.14770  [pdf, other]

    eess.IV

    Physics-informed Score-based Diffusion Model for Limited-angle Reconstruction of Cardiac Computed Tomography

    Authors: Shuo Han, Yongshun Xu, Dayang Wang, Bahareh Morovati, Li Zhou, Jonathan S. Maltz, Ge Wang, Hengyong Yu

    Abstract: Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but typically leads to severe image degradation and motivates improved reconstruction techniques. In this pa…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages

  17. arXiv:2405.10550  [pdf, other]

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in the use of endoscopy in surgery face challenges such as inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad…

    Submitted 17 May, 2024; originally announced May 2024.

  18. arXiv:2405.09554  [pdf, ps, other]

    eess.SP cs.IT

    Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

    Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

    Abstract: In this letter, we investigate a new off-grid sparse Bayesian learning approach based on the generalized double Pareto prior (GDPOGSBL) to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of the source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to mod…

    Submitted 17 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

  19. arXiv:2405.04274  [pdf, other]

    eess.IV cs.CV

    Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression

    Authors: Zhenghao Chen, Luping Zhou, Zhihao Hu, Dong Xu

    Abstract: Content-adaptive compression is crucial for enhancing the adaptability of the pre-trained neural codec for various contents. Although these methods have been very practical in neural image compression (NIC), their application in neural video compression (NVC) is still limited due to two main aspects: (1) video compression relies heavily on temporal redundancy, therefore updating just one or a few…

    Submitted 7 May, 2024; originally announced May 2024.

  20. arXiv:2405.01161  [pdf, ps, other]

    eess.SP

    Exponentially Consistent Outlier Hypothesis Testing for Continuous Sequences

    Authors: Lina Zhu, Lin Zhou

    Abstract: In outlier hypothesis testing, one aims to detect outlying sequences among a given set of sequences, where most sequences are generated i.i.d. from a nominal distribution while outlying sequences (outliers) are generated i.i.d. from a different anomalous distribution. Most existing studies focus on discrete-valued sequences, where each data sample takes values in a finite set. To account for pract…

    Submitted 2 May, 2024; originally announced May 2024.

  21. arXiv:2405.01113  [pdf, other]

    cs.CV cs.AI eess.IV

    Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

    Authors: Seungyeop Lee, Knut Peterson, Solmaz Arezoomandan, Bill Cai, Peihan Li, Lifeng Zhou, David Han

    Abstract: A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data g…

    Submitted 2 May, 2024; originally announced May 2024.

  22. arXiv:2404.10026  [pdf]

    eess.IV cs.CR cs.LG

    Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection

    Authors: Lisang Zhou, Meng Wang, Ning Zhou

    Abstract: Distributed training can facilitate the processing of large medical image datasets, and improve the accuracy and efficiency of disease diagnosis while protecting patient privacy, which is crucial for achieving efficient medical image analysis and accelerating medical research progress. This paper presents an innovative approach to medical image classification, leveraging Federated Learning (FL) to…

    Submitted 15 April, 2024; originally announced April 2024.

    Journal ref: Journal of Information, Technology and Policy (2023): 1-12

  23. arXiv:2404.06690  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  24. arXiv:2404.00656  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    WavLLM: Towards Robust and Adaptive Speech Large Language Model

    Authors: Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

    Abstract: The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In th…

    Submitted 14 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  25. arXiv:2403.18235  [pdf, other]

    eess.SY math.OC

    A Parallel Vector-form $LDL^\top$ Decomposition for Accelerating Execution-time-certified $\ell_1$-penalty Soft-constrained MPC

    Authors: Liang Wu, Liwei Zhou, Richard D. Braatz

    Abstract: Handling possible infeasibility and providing an execution time certificate are two pressing requirements of real-time Model Predictive Control (MPC). To meet these two requirements simultaneously, this paper proposes an $\ell_1$-penalty soft-constrained MPC formulation that is globally feasible and solvable with an execution time certificate using our proposed algorithm. This paper proves for the…

    Submitted 8 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 11 pages

  26. arXiv:2403.00453  [pdf, ps, other]

    eess.SP

    Exploring Fairness for FAS-assisted Communication Systems: from NOMA to OMA

    Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

    Abstract: This paper addresses the fairness issue within fluid antenna system (FAS)-assisted non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA) systems, where a single fixed-antenna base station (BS) transmits superposition-coded signals to two users, each with a single fluid antenna. We define fairness through the minimization of the maximum outage probability for the two users, und…

    Submitted 1 March, 2024; originally announced March 2024.

  27. arXiv:2401.15803  [pdf, other]

    cs.RO cs.AI cs.CV eess.SY

    GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow

    Authors: Liguo Zhou, Yinglei Song, Yichao Gao, Zhou Yu, Michael Sodamin, Hongshen Liu, Liang Ma, Lian Liu, Hao Liu, Yang Liu, Haichuan Li, Guang Chen, Alois Knoll

    Abstract: Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and inte…

    Submitted 30 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  28. arXiv:2401.00246  [pdf, other]

    cs.CL cs.SD eess.AS

    Boosting Large Language Model for Speech Synthesis: An Empirical Study

    Authors: Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei

    Abstract: Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prompting LLMs with perception abilities like auditory comprehension, and the effective approach for augmenting LLMs with speech synthesis capabilities re…

    Submitted 30 December, 2023; originally announced January 2024.

  29. arXiv:2311.15215  [pdf, other]

    eess.SP eess.SY

    From OTFS to DD-ISAC: Integrating Sensing and Communications in the Delay Doppler Domain

    Authors: Weijie Yuan, Lin Zhou, Saeid K. Dehkordi, Shuangyang Li, Pingzhi Fan, Giuseppe Caire, H. Vincent Poor

    Abstract: Next-generation vehicular networks are expected to provide the capability of robust environmental sensing in addition to reliable communications to meet intelligence requirements. A promising solution is the integrated sensing and communication (ISAC) technology, which performs both functionalities using the same spectrum and hardware resources. Most existing works on ISAC consider the Orthogonal…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Magazine paper submitted to IEEE

  30. arXiv:2311.10416  [pdf, other]

    eess.SP

    Meta-DSP: A Meta-Learning Approach for Data-Driven Nonlinear Compensation in High-Speed Optical Fiber Systems

    Authors: Xinyu Xiao, Zhennan Zhou, Bin Dong, Dingjiong Ma, Li Zhou, Jie Sun

    Abstract: Non-linear effects in long-haul, high-speed optical fiber systems significantly hinder channel capacity. While the Digital Backward Propagation algorithm (DBP) with adaptive filter (ADF) can mitigate these effects, it suffers from an overwhelming computational complexity. Recent solutions have incorporated deep neural networks in a data-driven strategy to alleviate this complexity in the DBP model…

    Submitted 17 November, 2023; originally announced November 2023.

  31. arXiv:2311.06491  [pdf, other]

    eess.SY

    Nonsmooth-Optimization-Based Bandwidth Optimal Control for Precision Motion Systems

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Precision motion systems are at the core of various manufacturing equipment. The rapidly increasing demand for higher productivity necessitates higher control bandwidth in the motion systems to effectively reject disturbances while maintaining excellent positioning accuracy. However, most existing optimal control methods do not explicitly optimize for control bandwidth, and the classic loop-shapin…

    Submitted 11 November, 2023; originally announced November 2023.

  32. arXiv:2310.17116  [pdf, other]

    eess.AS cs.SD

    Real-time Neonatal Chest Sound Separation using Deep Learning

    Authors: Yang Yi Poh, Ethan Grooby, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Mehrtash Harandi, Faezeh Marzbanrad

    Abstract: Auscultation for neonates is a simple and non-invasive method of providing diagnosis for cardiovascular and respiratory disease. Such diagnosis often requires high-quality heart and lung sounds to be captured during auscultation. However, in most cases, obtaining such high-quality sounds is non-trivial due to the chest sounds containing a mixture of heart, lung, and noise sounds. As such, addition…

    Submitted 25 October, 2023; originally announced October 2023.

  33. arXiv:2310.12405  [pdf, other]

    eess.IV cs.CV

    LoMAE: Low-level Vision Masked Autoencoders for Low-dose CT Denoising

    Authors: Dayang Wang, Yongshun Xu, Shuo Han, Zhan Wu, Li Zhou, Bahareh Morovati, Hengyong Yu

    Abstract: Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models have emerged as a promising avenue to enhance LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings.…

    Submitted 18 October, 2023; originally announced October 2023.

  34. arXiv:2310.07729  [pdf, other]

    cs.RO eess.SY

    Energy-Aware Routing Algorithm for Mobile Ground-to-Air Charging

    Authors: Bill Cai, Fei Lu, Lifeng Zhou

    Abstract: We investigate the problem of energy-constrained planning for a cooperative system of an Unmanned Ground Vehicle (UGV) and an Unmanned Aerial Vehicle (UAV). In scenarios where the UGV serves as a mobile base to ferry the UAV and as a charging station to recharge the UAV, we formulate a novel energy-constrained routing problem. To tackle this problem, we design an energy-aware routing algorithm, a…

    Submitted 6 August, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

  35. arXiv:2310.00141  [pdf, other]

    cs.CL eess.AS

    The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

    Authors: Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews

    Abstract: Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continu…

    Submitted 30 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted to IEEE ASRU 2023

  36. arXiv:2309.14248  [pdf, other]

    eess.SY

    Transcending the Acceleration-Bandwidth Trade-off: Lightweight Precision Stages with Active Control of Flexible Dynamics

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Micro/Nano-positioning stages are of great importance in a wide range of manufacturing machines and instruments. In recent years, the drastically growing demand for higher throughput and reduced power consumption in various IC manufacturing equipment calls for the development of next-generation precision positioning systems with unprecedented acceleration capability while maintaining exceptional p…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2301.04208; text overlap with arXiv:2309.11735

  37. arXiv:2309.13874  [pdf, other]

    eess.AS cs.LG cs.SD

    Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

    Authors: Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

    Abstract: Target Speech Extraction (TSE) is a crucial task in speech processing that focuses on isolating the clean speech of a specific speaker from complex mixtures. While discriminative methods are commonly used for TSE, they can introduce distortion in terms of speech perception quality. On the other hand, generative approaches, particularly diffusion-based methods, can enhance speech quality perceptual…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  38. arXiv:2309.11735  [pdf, other]

    eess.SY

    FleXstage: Lightweight Magnetically Levitated Precision Stage with Over-Actuation towards High-Throughput IC Manufacturing

    Authors: Jingjie Wu, Lei Zhou

    Abstract: Precision motion stages play a critical role in various manufacturing and inspection equipment, for example, the wafer/reticle scanning in photolithography scanners and positioning stages in wafer inspection systems. To meet the growing demand for higher throughput in chip manufacturing and inspection, it is critical to create new precision motion stages with higher acceleration capability with hi…

    Submitted 20 September, 2023; originally announced September 2023.

  39. arXiv:2309.09953  [pdf, other]

    eess.SY math.AP

    PINN-based viscosity solution of HJB equation

    Authors: Tianyu Liu, Steven Ding, Jiarui Zhang, Liutao Zhou

    Abstract: This paper proposes a novel PINN-based viscosity solution for Hamilton-Jacobi-Bellman (HJB) equations. Although existing work uses PINNs to solve HJB equations, none of it gives the solution in the viscosity sense. This paper shows that, by using a convex neural network, one can guarantee the viscosity solution, and thus the neural network can easily converge to the true solution of the HJB equation regardless of the starting point.

    Submitted 18 September, 2023; originally announced September 2023.

  40. arXiv:2308.10157  [pdf, ps, other]

    eess.IV cs.CV

    Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction

    Authors: Zeyu Han, Yuhan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen

    Abstract: To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images. One widely adopted technique is generative adversarial networks (GANs), yet recently, diffusion probabilistic models (DPMs) have emerged as a compelling alternat…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted and presented at MICCAI 2023. To be published in the proceedings.

  41. arXiv:2308.04805  [pdf, other]

    cs.IR cs.SD eess.AS

    DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

    Authors: Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

    Abstract: To support effective music search, it is vital to form a complete set of labels for each song. However, current solutions fail to do so because they cannot produce sufficiently diverse mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be present in user comments, we propose to study the automated music labeling in an…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, published in ACM MM 2023

  42. arXiv:2307.04015  [pdf, other]

    cs.SD cs.MM eess.AS

    Emotion-Guided Music Accompaniment Generation Based on Variational Autoencoder

    Authors: Qi Wang, Shubing Zhang, Li Zhou

    Abstract: Music accompaniment generation is a crucial aspect in the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to effectively incorporate human emotions to create beautiful accompaniments. Existing models struggle to effectively characterize human emotions within neural network models while composing music. To address this issue,…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted by the International Joint Conference on Neural Networks 2023 (IJCNN 2023)

  43. arXiv:2307.03917  [pdf, other]

    eess.AS cs.CL cs.SD

    On decoder-only architecture for speech-to-text and large language model integration

    Authors: Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

    Abstract: Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA,…

    Submitted 2 October, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

  44. arXiv:2305.17778  [pdf]

    physics.med-ph eess.IV

    PND-Net: Physics based Non-local Dual-domain Network for Metal Artifact Reduction

    Authors: Jinqiu Xia, Yiwen Zhou, Hailong Wang, Wenxin Deng, Jing Kang, Wangjiang Wu, Mengke Qi, Linghong Zhou, Jianhui Ma, Yuan Xu

    Abstract: Metal artifacts caused by the presence of metallic implants tremendously degrade the reconstructed computed tomography (CT) image quality, affecting clinical diagnosis or reducing the accuracy of organ delineation and dose calculation in radiotherapy. Recently, deep learning methods in sinogram and image domains have been rapidly applied on metal artifact reduction (MAR) task. The supervised dual-…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: 19 pages, 8 figures

  45. arXiv:2305.16107  [pdf, other]

    cs.CL cs.SD eess.AS

    VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

    Authors: Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

    Abstract: Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities. In this paper, we propose VioLA, a single auto-regressive Transformer decoder-only network that unifies various cross-modal tasks involving speech and text, such as speech-to-text, text-to-text, text-to-speech, and speech-to-speech tasks, as a con…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Work in progress

  46. arXiv:2305.14838  [pdf, other]

    cs.CL cs.SD eess.AS

    ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

    Authors: Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang

    Abstract: Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks. Particularly, we propose to incorporate…

    Submitted 14 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023, Poster

  47. arXiv:2304.10691  [pdf, other]

    eess.IV cs.CV cs.LG

    SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model

    Authors: Juexiao Zhou, Xiaonan He, Liyuan Sun, Jiannan Xu, Xiuying Chen, Yuetan Chu, Longxi Zhou, Xingyu Liao, Bin Zhang, Xin Gao

    Abstract: Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. Firstly, there is a shortage of dermatologists accessible to diagnose patients, particularly in rural regions. Secondly, accurately interpreting skin di…

    Submitted 8 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  48. arXiv:2303.12441  [pdf, ps, other]

    cs.NI eess.SP

    AMPLE: An Adaptive Multiple Path Loss Exponent Radio Propagation Model Considering Environmental Factors

    Authors: Lingyou Zhou, Jie Zhang, Jiliang Zhang, Oktay Cetinkaya, Steve Jubb

    Abstract: We present AMPLE -- a novel multiple path loss exponent (PLE) radio propagation model that can adapt to different environmental factors. The proposed model aims at accurately predicting path loss with low computational complexity considering environmental factors. In the proposed model, the scenario under consideration is classified into regions from a raster map, and each type of region is assign…

    Submitted 20 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: This paper has been submitted to IEEE Transactions for possible publication

  49. arXiv:2303.03926  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

    Authors: Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

    Abstract: We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis. Specifically, we extend VALL-E and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target language speech by using both the source language speech and the target language text as prompts. VALL-E X inherits strong in-context learning capabilitie…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: We encourage readers to listen to the audio samples on our demo page: https://aka.ms/vallex

  50. arXiv:2303.00786  [pdf]

    cs.CL eess.AS

    Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

    Authors: Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

    Abstract: We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss, enabling transformer experts to learn language-specific information. By combining gated transformer experts with shared transformer layers, we const…

    Submitted 7 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.