Zum Hauptinhalt springen

Showing 1–50 of 107 results for author: Yao, J

Searching in archive eess. Search in all archives.
.
  1. arXiv:2408.15474  [pdf, other

    eess.AS cs.SD

    Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

    Authors: Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie

    Abstract: Rap, a prominent genre of vocal performance, remains underexplored in vocal generation. General vocal synthesis depends on precise note and duration inputs, requiring users to have related musical knowledge, which limits flexibility. In contrast, rap typically features simpler melodies, with a core focus on a strong rhythmic sense that harmonizes with accompanying beats. In this paper, we propose… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  2. arXiv:2408.13447  [pdf, ps, other

    eess.SP

    FAS-RIS Communication: Model, Analysis, and Optimization

    Authors: Junteng Yao, Jianchao Zheng, Tuo Wu, Ming Jin, Chau Yuen, Kai-Kit Wong, Fumiyuki Adachi

    Abstract: This correspondence investigates the novel fluid antenna system (FAS) technology, combining with reconfigurable intelligent surface (RIS) for wireless communications, where a base station (BS) communicates with a FAS-enabled user with the assistance of a RIS. To analyze this technology, we derive the outage probability based on the block-diagonal matrix approximation (BDMA) model. With this, we ob… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  3. arXiv:2408.13444  [pdf, ps, other

    eess.SP

    FAS-RIS: A Block-Correlation Model Analysis

    Authors: Xiazhi Lai, Junteng Yao, Kangda Zhi, Tuo Wu, David Morales-Jimenez, Kai-Kit Wong

    Abstract: In this correspondence, we analyze the performance of a reconfigurable intelligent surface (RIS)-aided communication system that involves a fluid antenna system (FAS)-enabled receiver. By applying the central limit theorem (CLT), we derive approximate expressions for the system outage probability when the RIS has a large number of elements. Also, we adopt the block-correlation channel model to sim… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  4. arXiv:2408.12162  [pdf, ps, other

    cs.IT eess.SP

    Empowering Over-the-Air Personalized Federated Learning via RIS

    Authors: Wei Shi, Jiacheng Yao, Jindan Xu, Wei Xu, Lexi Xu, Chunming Zhao

    Abstract: Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. However, AirComp-enabled FL (AirFL) with a single global consensus model fails to address the data heterogeneity in real-life FL scenarios with non-independent and identically distributed l… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by SCIENCE CHINA Information Sciences

  5. arXiv:2408.09067  [pdf, ps, other

    eess.SP

    FAS vs. ARIS: Which Is More Important for FAS-ARIS Communication Systems?

    Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Chongwen Huang, Chau Yuen

    Abstract: In this paper, we investigate the question of which technology, fluid antenna systems (FAS) or active reconfigurable intelligent surfaces (ARIS), plays a more crucial role in FAS-ARIS wireless communication systems. To address this, we develop a comprehensive system model and explore the problem from an optimization perspective. We introduce an alternating optimization (AO) algorithm incorporating… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2407.18054  [pdf, other

    eess.IV cs.CV

    LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels

    Authors: Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, Xinggang Wang

    Abstract: The segmentation of cell nuclei in tissue images stained with the blood dye hematoxylin and eosin (H$\&$E) is essential for various clinical applications and analyses. Due to the complex characteristics of cellular morphology, a large receptive field is considered crucial for generating high-quality segmentation. However, previous methods face challenges in achieving a balance between the receptiv… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  7. arXiv:2407.17460  [pdf, other

    cs.RO cs.AI cs.CV cs.LG eess.SY

    SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning

    Authors: Jianpeng Yao, Xiaopan Zhang, Yu Xia, Zejin Wang, Amit K. Roy-Chowdhury, Jiachen Li

    Abstract: Reinforcement Learning (RL) has enabled social robots to generate trajectories without human-designed rules or interventions, which makes it more effective than hard-coded systems for generalizing to complex real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians while previous RL-based solutions fall short in safety perf… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Project website: https://sonic-social-nav.github.io/

  8. arXiv:2407.12648  [pdf, ps, other

    cs.IT eess.SP

    Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

    Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namel… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 17 pages

  9. arXiv:2407.11629  [pdf, other

    eess.AS

    MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution designed to conceal the speaker's identity while preserving the linguistic content and para-linguistic information of the original speech. While most prior studies focus solely on a single language, an ideal speaker anonymization system should be capable of handling multiple languages. This paper proposes MUSA, a Multi-lingual Speak… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Submitted to TASLP

  10. arXiv:2407.11307  [pdf, ps, other

    eess.SP

    Fluid Antenna-Assisted Simultaneous Wireless Information and Power Transfer Systems

    Authors: Liaoshi Zhou, Junteng Yao, Tuo Wu, Ming Jin, Chau Yuen, Fumiyuki Adachi

    Abstract: This paper examines a fluid antenna (FA)-assisted simultaneous wireless information and power transfer (SWIPT) system. Unlike traditional SWIPT systems with fixed-position antennas (FPAs), our FA-assisted system enables dynamic reconfiguration of the radio propagation environment by adjusting the positions of FAs. This capability enhances both energy harvesting and communication performance. The s… ▽ More

    Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  11. arXiv:2407.08141  [pdf, ps, other

    eess.SP

    A Framework of FAS-RIS Systems: Performance Analysis and Throughput Optimization

    Authors: Junteng Yao, Xiazhi Lai, Kangda Zhi, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Chau Yuen, Kai-Kit Wong

    Abstract: In this paper, we investigate reconfigurable intelligent surface (RIS)-assisted communication systems which involve a fixed-antenna base station (BS) and a mobile user (MU) that is equipped with fluid antenna system (FAS). Specifically, the RIS is utilized to enable communication for the user whose direct link from the base station is blocked by obstacles. We propose a comprehensive framework that… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  12. arXiv:2407.00718  [pdf, other

    eess.IV cs.CV

    ASPS: Augmented Segment Anything Model for Polyp Segmentation

    Authors: Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han

    Abstract: Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the emergence of the Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation, leveraging its powerful pre-training capability on large-scale datasets. However, due to the domain gap between natural and endoscopy images, SAM encounters two limitations in achieving effective performan… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  13. arXiv:2406.16876  [pdf, other

    eess.SP

    Near-Field Mobile Tracking: A Framework of Using XL-RIS Information

    Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Junteng Yao, Hong Ren, Maged Elkashlan, Chau Yuen

    Abstract: This paper introduces a novel mobile tracking framework leveraging the high-dimensional signal received from extremely large-scale (XL) reconfigurable intelligent surfaces (RIS). This received signal, named XL-RIS information, has a much larger data dimension and therefore offers a richer feature set compared to the traditional base station (BS) received signal, i.e., BS information, enabling more… ▽ More

    Submitted 5 August, 2024; v1 submitted 3 April, 2024; originally announced June 2024.

  14. arXiv:2406.15047  [pdf, other

    cs.IT eess.SP

    Optimal Transmit Signal Design for Multi-Target MIMO Sensing Exploiting Prior Information

    Authors: Jiayi Yao, Shuowen Zhang

    Abstract: In this paper, we study the transmit signal optimization in a multiple-input multiple-output (MIMO) radar system for sensing the angle information of multiple targets via their reflected echo signals. We consider a challenging and practical scenario where the angles to be sensed are unknown and random, while their probability information is known a priori for exploitation. First, we establish an a… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: submitted for possible piblication

  15. arXiv:2406.07846  [pdf, other

    eess.AS

    DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

    Authors: Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of automatic speech recognition (ASR) model with short chunks makes… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  16. arXiv:2406.02233  [pdf, other

    eess.AS

    Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction

    Authors: Renmingyue Du, Jixun Yao, Qiuqiang Kong, Yin Cao

    Abstract: Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to its important role in deepfake algorithm recognition. However, most of the current approaches for detecting OOD in deepfake algorithm recognition rely… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  17. arXiv:2405.15271  [pdf

    eess.SY physics.ins-det physics.optics

    Seamless Integration and Implementation of Distributed Contact and Contactless Vital Sign Monitoring

    Authors: Dingding Liang, Yang Chen, Jiawei Gao, Taixia Shi, Jianping Yao

    Abstract: Real-time vital sign monitoring is gaining immense significance not only in the medical field but also in personal health management. Facing the needs of different application scenarios of the smart and healthy city in the future, the low-cost, large-scale, scalable, and distributed vital sign monitoring system is of great significance. In this work, a seamlessly integrated contact and contactless… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 14 pages,9 figures

  18. arXiv:2405.12478  [pdf, other

    eess.SY

    Efficient Economic Model Predictive Control of Water Treatment Process with Learning-based Koopman Operator

    Authors: Minghao Han, Jingshi Yao, Adrian Wing-Keung Law, Xunyuan Yin

    Abstract: Used water treatment plays a pivotal role in advancing environmental sustainability. Economic model predictive control holds the promise of enhancing the overall operational performance of the water treatment facilities. In this study, we propose a data-driven economic predictive control approach within the Koopman modeling framework. First, we propose a deep learning-enabled input-output Koopman… ▽ More

    Submitted 14 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  19. arXiv:2405.10786  [pdf, other

    eess.AS

    Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution that aims to conceal the speaker's identity while preserving the naturalness and distinctiveness of the original speech. Mainstream approaches use an utterance-level vector from a pre-trained automatic speaker verification (ASV) model to represent speaker identity, which is then averaged or modified for anonymization. However, these… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  20. Array SAR 3D Sparse Imaging Based on Regularization by Denoising Under Few Observed Data

    Authors: Yangyang Wang, Xu Zhan, Jing Gao, Jinjie Yao, Shunjun Wei, JianSheng Bai

    Abstract: Array synthetic aperture radar (SAR) three-dimensional (3D) imaging can obtain 3D information of the target region, which is widely used in environmental monitoring and scattering information measurement. In recent years, with the development of compressed sensing (CS) theory, sparse signal processing is used in array SAR 3D imaging. Compared with matched filter (MF), sparse SAR imaging can effect… ▽ More

    Submitted 26 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  21. arXiv:2404.05374  [pdf

    eess.SP

    Seamlessly merging radar ranging/imaging, wireless communications, and spectrum sensing, for 6G empowered by microwave photonics

    Authors: Taixia Shi, Yang Chen, Jianping Yao

    Abstract: Integration of radar, wireless communications, and spectrum sensing is being investigated for 6G with an increased spectral efficiency. Microwave photonics (MWP), a technique that combines microwave engineering and photonic technology to take advantage of the wide bandwidth offered by photonics for microwave signal generation and processing is considered an effective solution for the implementatio… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 18 pages, 10 figures

  22. arXiv:2404.04878  [pdf, other

    eess.IV cs.CV

    CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

    Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

    Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to sur… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: CVPR accepted paper

  23. arXiv:2403.10323  [pdf, ps, other

    eess.SP

    Joint Optimization for Achieving Covertness in MIMO Over-the-Air Computation Networks

    Authors: Junteng Yao, Tuo Wu, Ming Jin, Cunhua Pan, Quanzhong Li, Jinhong Yuan

    Abstract: This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness to the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covert requirement. We address the challenge of minimizing mean-sq… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  24. arXiv:2403.00453  [pdf, ps, other

    eess.SP

    Exploring Fairness for FAS-assisted Communication Systems: from NOMA to OMA

    Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

    Abstract: This paper addresses the fairness issue within fluid antenna system (FAS)-assisted non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA) systems, where a single fixed-antenna base station (BS) transmits superposition-coded signals to two users, each with a single fluid antenna. We define fairness through the minimization of the maximum outage probability for the two users, und… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  25. arXiv:2402.16894  [pdf, other

    q-bio.NC eess.IV

    Topological Analysis of Mouse Brain Vasculature via 3D Light-sheet Microscopy Images

    Authors: Jiachen Yao, Nina Hagemann, Qiaojie Xiong, Jianxu Chen, Dirk M. Hermann, Chao Chen

    Abstract: Vascular networks play a crucial role in understanding brain functionalities. Brain integrity and function, neuronal activity and plasticity, which are crucial for learning, are actively modulated by their local environments, specifically vascular networks. With recent developments in high-resolution 3D light-sheet microscopy imaging together with tissue processing techniques, it becomes feasible… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  26. arXiv:2402.15335  [pdf, other

    eess.IV cs.CV cs.LG

    Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection

    Authors: Chenyu Li, Bing Zhang, Danfeng Hong, Jing Yao, Jocelyn Chanussot

    Abstract: Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple background, and small size of the detection data. These factors also limit the performance of the well-known low-rank representation (LRR) models in terms of robustness on the separation of background and target features and the reliance on manual parameter selection. To this end, we build a new set… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  27. arXiv:2312.05256  [pdf, other

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang , et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor… ▽ More

    Submitted 10 November, 2023; originally announced December 2023.

  28. arXiv:2311.15420  [pdf

    eess.SY cs.CV

    Data-Driven Modelling for Harmonic Current Emission in Low-Voltage Grid Using MCReSANet with Interpretability Analysis

    Authors: Jieyu Yao, Hao Yu, Paul Judge, Jiabin Jia, Sasa Djokic, Verner Püvi, Matti Lehtonen, Jan Meyer

    Abstract: Even though the use of power electronics PE loads offers enhanced electrical energy conversion efficiency and control, they remain the primary sources of harmonics in grids. When diverse loads are connected in the distribution system, their interactions complicate establishing analytical models for the relationship between harmonic voltages and currents. To solve this, our paper presents a data-dr… ▽ More

    Submitted 19 January, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  29. arXiv:2310.07550  [pdf, other

    eess.SP

    Proactive Monitoring via Jamming in Fluid Antenna Systems

    Authors: Junteng Yao, Tuo Wu, Xiazhi Lai, Ming Jin, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

    Abstract: This paper investigates the efficacy of utilizing fluid antenna system (FAS) at a legitimate monitor to oversee suspicious communication. The monitor switches the antenna position to minimize its outage probability for enhancing the monitoring performance. Our objective is to maximize the average monitoring rate, whose expression involves the integral of the first-order Marcum $Q$ function. The op… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 3 figs, submitted to IEEE journal

  30. arXiv:2310.05051  [pdf, other

    cs.SD eess.AS

    SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

    Authors: Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech is subject to reduction in pseudo speaker distinctiveness, speech quality and… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 8 pages, 3 figures; Accepted by ASRU2023

  31. arXiv:2309.16499  [pdf, other

    cs.CV eess.IV

    Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

    Authors: Danfeng Hong, Bing Zhang, Hao Li, Yuxuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

    Abstract: Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-ed… ▽ More

    Submitted 3 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  32. arXiv:2309.15496  [pdf, other

    eess.AS cs.SD

    DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC… ▽ More

    Submitted 18 January, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP2024

  33. arXiv:2309.11715   

    cs.CV eess.IV

    Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal

    Authors: Xiao Feng Zhang, Tian Yi Song, Jia Wei Yao

    Abstract: Segment Anything (SAM), an advanced universal image segmentation model trained on an expansive visual dataset, has set a new benchmark in image segmentation and computer vision. However, it faced challenges when it came to distinguishing between shadows and their backgrounds. To address this, we developed Deshadow-Anything, considering the generalization of large-scale datasets, and we performed F… ▽ More

    Submitted 2 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: it needs revised

  34. arXiv:2309.09262  [pdf, other

    eess.AS cs.SD

    PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

    Authors: Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Style voice conversion aims to transform the style of source speech to a desired style according to real-world application demands. However, the current style voice conversion approach relies on pre-defined labels or reference speech to control the conversion process, which leads to limitations in style diversity or falls short in terms of the intuitive and interpretability of style representation… ▽ More

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  35. arXiv:2309.07582  [pdf, other

    eess.SP

    On Performance of Fluid Antenna System using Maximum Ratio Combining

    Authors: Xiazhi Lai, Tuo Wu, Junteng Yao, Cunhua Pan, Maged Elkashlan, Kai-Kit Wong

    Abstract: This letter investigates a fluid antenna system (FAS) where multiple ports can be activated for signal combining for enhanced receiver performance. Given $M$ ports at the FAS, the best $K$ ports out of the $M$ available ports are selected before maximum ratio combining (MRC) is used to combine the received signals from the selected ports. The aim of this letter is to study the achievable performan… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: submitted to IEEE journal

  36. arXiv:2309.05905  [pdf, other

    eess.SY

    Geometry Enhanced Optimal Control Technique for Acrobatic Flip Motion of Quadcopter

    Authors: Jie Yao

    Abstract: A nonlinear optimal control strategy, named the geometry enhanced finite time $\boldsymbol{θ-}$D technique, is proposed to manipulate the acrobatic flip flight of variable pitch (VP) quadcopter unmanned aerial vehicles (abbreviated as VP copter). A unique superiority of the VP copter, which can provide the thrust in both positive and negative vertical directions by varying the pitch angles of blad… ▽ More

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 8 pages, 7 figures

  37. arXiv:2309.02835  [pdf

    physics.optics eess.IV

    A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

    Authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi Jin, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang

    Abstract: Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hun… ▽ More

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 25 pages, 5 figures and 1 table

  38. arXiv:2309.00929  [pdf, other

    cs.SD eess.AS

    Timbre-reserved Adversarial Attack in Speaker Identification

    Authors: Qing Wang, Jixun Yao, Li Zhang, Pengcheng Guo, Lei Xie

    Abstract: As a type of biometric identification, a speaker identification (SID) system is confronted with various kinds of attacks. The spoofing attacks typically imitate the timbre of the target speakers, while the adversarial attacks confuse the SID system by adding a well-designed adversarial perturbation to an arbitrary speech. Although the spoofing attack copies a similar timbre as the victim, it does… ▽ More

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: 11 pages, 8 figures

  39. arXiv:2308.04025  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

    Authors: Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, Jingjing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao

    Abstract: Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in wild world. While current studies primarily focus on recognition and generalization abilities, our research pioneers an investigation into the reliability of SER methods in the presence of semantic data shifts and explores how to exert fine-gra… ▽ More

    Submitted 22 March, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages

  40. arXiv:2308.02498  [pdf, other

    eess.IV cs.CV cs.LG

    Learning to Segment from Noisy Annotations: A Spatial Correction Approach

    Authors: Jiachen Yao, Yikai Zhang, Songzhu Zheng, Mayank Goswami, Prateek Prasanna, Chao Chen

    Abstract: Noisy labels can significantly affect the performance of deep neural networks (DNNs). In medical image segmentation tasks, annotations are error-prone due to the high demand in annotation time and in the annotators' expertise. Existing methods mostly assume noisy labels in different pixels are \textit{i.i.d}. However, segmentation label noise usually has strong spatial correlation and has prominen… ▽ More

    Submitted 20 July, 2023; originally announced August 2023.

  41. arXiv:2308.00507  [pdf, other

    eess.IV cs.CV cs.LG

    Improved Prognostic Prediction of Pancreatic Cancer Using Multi-Phase CT by Integrating Neural Distance and Texture-Aware Transformer

    Authors: Hexin Dong, Jiawen Yao, Yuxing Tang, Mingze Yuan, Yingda Xia, Jian Zhou, Hong Lu, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Yu Shi, Ling Zhang

    Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that descr… ▽ More

    Submitted 13 September, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: MICCAI 2023

  42. arXiv:2307.12795  [pdf, other

    cs.IT eess.SP

    Superimposed RIS-phase Modulation for MIMO Communications: A Novel Paradigm of Information Transfer

    Authors: Jiacheng Yao, Jindan Xu, Wei Xu, Chau Yuen, Xiaohu You

    Abstract: Reconfigurable intelligent surface (RIS) is regarded as an important enabling technology for the sixth-generation (6G) network. Recently, modulating information in reflection patterns of RIS, referred to as reflection modulation (RM), has been proven in theory to have the potential of achieving higher transmission rate than existing passive beamforming (PBF) schemes of RIS. To fully unlock this po… ▽ More

    Submitted 9 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE TWC

  43. arXiv:2307.12793  [pdf, other

    cs.IT eess.SP

    Imperfect CSI: A Key Factor of Uncertainty to Over-the-Air Federated Learning

    Authors: Jiacheng Yao, Zhaohui Yang, Wei Xu, Dusit Niyato, Xiaohu You

    Abstract: Over-the-air computation (AirComp) has recently been identified as a prominent technique to enhance communication efficiency of wireless federated learning (FL). This letter investigates the impact of channel state information (CSI) uncertainty at the transmitter on an AirComp enabled FL (AirFL) system with the truncated channel inversion strategy. To characterize the performance of the AirFL syst… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Submitted to IEEE for possible publication

  44. arXiv:2307.08268  [pdf, other

    eess.IV cs.CV

    Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network

    Authors: Ke Yan, Xiaoli Yin, Yingda Xia, Fakai Wang, Shu Wang, Yuan Gao, Jiawen Yao, Chunli Li, Xiaoyu Bai, Jingren Zhou, Ling Zhang, Le Lu, Yu Shi

    Abstract: Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and… ▽ More

    Submitted 21 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023, code: https://github.com/alibaba-damo-academy/pixel-lesion-patient-network

  45. arXiv:2307.04525  [pdf, other

    eess.IV cs.CV cs.LG

    Cluster-Induced Mask Transformers for Effective Opportunistic Gastric Cancer Screening on Non-contrast CT Scans

    Authors: Mingze Yuan, Yingda Xia, Xin Chen, Jiawen Yao, Junli Wang, Mingyan Qiu, Hexin Dong, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Ling Zhang

    Abstract: Gastric cancer is the third leading cause of cancer-related mortality worldwide, but no guideline-recommended screening test exists. Existing methods can be invasive, expensive, and lack sensitivity to identify early-stage gastric cancer. In this study, we explore the feasibility of using a deep learning approach on non-contrast CT scans for gastric cancer detection. We propose a novel cluster-ind… ▽ More

    Submitted 15 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023

  46. arXiv:2306.07848  [pdf, other

    cs.CL cs.MM cs.SD eess.AS

    GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition

    Authors: Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao

    Abstract: Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, whereas there is limited research on their merits in speech emotion recognition (SER). In this paper, we propose GEmo-CLAP, a kind of gender-attribute-enhanced contrastive language-audio pretraining (CLAP) method for SER. Specifically, we first construct an effective emotion CLAP (Emo-CLAP) for SER,… ▽ More

    Submitted 4 December, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 5 pages

  47. arXiv:2305.19020  [pdf, other

    cs.SD eess.AS

    Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification

    Authors: Qing Wang, Jixun Yao, Ziqian Wang, Pengcheng Guo, Lei Xie

    Abstract: In this study, we propose a timbre-reserved adversarial attack approach for speaker identification (SID) to not only exploit the weakness of the SID model but also preserve the timbre of the target speaker in a black-box attack setting. Particularly, we generate timbre-reserved fake audio by adding an adversarial constraint during the training of the voice conversion model. Then, we leverage a pse… ▽ More

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 5 pages

  48. arXiv:2305.12425  [pdf, other

    eess.AS cs.SD

    DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

    Authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

    Abstract: Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities. Unlike typical (non-streaming) voice conversion, which can leverage the entire utterance as full context, streaming voice conversion faces significant challenges due to the missing future information, resulting in degraded intelligibility,… ▽ More

    Submitted 30 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

  49. arXiv:2305.05348  [pdf, other

    cs.IT eess.SP

    Robust Beamforming Design for RIS-aided Cell-free Systems with CSI Uncertainties and Capacity-limited Backhaul

    Authors: Jiacheng Yao, Jindan Xu, Wei Xu, Derrick Wing Kwan Ng, Chau Yuen, Xiaohu You

    Abstract: In this paper, we consider the robust beamforming design in a reconfigurable intelligent surface (RIS)-aided cell-free (CF) system considering the channel state information (CSI) uncertainties of both the direct channels and cascaded channels at the transmitter with capacity-limited backhaul. We jointly optimize the precoding at the access points (APs) and the phase shifts at multiple RISs to maxi… ▽ More

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted by IEEE TCOM

  50. arXiv:2303.00942  [pdf, other

    eess.IV cs.CV cs.LG

    Meta-information-aware Dual-path Transformer for Differential Diagnosis of Multi-type Pancreatic Lesions in Multi-phase CT

    Authors: Bo Zhou, Yingda Xia, Jiawen Yao, Le Lu, Jingren Zhou, Chi Liu, James S. Duncan, Ling Zhang

    Abstract: Pancreatic cancer is one of the leading causes of cancer-related death. Accurate detection, segmentation, and differential diagnosis of the full taxonomy of pancreatic lesions, i.e., normal, seven major types of lesions, and other lesions, is critical to aid the clinical decision-making of patient management and treatment. However, existing works focus on segmentation and classification for very s… ▽ More

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted at Information Processing in Medical Imaging (IPMI 2023)