Showing 1–26 of 26 results for author: Shen, K

Searching in archive eess.
  1. An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming

    Authors: Wenhai Lai, Zheyu Wu, Yi Feng, Kaiming Shen, Ya-Feng Liu

    Abstract: Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Dif…

    Submitted 28 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 5 pages

    Journal ref: IEEE Signal Processing Letters 2024

  2. arXiv:2407.12648  [pdf, ps, other]

    cs.IT eess.SP

    Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

    Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namel…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 17 pages

  3. arXiv:2406.10910  [pdf, ps, other]

    cs.IT eess.SP

    Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications

    Authors: Yannan Chen, Yi Feng, Xiaoyang Li, Licheng Zhao, Kaiming Shen

    Abstract: This paper concerns the coordinate multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2404.03204  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

    Authors: Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

    Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. Th…

    Submitted 19 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  5. arXiv:2403.03100  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

    Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

    Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di…

    Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

  6. arXiv:2312.16918  [pdf, other]

    cs.IT eess.SP

    Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G

    Authors: Qingqing Wu, Beixiong Zheng, Changsheng You, Lipeng Zhu, Kaiming Shen, Xiaodan Shao, Weidong Mei, Boya Di, Hongliang Zhang, Ertugrul Basar, Lingyang Song, Marco Di Renzo, Zhi-Quan Luo, Rui Zhang

    Abstract: Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities…

    Submitted 24 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  7. arXiv:2311.04546  [pdf, ps, other]

    eess.SP cs.IT

    Discerning and Enhancing the Weighted Sum-Rate Maximization Algorithms in Communications

    Authors: Zepeng Zhang, Ziping Zhao, Kaiming Shen, Daniel P. Palomar, Wei Yu

    Abstract: Weighted sum-rate (WSR) maximization plays a critical role in communication system design. This paper examines three optimization methods for WSR maximization, which ensure convergence to stationary points: two block coordinate ascent (BCA) algorithms, namely, weighted sum-minimum mean-square error (WMMSE) and WSR maximization via fractional programming (WSR-FP), along with a minorization-maximiza…

    Submitted 8 November, 2023; originally announced November 2023.

  8. arXiv:2310.08705  [pdf, other]

    cs.CV eess.IV

    A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches

    Authors: Kangqing Shen, Gemine Vivone, Xiaoyuan Yang, Simone Lolli, Michael Schmitt

    Abstract: Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize gray scale SAR images while preserving the original spatial information and radiometric information. However, this research field is stil…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 16 pages, 16 figures, 6 tables

  9. arXiv:2309.02285  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    PromptTTS 2: Describing and Generating Voices with Text Prompt

    Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

    Abstract: Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text…

    Submitted 11 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Demo page: https://speechresearch.github.io/prompttts2

  10. arXiv:2309.01480  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events

    Authors: Ying Ren, Kailai Shen, Zhe Ye, Diqun Yan

    Abstract: Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting speech's mean opinion score (MOS) without requiring the reference speech. Researchers have gradually started to apply NISQA to various practical scenarios. However, little attention has been paid to the security of NISQA models. Backdoor attacks represent the most serious threat to deep neural networks…

    Submitted 11 September, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted by ICME2024

  11. arXiv:2308.04179  [pdf, other]

    cs.CR cs.SD eess.AS eess.SP

    Breaking Speaker Recognition with PaddingBack

    Authors: Zhe Ye, Diqun Yan, Li Dong, Kailai Shen

    Abstract: Machine Learning as a Service (MLaaS) has gained popularity due to advancements in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly in backdoor attacks. Recent research has shown that speech backdoors can utilize transformations as triggers, similar to image backdoors. However, human ears can easily be aware of these transfo…

    Submitted 11 March, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

  12. arXiv:2305.18998  [pdf, other]

    cs.IT eess.SP

    Blind Beamforming for Intelligent Reflecting Surface in Fading Channels without CSI

    Authors: Wenhai Lai, Wenyu Wang, Fan Xu, Xin Li, Shaobo Niu, Kaiming Shen

    Abstract: This paper discusses how to optimize the phase shifts of intelligent reflecting surface (IRS) to combat channel fading without any channel state information (CSI), namely blind beamforming. Differing from most previous works based on a two-stage paradigm of first estimating channels and then optimizing phase shifts, our approach is completely data-driven, only requiring a dataset of the received s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 14 pages, 14 figures

  13. arXiv:2304.09116  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

    Authors: Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

    Abstract: Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating is…

    Submitted 30 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: A large-scale text-to-speech and singing voice synthesis system with latent diffusion models. Update: NaturalSpeech 2 extension to voice conversion and speech enhancement

  14. Coordinating Multiple Intelligent Reflecting Surfaces without Channel Information

    Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: Conventional beamforming methods for intelligent reflecting surfaces (IRSs) or reconfigurable intelligent surfaces (RISs) typically entail the full channel state information (CSI). However, the computational cost of channel acquisition soars exponentially with the number of IRSs. To bypass this difficulty, we propose a novel strategy called blind beamforming that coordinates multiple IRSs by means…

    Submitted 8 January, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: 16 pages

    Journal ref: IEEE Transactions on Signal Processing 2024

  15. arXiv:2302.06727  [pdf, other]

    cs.LG cs.CV eess.IV

    Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging

    Authors: Charlie Tran, Kai Shen, Kang Liu, Akshay Ashok, Adolfo Ramirez-Zamora, Jinghua Chen, Yulin Li, Ruogu Fang

    Abstract: Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable scr…

    Submitted 18 February, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: 17 pages, 4 figures, 2 tables, 4 supplementary tables

  16. A Linear Time Algorithm for the Optimal Discrete IRS Beamforming

    Authors: Shuyi Ren, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for intelligent reflecting surface (IRS) in polynomial time. The above problem is widely believed to be difficult because it is not linked to any known combinatorial problems that can be solved efficiently. The branch-and-bound algorithms and the approximation algorithms constitute the best r…

    Submitted 7 September, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 5 pages

  17. arXiv:2205.09306  [pdf, other]

    cs.IT eess.SP

    Joint Device Selection and Power Control for Wireless Federated Learning

    Authors: Wei Guo, Ran Li, Chuan Huang, Xiaoqi Qin, Kaiming Shen, Wei Zhang

    Abstract: This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training an…

    Submitted 18 May, 2022; originally announced May 2022.

  18. Configuring Intelligent Reflecting Surface with Performance Guarantees: Optimal Beamforming

    Authors: Yaowen Zhang, Kaiming Shen, Shuyi Ren, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: This work proposes linear time strategies to optimally configure the phase shifts for the reflective elements of an intelligent reflecting surface (IRS). Specifically, we show that the binary phase beamforming can be optimally solved in linear time to maximize the received signal-to-noise ratio (SNR). For the general K-ary phase beamforming, we develop a linear time approximation algorithm that gu…

    Submitted 4 December, 2021; originally announced December 2021.

    Comments: 9 pages, 10 figures

  19. arXiv:2104.06189  [pdf]

    cs.RO eess.SY

    Numerical Energy Analysis of In-wheel Motor Driven Autonomous Electric Vehicles

    Authors: Kang Shen, Fan Yang, Xinyou Ke, Cheng Zhang, Chris Yuan

    Abstract: Autonomous electric vehicles are being widely studied nowadays as the future technology of ground transportation, while the autonomous electric vehicles based on conventional powertrain system limit their energy and power transmission efficiencies and may hinder their broad applications in future. Here we report a study on the energy consumption and efficiency improvement of a mid-size autonomous…

    Submitted 10 April, 2021; originally announced April 2021.

  20. arXiv:2010.05382  [pdf]

    eess.IV cs.CV physics.optics

    Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy

    Authors: Kyrollos Yanny, Nick Antipa, William Liberti, Sam Dehaeck, Kristina Monakhova, Fanglin Linda Liu, Konlin Shen, Ren Ng, Laura Waller

    Abstract: Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve the 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal pha…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: Published with Nature Springer in Light: Science and Applications

    Journal ref: Light: Science & Applications 9.1 (2020): 1-13

  21. arXiv:2006.13668  [pdf, ps, other]

    cs.IT eess.SP

    Stochastic Transceiver Optimization in Multi-Tags Symbiotic Radio Systems

    Authors: Xihan Chen, Hei Victor Cheng, Kaiming Shen, An Liu, Min-Jian Zhao

    Abstract: Symbiotic radio (SR) is emerging as a spectrum- and energy-efficient communication paradigm for future passive Internet-of-things (IoT), where some single-antenna backscatter devices, referred to as Tags, are parasitic in an active primary transmission. The primary transceiver is designed to assist both direct-link (DL) and backscatter-link (BL) communication. In multi-tags SR systems, the transce…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted by IEEE Internet Things J

  22. arXiv:1912.11678  [pdf, other]

    cs.IT eess.SP

    Joint Annotator-and-Spectrum Allocation in Wireless Networks for Crowd Labelling

    Authors: Xiaoyang Li, Guangxu Zhu, Kaiming Shen, Wei Yu, Yi Gong, Kaibin Huang

    Abstract: The massive sensing data generated by Internet-of-Things will provide fuel for ubiquitous artificial intelligence (AI), automating the operations of our society ranging from transportation to healthcare. The realistic adoption of this technique however entails labelling of the enormous data prior to the training of AI models via supervised learning. To tackle this challenge, we explore a new persp…

    Submitted 25 December, 2019; originally announced December 2019.

  23. arXiv:1910.01150  [pdf]

    eess.SP stat.ML

    Fault Detection Using Nonlinear Low-Dimensional Representation of Sensor Data

    Authors: Kai Shen, Anya Mcguirk, Yuwei Liao, Arin Chaudhuri, Deovrat Kakde

    Abstract: Sensor data analysis plays a key role in health assessment of critical equipment. Such data are multivariate and exhibit nonlinear relationships. This paper describes how one can exploit nonlinear dimension reduction techniques, such as the t-distributed stochastic neighbor embedding (t-SNE) and kernel principal component analysis (KPCA) for fault detection. We show that using anomaly detection wi…

    Submitted 2 October, 2019; originally announced October 2019.

  24. arXiv:1908.07408  [pdf, ps, other]

    cs.IT eess.SP math.OC

    Mixed-Timescale Beamforming and Power Splitting for Massive MIMO Aided SWIPT IoT Network

    Authors: Xihan Chen, Hei Victor Cheng, An Liu, Kaiming Shen, Min-Jian Zhao

    Abstract: Traditional simultaneous wireless information and power transfer (SWIPT) with power splitting assumes perfect channel state information (CSI), which is difficult to obtain especially in the massive multiple-input-multiple-output (MIMO) regime. In this letter, we consider a mixed-timescale joint beamforming and power splitting (MJBP) scheme to maximize general utility functions under a power constr…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: An extended version of a manuscript submitted to IEEE WCL

  25. arXiv:1905.09386  [pdf, other]

    eess.SP cs.ET

    A Sub-mm$^3$ Ultrasonic Free-floating Implant for Multi-mote Neural Recording

    Authors: Mohammad Meraj Ghanbari, David K. Piech, Konlin Shen, Sina Faraji Alamouti, Cem Yalcin, Benjamin C. Johnson, Jose M. Carmena, Michel M. Maharbiz, Rikky Muller

    Abstract: A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording implant is presented. The device is comprised only of a 0.25 mm$^2$ recording IC and a single piezoceramic resonator that is used for both power harvesting and data transmission. Uplink data transmission is performed by analog amplitude modulation of the ultrasound echo. Using a 1.78 MHz main carrier, >35 kbps/mote equiv…

    Submitted 16 July, 2019; v1 submitted 18 May, 2019; originally announced May 2019.

    Comments: 11 pages, 22 figures, Submitted to Journal of Solid-State Circuits

  26. arXiv:1808.01486  [pdf, other]

    eess.SP cs.IT cs.LG

    Spatial Deep Learning for Wireless Scheduling

    Authors: Wei Cui, Kaiming Shen, Wei Yu

    Abstract: The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths then optimizing the scheduling based on the model. This model-based method is however resource intensive and computationally hard because channel estimation is expensive in dense networks; fur…

    Submitted 4 February, 2021; v1 submitted 4 August, 2018; originally announced August 2018.

    Comments: This paper is the full version of the paper presented at IEEE Global Communications Conference 2018. It includes 15 pages and 12 figures

    Journal ref: IEEE J. Sel. Areas in Commun. 37 (2019) 1248-1261