Search | arXiv e-print repository

arXiv:2407.20914 [pdf, ps, other]

doi 10.1109/LSP.2024.3436669

An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming

Authors: Wenhai Lai, Zheyu Wu, Yi Feng, Kaiming Shen, Ya-Feng Liu

Abstract: Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Differing from most existing works, this letter advocates a convex-hull relaxation of the discrete constraints which leads to a continuous reformulated problem equivalent to the original discrete problem. This letter further proposes an efficient alternating projection/proximal gradient descent and ascent algorithm for solving the reformulated problem. Simulation results show that the proposed algorithm outperforms the state-of-the-art methods significantly.

Submitted 28 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

Comments: 5 pages

Journal ref: IEEE Signal Processing Letters 2024

arXiv:2407.12648 [pdf, ps, other]

Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namely the coverage enhancement. Although some existing works already consider the IRS-assisted coverage enhancement without CSI, they assume certain position-channel models through which the channels can be recovered from the geographic locations. In contrast, our approach solely relies on the received signal power data, not assuming any position-channel model. We examine the achievability and converse of the proposed blind beamforming method. If the IRS has $N$ reflective elements and there are $U$ receiver positions, then our method guarantees the minimum SNR of $\Omega(N^2/U)$, which is fairly close to the upper bound $O(N+N^2\sqrt{\ln (NU)}/\sqrt[4]{U})$. Aside from the simulation results, we justify the practical use of blind beamforming in a field test at 2.6 GHz. According to the real-world experiment, the proposed blind beamforming method boosts the minimum SNR across seven random positions in a conference room by 18.22 dB, while the position-based method yields a boost of 12.08 dB.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 17 pages

arXiv:2406.10910 [pdf, ps, other]

Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications

Authors: Yannan Chen, Yi Feng, Xiaoyang Li, Licheng Zhao, Kaiming Shen

Abstract: This paper concerns the coordinated multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-input multiple-output (MIMO) transmission, i.e., the weighted minimum mean square error (WMMSE) algorithm, has a natural extension to the ISAC problem scenario from a fractional programming (FP) perspective. However, the extended WMMSE algorithm requires computing the $N\times N$ matrix inverse extensively, where $N$ is proportional to the antenna array size, so the algorithm becomes quite costly when antennas are massively deployed. To address this issue, we develop a nonhomogeneous bound and use it in conjunction with the FP technique to solve the ISAC beamforming problem without the need to invert any large matrices. It is further shown that the resulting new FP algorithm has an intimate connection with gradient projection, based on which we can accelerate the convergence via Nesterov's gradient extrapolation.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2404.03204 [pdf, other]

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Authors: Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To accomplish this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E utilizes the predicted duration prompt to guide the computing of self-attention weights in Transformer to enforce the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Results of comprehensive objective and subjective evaluations demonstrate that, compared to a powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $5.6\%$ (without reranking) and $1.7\%$ (with reranking) to $2.5\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$.

Submitted 19 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.03100 [pdf, other]

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.

Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

arXiv:2312.16918 [pdf, other]

Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G

Authors: Qingqing Wu, Beixiong Zheng, Changsheng You, Lipeng Zhu, Kaiming Shen, Xiaodan Shao, Weidong Mei, Boya Di, Hongliang Zhang, Ertugrul Basar, Lingyang Song, Marco Di Renzo, Zhi-Quan Luo, Rui Zhang

Abstract: Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities from passive reflection to active amplification, simultaneous reflection and refraction, as well as holographic beamforming. However, the research on ISs is still in rapid progress and there have been recent technological advances in ISs and their emerging applications that are worthy of a timely review. Thus, we provide in this paper a comprehensive survey on the recent development and advances of ISs aided wireless networks. Specifically, we start with an overview on the anticipated use cases of ISs in future wireless networks such as 6G, followed by a summary of the recent standardization activities related to ISs. Then, the main design issues of the commonly adopted reflection-based IS and their state-of-the-art solutions are presented in detail, including reflection optimization, deployment, signal modulation, wireless sensing, and integrated sensing and communications. Finally, recent progress and new challenges in advanced IS architectures are discussed to inspire future research.

Submitted 24 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2311.04546 [pdf, ps, other]

Discerning and Enhancing the Weighted Sum-Rate Maximization Algorithms in Communications

Authors: Zepeng Zhang, Ziping Zhao, Kaiming Shen, Daniel P. Palomar, Wei Yu

Abstract: Weighted sum-rate (WSR) maximization plays a critical role in communication system design. This paper examines three optimization methods for WSR maximization, which ensure convergence to stationary points: two block coordinate ascent (BCA) algorithms, namely, weighted sum-minimum mean-square error (WMMSE) and WSR maximization via fractional programming (WSR-FP), along with a minorization-maximization (MM) algorithm, WSR maximization via MM (WSR-MM). Our contributions are threefold. Firstly, we delineate the exact relationships among WMMSE, WSR-FP, and WSR-MM, which, despite their extensive use in the literature, lack a comprehensive comparative study. By probing the theoretical underpinnings linking the BCA and MM algorithmic frameworks, we reveal the direct correlations between the equivalent transformation techniques, essential to the development of WMMSE and WSR-FP, and the surrogate functions pivotal to WSR-MM. Secondly, we propose a novel algorithm, WSR-MM+, harnessing the flexibility of selecting surrogate functions in MM framework. By circumventing the repeated matrix inversions in the search for optimal Lagrange multipliers in existing algorithms, WSR-MM+ significantly reduces the computational load per iteration and accelerates convergence. Thirdly, we reconceptualize WSR-MM+ within the BCA framework, introducing a new equivalent transform, which gives rise to an enhanced version of WSR-FP, named as WSR-FP+. We further demonstrate that WSR-MM+ can be construed as the basic gradient projection method. This perspective yields a deeper understanding into its computational intricacies. Numerical simulations corroborate the connections between WMMSE, WSR-FP, and WSR-MM and confirm the efficacy of the proposed WSR-MM+ and WSR-FP+ algorithms.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.08705 [pdf, other]

A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches

Authors: Kangqing Shen, Gemine Vivone, Xiaoyuan Yang, Simone Lolli, Michael Schmitt

Abstract: Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize grayscale SAR images while preserving the original spatial information and radiometric information. However, this research field is still in its early stages, and many limitations can be highlighted. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of our proposed cGAN-based network for SAR colorization. The code will be made publicly available.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 16 pages, 16 figures, 6 tables

arXiv:2309.02285 [pdf, other]

PromptTTS 2: Describing and Generating Voices with Text Prompt

Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

Abstract: Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.

Submitted 11 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Demo page: https://speechresearch.github.io/prompttts2

arXiv:2309.01480 [pdf, ps, other]

EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events

Authors: Ying Ren, Kailai Shen, Zhe Ye, Diqun Yan

Abstract: Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting speech's mean opinion score (MOS) without requiring the reference speech. Researchers have gradually started to apply NISQA to various practical scenarios. However, little attention has been paid to the security of NISQA models. Backdoor attacks represent the most serious threat to deep neural networks (DNNs) due to the fact that backdoors possess a very high attack success rate once embedded. However, existing backdoor attacks assume that the attacker actively feeds samples containing triggers into the model during the inference phase. This is not adapted to the specific scenario of NISQA. And current backdoor attacks on regression tasks lack an objective metric to measure the attack performance. To address these issues, we propose a novel backdoor triggering approach (EventTrojan) that utilizes an event during the usage of the NISQA model as a trigger. Moreover, we innovatively provide an objective metric for backdoor attacks on regression tasks. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the EventTrojan attack. Besides, it also has good resistance to several defense methods.

Submitted 11 September, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted by ICME2024

arXiv:2308.04179 [pdf, other]

Breaking Speaker Recognition with PaddingBack

Authors: Zhe Ye, Diqun Yan, Li Dong, Kailai Shen

Abstract: Machine Learning as a Service (MLaaS) has gained popularity due to advancements in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly in backdoor attacks. Recent research has shown that speech backdoors can utilize transformations as triggers, similar to image backdoors. However, human ears can easily be aware of these transformations, leading to suspicion. In this paper, we propose PaddingBack, an inaudible backdoor attack that utilizes malicious operations to generate poisoned samples, rendering them indistinguishable from clean ones. Instead of using external perturbations as triggers, we exploit the widely-used speech signal operation, padding, to break speaker recognition systems. Experimental results demonstrate the effectiveness of our method, achieving a significant attack success rate while retaining benign accuracy. Furthermore, PaddingBack demonstrates the ability to resist defense methods and maintain its stealthiness against human perception.

Submitted 11 March, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

arXiv:2305.18998 [pdf, other]

Blind Beamforming for Intelligent Reflecting Surface in Fading Channels without CSI

Authors: Wenhai Lai, Wenyu Wang, Fan Xu, Xin Li, Shaobo Niu, Kaiming Shen

Abstract: This paper discusses how to optimize the phase shifts of intelligent reflecting surface (IRS) to combat channel fading without any channel state information (CSI), namely blind beamforming. Differing from most previous works based on a two-stage paradigm of first estimating channels and then optimizing phase shifts, our approach is completely data-driven, only requiring a dataset of the received signal power at the user terminal. Thus, our method does not incur extra overhead costs for channel estimation, and does not entail collaboration from service provider, either. The main idea is to choose phase shifts at random and use the corresponding conditional sample mean of the received signal power to extract the main features of the wireless environment. This blind beamforming approach guarantees an $N^2$ boost of signal-to-noise ratio (SNR), where $N$ is the number of reflective elements (REs) of IRS, regardless of whether the direct channel is line-of-sight (LoS) or not. Moreover, blind beamforming is extended to a double-IRS system with provable performance. Finally, prototype tests show that the proposed blind beamforming method can be readily incorporated into the existing communication systems in the real world; simulation tests further show that it works for a variety of fading channel models.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 14 pages, 14 figures

arXiv:2304.09116 [pdf, other]

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Authors: Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

Abstract: Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating issue, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to get the quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. Audio samples are available at https://speechresearch.github.io/naturalspeech2.

Submitted 30 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: A large-scale text-to-speech and singing voice synthesis system with latent diffusion models. Update: NaturalSpeech 2 extension to voice conversion and speech enhancement

arXiv:2302.09717 [pdf, ps, other]

doi 10.1109/TSP.2023.3334818

Coordinating Multiple Intelligent Reflecting Surfaces without Channel Information

Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: Conventional beamforming methods for intelligent reflecting surfaces (IRSs) or reconfigurable intelligent surfaces (RISs) typically entail the full channel state information (CSI). However, the computational cost of channel acquisition soars exponentially with the number of IRSs. To bypass this difficulty, we propose a novel strategy called blind beamforming that coordinates multiple IRSs by means of statistics without knowing CSI. Blind beamforming only requires measuring the received signal power at the user terminal for a sequence of randomly generated phase shifts across all IRSs. The main idea is to extract the key statistical quantity for beamforming by exploring only a small portion of the whole solution space of phase shifts. We show that blind beamforming guarantees a signal-to-noise ratio (SNR) boost of $\Theta(N^{2L})$ under certain conditions, where $L$ is the number of IRSs and $N$ is the number of reflecting elements per IRS. The proposed conditions for achieving the optimal SNR boost of $\Theta(N^{4})$ in a double-IRS system are much easier to satisfy than the existing ones in the literature. Most importantly, the proposed conditions can be extended to a fully general $L$-IRS system. The above result significantly improves upon the state of the art in the area of multi-IRS-assisted communication. Moreover, blind beamforming is justified via field tests and simulations. In particular, as shown in our field tests at 2.6 GHz, our method yields up to 17 dB SNR boost; to the best of our knowledge, this is the first time that the use of multiple IRSs gets verified in the real world.

Submitted 8 January, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 16 pages

Journal ref: IEEE Transactions on Signal Processing 2024

arXiv:2302.06727 [pdf, other]

doi 10.1038/s41598-024-54251-1

Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging

Authors: Charlie Tran, Kai Shen, Kang Liu, Akshay Ashok, Adolfo Ramirez-Zamora, Jinghua Chen, Yulin Li, Ruogu Fang

Abstract: Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations.

Submitted 18 February, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 17 pages, 4 figures, 2 tables, 4 supplementary tables

arXiv:2211.04704 [pdf, other]

doi 10.1109/LWC.2022.3232146

A Linear Time Algorithm for the Optimal Discrete IRS Beamforming

Authors: Shuyi Ren, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for intelligent reflecting surface (IRS) in polynomial time. The above problem is widely believed to be difficult because it is not linked to any known combinatorial problems that can be solved efficiently. The branch-and-bound algorithms and the approximation algorithms constitute the best results in this area. Nevertheless, this work shows that the global optimum can actually be reached in linear time on average in terms of the number of reflective elements (REs) of IRS. The main idea is to geometrically interpret the discrete beamforming problem as choosing the optimal point on the unit circle. Although the number of possible combinations of phase shifts grows exponentially with the number of REs, it turns out that there are only a linear number of circular arcs that possibly contain the optimal point. Furthermore, the proposed algorithm can be viewed as a novel approach to a special case of the discrete quadratic program (QP).

Submitted 7 September, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: 5 pages

arXiv:2205.09306 [pdf, other]

Joint Device Selection and Power Control for Wireless Federated Learning

Authors: Wei Guo, Ran Li, Chuan Huang, Xiaoqi Qin, Kaiming Shen, Wei Zhang

Abstract: This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training and upload the updated model parameters to the PS via over-the-air computation (AirComp). First, we propose an AirComp-based adaptive reweighing scheme for the aggregation of local updated models, where the model aggregation weights are directly determined by the uplink transmit power values of the selected devices and which enables the joint learning and communication optimization simply by the device selection and power control. Furthermore, we provide a convergence analysis for the proposed wireless FL algorithm and derive an upper bound on the expected optimality gap between the expected and optimal global loss values. With instantaneous channel state information (CSI), we formulate the optimality gap minimization problems under both the individual and sum uplink transmit power constraints, which are shown to be solvable by the semidefinite relaxation (SDR) technique. Numerical results reveal that our proposed wireless FL algorithm achieves performance close to that of the ideal FedAvg scheme with error-free model exchange and full device participation.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2112.02260 [pdf, ps, other]

doi 10.1109/JSTSP.2022.3176479

Configuring Intelligent Reflecting Surface with Performance Guarantees: Optimal Beamforming

Authors: Yaowen Zhang, Kaiming Shen, Shuyi Ren, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: This work proposes linear time strategies to optimally configure the phase shifts for the reflective elements of an intelligent reflecting surface (IRS). Specifically, we show that the binary phase beamforming can be optimally solved in linear time to maximize the received signal-to-noise ratio (SNR). For the general $K$-ary phase beamforming, we develop a linear time approximation algorithm that guarantees performance within a constant fraction $(1+\cos(\pi/K))/2$ of the global optimum, e.g., it can attain over 85% of the optimal performance for the quadrature beamforming with $K=4$. According to the numerical results, the proposed approximation algorithm for discrete IRS beamforming outperforms the existing algorithms significantly in boosting the received SNR.

Submitted 4 December, 2021; originally announced December 2021.

Comments: 9 pages, 10 figures

arXiv:2104.06189 [pdf]

Numerical Energy Analysis of In-wheel Motor Driven Autonomous Electric Vehicles

Authors: Kang Shen, Fan Yang, Xinyou Ke, Cheng Zhang, Chris Yuan

Abstract: Autonomous electric vehicles are being widely studied as the future technology of ground transportation, but autonomous electric vehicles based on conventional powertrain systems are limited in their energy and power transmission efficiencies, which may hinder their broad application in the future. Here we report a study on the energy consumption and efficiency improvement of a mid-size autonomous electric vehicle driven by in-wheel motors, through the development of a numerical energy model, validated with actual driving data and implemented in a case study. The energy analysis was conducted under three driving conditions (flat road, upslope, and downslope driving) to examine the energy consumption, with the energy-saving potential of the in-wheel-motor driven powertrain system systematically explored and discussed. Considering the energy recovery from regenerative braking, energy consumption and regenerated energy were calculated in specific driving cycles based on vehicle dynamics and autonomous driving patterns. A case study was conducted using the baseline electric vehicle driving data in West Los Angeles. It was found that an in-wheel motor driven autonomous electric vehicle can save up to 17.5% of energy compared with a conventional electric vehicle during slope driving. Using the efficiency maps of a commercial in-wheel motor, the numerical energy model and validated results obtained from this study are in line with actual situations, and can be used to support sustainable development of more energy-efficient autonomous electric vehicles in the future.

Submitted 10 April, 2021; originally announced April 2021.

arXiv:2010.05382 [pdf]

doi 10.1038/s41377-020-00403-7

Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy

Authors: Kyrollos Yanny, Nick Antipa, William Liberti, Sam Dehaeck, Kristina Monakhova, Fanglin Linda Liu, Konlin Shen, Ren Ng, Laura Waller

Abstract: Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve the 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal phase mask at the objective's aperture stop. Placing the phase mask at the aperture stop significantly reduces the size of the device, and varying the focal lengths enables a uniform resolution across a wide depth range. The phase mask encodes the 3D fluorescence intensity into a single 2D measurement, and the 3D volume is recovered by solving a sparsity-constrained inverse problem. We provide methods for designing and fabricating the phase mask and an efficient forward model that accounts for the field-varying aberrations in miniature objectives. We demonstrate a prototype that is 17 mm tall and weighs 2.5 grams, achieving 2.76 $\mu$m lateral and 15 $\mu$m axial resolution across most of the $900\times700\times390$ $\mu$m$^3$ volume at 40 volumes per second. The performance is validated experimentally on resolution targets, dynamic biological samples, and mouse brain tissue. Compared with existing miniature single-shot volume-capture implementations, our system is smaller and lighter and achieves a more than 2x better lateral and axial resolution throughout a 10x larger usable depth range. Our microscope design provides single-shot 3D imaging for applications where a compact platform matters, such as volumetric neural imaging in freely moving animals and 3D motion studies of dynamic samples in incubators and lab-on-a-chip devices.

Submitted 11 October, 2020; originally announced October 2020.

Comments: Published with Nature Springer in Light: Science and Applications

Journal ref: Light: Science & Applications 9.1 (2020): 1-13

arXiv:2006.13668 [pdf, ps, other]

Stochastic Transceiver Optimization in Multi-Tags Symbiotic Radio Systems

Authors: Xihan Chen, Hei Victor Cheng, Kaiming Shen, An Liu, Min-Jian Zhao

Abstract: Symbiotic radio (SR) is emerging as a spectrum- and energy-efficient communication paradigm for future passive Internet-of-things (IoT), where some single-antenna backscatter devices, referred to as Tags, are parasitic in an active primary transmission. The primary transceiver is designed to assist both direct-link (DL) and backscatter-link (BL) communication. In multi-tags SR systems, the transceiver designs become much more complicated due to the presence of DL and inter-Tag interference, which further poses new challenges to the availability and reliability of DL and BL transmission. To overcome these challenges, we formulate the stochastic optimization of transceiver design as the general network utility maximization problem (GNUMP). The resultant problem is a stochastic multiple-ratio fractional non-convex problem, and consequently challenging to solve. By leveraging some fractional programming techniques, we tailor a surrogate function with the specific structure and subsequently develop a batch stochastic parallel decomposition (BSPD) algorithm, which is shown to converge to stationary solutions of the GNUMP. Simulation results verify the effectiveness of the proposed algorithm by numerical examples in terms of the achieved system throughput.

Submitted 24 June, 2020; originally announced June 2020.

Comments: Accepted by IEEE Internet Things J

arXiv:1912.11678 [pdf, other]

Joint Annotator-and-Spectrum Allocation in Wireless Networks for Crowd Labelling

Authors: Xiaoyang Li, Guangxu Zhu, Kaiming Shen, Wei Yu, Yi Gong, Kaibin Huang

Abstract: The massive sensing data generated by Internet-of-Things will provide fuel for ubiquitous artificial intelligence (AI), automating the operations of our society ranging from transportation to healthcare. The realistic adoption of this technique however entails labelling of the enormous data prior to the training of AI models via supervised learning. To tackle this challenge, we explore a new perspective of wireless crowd labelling that is capable of downloading data to many imperfect mobile annotators for repetition labelling by exploiting multicasting in wireless networks. In this cross-disciplinary area, the integration of the rate-distortion theory and the principle of repetition labelling for accuracy improvement gives rise to a new tradeoff between radio-and-annotator resources under a constraint on labelling accuracy. Building on the tradeoff and aiming at maximizing the labelling throughput, this work focuses on the joint optimization of encoding rate, annotator clustering, and sub-channel allocation, which results in an NP-hard integer programming problem. To devise an efficient solution approach, we establish an optimal sequential annotator-clustering scheme based on the order of decreasing signal-to-noise ratios. Thereby, the optimal solution can be found by an efficient tree search. Next, the solution is simplified by applying truncated channel inversion. Alternatively, the optimization problem can be recognized as a knapsack problem, which can be efficiently solved in pseudo-polynomial time by means of dynamic programming. In addition, exact policies are derived for the annotator-constrained and spectrum-constrained cases. Last, simulation results demonstrate the significant throughput gains based on the optimal solution compared with decoupled allocation of the two types of resources.

Submitted 25 December, 2019; originally announced December 2019.

arXiv:1910.01150 [pdf]

Fault Detection Using Nonlinear Low-Dimensional Representation of Sensor Data

Authors: Kai Shen, Anya Mcguirk, Yuwei Liao, Arin Chaudhuri, Deovrat Kakde

Abstract: Sensor data analysis plays a key role in health assessment of critical equipment. Such data are multivariate and exhibit nonlinear relationships. This paper describes how one can exploit nonlinear dimension reduction techniques, such as the t-distributed stochastic neighbor embedding (t-SNE) and kernel principal component analysis (KPCA) for fault detection. We show that using anomaly detection with low dimensional representations provides better interpretability and is conducive to edge processing in IoT applications.

Submitted 2 October, 2019; originally announced October 2019.

arXiv:1908.07408 [pdf, ps, other]

Mixed-Timescale Beamforming and Power Splitting for Massive MIMO Aided SWIPT IoT Network

Authors: Xihan Chen, Hei Victor Cheng, An Liu, Kaiming Shen, Min-Jian Zhao

Abstract: Traditional simultaneous wireless information and power transfer (SWIPT) with power splitting assumes perfect channel state information (CSI), which is difficult to obtain especially in the massive multiple-input-multiple-output (MIMO) regime. In this letter, we consider a mixed-timescale joint beamforming and power splitting (MJBP) scheme to maximize general utility functions under a power constraint in the downlink of a massive MIMO SWIPT IoT network. In this scheme, the transmit digital beamformer is adapted to the imperfect CSI, while the receive power splitters are adapted to the long-term channel statistics only due to the consideration of hardware limit and signaling overhead. The formulated optimization problem is solved using a mixed-timescale online stochastic successive convex approximation (MO-SSCA) algorithm. Simulation results reveal significant gain over the baselines.

Submitted 20 August, 2019; originally announced August 2019.

Comments: An extended version of a manuscript submitted to IEEE WCL

arXiv:1905.09386 [pdf, other]

A Sub-mm$^3$ Ultrasonic Free-floating Implant for Multi-mote Neural Recording

Authors: Mohammad Meraj Ghanbari, David K. Piech, Konlin Shen, Sina Faraji Alamouti, Cem Yalcin, Benjamin C. Johnson, Jose M. Carmena, Michel M. Maharbiz, Rikky Muller

Abstract: A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording implant is presented. The device is comprised only of a 0.25 mm$^2$ recording IC and a single piezoceramic resonator that is used for both power harvesting and data transmission. Uplink data transmission is performed by analog amplitude modulation of the ultrasound echo. Using a 1.78 MHz main carrier, >35 kbps/mote equivalent uplink data rate is achieved. A technique to linearize the echo amplitude modulation is introduced, resulting in <1.2% static nonlinearity of the received signal over a $\pm$10 mV input range. The IC dissipates 37.7 $\mu$W, while the neural recording front-end consumes 4 $\mu$W and achieves a noise floor of 5.3 $\mu$V$_{rms}$ in a 5 kHz bandwidth. This work improves sub-mm recording mote depth by >2.5x, resulting in the highest measured depth/volume ratio by $\sim$3x. Orthogonal subcarrier modulation enables simultaneous operation of multiple implants, using a single-element ultrasound external transducer. Dual-mote simultaneous power up and data transmission is demonstrated at a rate of 7 kS/s at the depth of 50 mm.

Submitted 16 July, 2019; v1 submitted 18 May, 2019; originally announced May 2019.

Comments: 11 pages, 22 figures, Submitted to Journal of Solid-State Circuits

arXiv:1808.01486 [pdf, other]

doi 10.1109/JSAC.2019.2904352

Spatial Deep Learning for Wireless Scheduling

Authors: Wei Cui, Kaiming Shen, Wei Yu

Abstract: The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths then optimizing the scheduling based on the model. This model-based method is however resource intensive and computationally hard because channel estimation is expensive in dense networks; furthermore, finding even a locally optimal solution of the resulting optimization problem may be computationally complex. This paper shows that by using a deep learning approach, it is possible to bypass the channel estimation and to schedule links efficiently based solely on the geographic locations of the transmitters and the receivers, due to the fact that in many propagation environments, the wireless channel strength is largely a function of the distance dependent path-loss. This is accomplished by unsupervised training over randomly deployed networks, and by using a novel neural network architecture that computes the geographic spatial convolutions of the interfering or interfered neighboring nodes along with subsequent multiple feedback stages to learn the optimum solution. The resulting neural network gives near-optimal performance for sum-rate maximization and is capable of generalizing to larger deployment areas and to deployments of different link densities. Moreover, to provide fairness, this paper proposes a novel scheduling approach that utilizes the sum-rate optimal scheduling algorithm over judiciously chosen subsets of links for maximizing a proportional fairness objective over the network. The proposed approach shows highly competitive and generalizable network utility maximization results.

Submitted 4 February, 2021; v1 submitted 4 August, 2018; originally announced August 2018.

Comments: This paper is the full version of the paper presented at IEEE Global Communications Conference 2018. It includes 15 pages and 12 figures

Journal ref: IEEE J. Sel. Areas in Commun. 37 (2019) 1248-1261

Showing 1–26 of 26 results for author: Shen, K