-
An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming
Authors:
Wenhai Lai,
Zheyu Wu,
Yi Feng,
Kaiming Shen,
Ya-Feng Liu
Abstract:
Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Differing from most existing works, this letter advocates a convex-hull relaxation of the discrete constraints which leads to a continuous reformulated problem equivalent to the original discrete problem. This letter further proposes an efficient alternating projection/proximal gradient descent and ascent algorithm for solving the reformulated problem. Simulation results show that the proposed algorithm outperforms the state-of-the-art methods significantly.
Submitted 28 August, 2024; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface
Authors:
Fan Xu,
Jiawei Yao,
Wenhai Lai,
Kaiming Shen,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namely the coverage enhancement. Although some existing works already consider the IRS-assisted coverage enhancement without CSI, they assume certain position-channel models through which the channels can be recovered from the geographic locations. In contrast, our approach solely relies on the received signal power data, not assuming any position-channel model. We examine the achievability and converse of the proposed blind beamforming method. If the IRS has $N$ reflective elements and there are $U$ receiver positions, then our method guarantees the minimum SNR of $\Omega(N^2/U)$, which is fairly close to the upper bound $O(N+N^2\sqrt{\ln (NU)}/\sqrt[4]{U})$. Aside from the simulation results, we justify the practical use of blind beamforming in a field test at 2.6 GHz. According to the real-world experiment, the proposed blind beamforming method boosts the minimum SNR across seven random positions in a conference room by 18.22 dB, while the position-based method yields a boost of 12.08 dB.
Submitted 17 July, 2024;
originally announced July 2024.
-
Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications
Authors:
Yannan Chen,
Yi Feng,
Xiaoyang Li,
Licheng Zhao,
Kaiming Shen
Abstract:
This paper concerns the coordinated multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-input multiple-output (MIMO) transmission, i.e., the weighted minimum mean square error (WMMSE) algorithm, has a natural extension to the ISAC problem scenario from a fractional programming (FP) perspective. However, the extended WMMSE algorithm requires computing the $N\times N$ matrix inverse extensively, where $N$ is proportional to the antenna array size, so the algorithm becomes quite costly when antennas are massively deployed. To address this issue, we develop a nonhomogeneous bound and use it in conjunction with the FP technique to solve the ISAC beamforming problem without the need to invert any large matrices. It is further shown that the resulting new FP algorithm has an intimate connection with gradient projection, based on which we can accelerate the convergence via Nesterov's gradient extrapolation.
Submitted 16 June, 2024;
originally announced June 2024.
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Authors:
Detai Xin,
Xu Tan,
Kai Shen,
Zeqian Ju,
Dongchao Yang,
Yuancheng Wang,
Shinnosuke Takamichi,
Hiroshi Saruwatari,
Shujie Liu,
Jinyu Li,
Sheng Zhao
Abstract:
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To accomplish this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E utilizes the predicted duration prompt to guide the computing of self-attention weights in Transformer to enforce the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Results of comprehensive objective and subjective evaluations demonstrate that, compared to a powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $5.6\%$ (without reranking) and $1.7\%$ (with reranking) to $2.5\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$.
Submitted 19 May, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Authors:
Zeqian Ju,
Yuancheng Wang,
Kai Shen,
Xu Tan,
Detai Xin,
Dongchao Yang,
Yanqing Liu,
Yichong Leng,
Kaitao Song,
Siliang Tang,
Zhizheng Wu,
Tao Qin,
Xiang-Yang Li,
Wei Ye,
Shikun Zhang,
Jiang Bian,
Lei He,
Jinyu Li,
Sheng Zhao
Abstract:
While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.
Submitted 23 April, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G
Authors:
Qingqing Wu,
Beixiong Zheng,
Changsheng You,
Lipeng Zhu,
Kaiming Shen,
Xiaodan Shao,
Weidong Mei,
Boya Di,
Hongliang Zhang,
Ertugrul Basar,
Lingyang Song,
Marco Di Renzo,
Zhi-Quan Luo,
Rui Zhang
Abstract:
Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities from passive reflection to active amplification, simultaneous reflection and refraction, as well as holographic beamforming. However, the research on ISs is still in rapid progress and there have been recent technological advances in ISs and their emerging applications that are worthy of a timely review. Thus, we provide in this paper a comprehensive survey on the recent development and advances of ISs aided wireless networks. Specifically, we start with an overview on the anticipated use cases of ISs in future wireless networks such as 6G, followed by a summary of the recent standardization activities related to ISs. Then, the main design issues of the commonly adopted reflection-based IS and their state-of-the-art solutions are presented in detail, including reflection optimization, deployment, signal modulation, wireless sensing, and integrated sensing and communications. Finally, recent progress and new challenges in advanced IS architectures are discussed to inspire future research.
Submitted 24 March, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Discerning and Enhancing the Weighted Sum-Rate Maximization Algorithms in Communications
Authors:
Zepeng Zhang,
Ziping Zhao,
Kaiming Shen,
Daniel P. Palomar,
Wei Yu
Abstract:
Weighted sum-rate (WSR) maximization plays a critical role in communication system design. This paper examines three optimization methods for WSR maximization, which ensure convergence to stationary points: two block coordinate ascent (BCA) algorithms, namely, weighted sum-minimum mean-square error (WMMSE) and WSR maximization via fractional programming (WSR-FP), along with a minorization-maximization (MM) algorithm, WSR maximization via MM (WSR-MM). Our contributions are threefold. Firstly, we delineate the exact relationships among WMMSE, WSR-FP, and WSR-MM, which, despite their extensive use in the literature, lack a comprehensive comparative study. By probing the theoretical underpinnings linking the BCA and MM algorithmic frameworks, we reveal the direct correlations between the equivalent transformation techniques, essential to the development of WMMSE and WSR-FP, and the surrogate functions pivotal to WSR-MM. Secondly, we propose a novel algorithm, WSR-MM+, harnessing the flexibility of selecting surrogate functions in MM framework. By circumventing the repeated matrix inversions in the search for optimal Lagrange multipliers in existing algorithms, WSR-MM+ significantly reduces the computational load per iteration and accelerates convergence. Thirdly, we reconceptualize WSR-MM+ within the BCA framework, introducing a new equivalent transform, which gives rise to an enhanced version of WSR-FP, named as WSR-FP+. We further demonstrate that WSR-MM+ can be construed as the basic gradient projection method. This perspective yields a deeper understanding into its computational intricacies. Numerical simulations corroborate the connections between WMMSE, WSR-FP, and WSR-MM and confirm the efficacy of the proposed WSR-MM+ and WSR-FP+ algorithms.
Submitted 8 November, 2023;
originally announced November 2023.
-
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches
Authors:
Kangqing Shen,
Gemine Vivone,
Xiaoyuan Yang,
Simone Lolli,
Michael Schmitt
Abstract:
Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize gray scale SAR images while preserving the original spatial information and radiometric information. However, this research field is still in its early stages, and many limitations can be highlighted. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of our proposed cGAN-based network for SAR colorization. The code will be made publicly available.
Submitted 12 October, 2023;
originally announced October 2023.
-
PromptTTS 2: Describing and Generating Voices with Text Prompt
Authors:
Yichong Leng,
Zhifang Guo,
Kai Shen,
Xu Tan,
Zeqian Ju,
Yanqing Liu,
Yufei Liu,
Dongchao Yang,
Leying Zhang,
Kaitao Song,
Lei He,
Xiang-Yang Li,
Sheng Zhao,
Tao Qin,
Jiang Bian
Abstract:
Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.
Submitted 11 October, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events
Authors:
Ying Ren,
Kailai Shen,
Zhe Ye,
Diqun Yan
Abstract:
Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting speech's mean opinion score (MOS) without requiring the reference speech. Researchers have gradually started to apply NISQA to various practical scenarios. However, little attention has been paid to the security of NISQA models. Backdoor attacks represent the most serious threat to deep neural networks (DNNs) due to the fact that backdoors possess a very high attack success rate once embedded. However, existing backdoor attacks assume that the attacker actively feeds samples containing triggers into the model during the inference phase. This is not adapted to the specific scenario of NISQA. And current backdoor attacks on regression tasks lack an objective metric to measure the attack performance. To address these issues, we propose a novel backdoor triggering approach (EventTrojan) that utilizes an event during the usage of the NISQA model as a trigger. Moreover, we innovatively provide an objective metric for backdoor attacks on regression tasks. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the EventTrojan attack. Besides, it also has good resistance to several defense methods.
Submitted 11 September, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Breaking Speaker Recognition with PaddingBack
Authors:
Zhe Ye,
Diqun Yan,
Li Dong,
Kailai Shen
Abstract:
Machine Learning as a Service (MLaaS) has gained popularity due to advancements in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly in backdoor attacks. Recent research has shown that speech backdoors can utilize transformations as triggers, similar to image backdoors. However, human ears can easily be aware of these transformations, leading to suspicion. In this paper, we propose PaddingBack, an inaudible backdoor attack that utilizes malicious operations to generate poisoned samples, rendering them indistinguishable from clean ones. Instead of using external perturbations as triggers, we exploit the widely-used speech signal operation, padding, to break speaker recognition systems. Experimental results demonstrate the effectiveness of our method, achieving a significant attack success rate while retaining benign accuracy. Furthermore, PaddingBack demonstrates the ability to resist defense methods and maintain its stealthiness against human perception.
Submitted 11 March, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Blind Beamforming for Intelligent Reflecting Surface in Fading Channels without CSI
Authors:
Wenhai Lai,
Wenyu Wang,
Fan Xu,
Xin Li,
Shaobo Niu,
Kaiming Shen
Abstract:
This paper discusses how to optimize the phase shifts of intelligent reflecting surface (IRS) to combat channel fading without any channel state information (CSI), namely blind beamforming. Differing from most previous works based on a two-stage paradigm of first estimating channels and then optimizing phase shifts, our approach is completely data-driven, only requiring a dataset of the received signal power at the user terminal. Thus, our method does not incur extra overhead costs for channel estimation, and does not entail collaboration from service provider, either. The main idea is to choose phase shifts at random and use the corresponding conditional sample mean of the received signal power to extract the main features of the wireless environment. This blind beamforming approach guarantees an $N^2$ boost of signal-to-noise ratio (SNR), where $N$ is the number of reflective elements (REs) of IRS, regardless of whether the direct channel is line-of-sight (LoS) or not. Moreover, blind beamforming is extended to a double-IRS system with provable performance. Finally, prototype tests show that the proposed blind beamforming method can be readily incorporated into the existing communication systems in the real world; simulation tests further show that it works for a variety of fading channel models.
Submitted 30 May, 2023;
originally announced May 2023.
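The conditional-sample-mean idea described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names `csm_blind_beamforming` and `received_power`, the unit-modulus toy channels, the noise-free power measurements, and the parameter values are all hypothetical choices made for illustration.

```python
import numpy as np

def csm_blind_beamforming(powers, choices, num_levels):
    """Pick, for each reflective element, the phase index whose
    conditional sample mean of received power is largest.

    powers:  (T,) measured received powers, one per random configuration
    choices: (T, N) integer phase indices used in each configuration
    """
    num_samples, num_elements = choices.shape
    best = np.zeros(num_elements, dtype=int)
    for n in range(num_elements):
        # Average power over the samples in which element n used index k.
        cond_means = np.array(
            [powers[choices[:, n] == k].mean() for k in range(num_levels)]
        )
        best[n] = int(np.argmax(cond_means))
    return best

# Toy setup (assumption): unit-modulus channels, no noise, K uniform phases.
rng = np.random.default_rng(0)
N, K, T = 16, 4, 20000
phases = 2 * np.pi * np.arange(K) / K
h0 = np.exp(1j * rng.uniform(0, 2 * np.pi))           # direct channel
h = np.exp(1j * rng.uniform(0, 2 * np.pi, size=N))    # reflected channels

def received_power(idx):
    """Received power for phase indices idx of shape (N,) or (T, N)."""
    return np.abs(h0 + np.sum(h * np.exp(1j * phases[idx]), axis=-1)) ** 2

choices = rng.integers(0, K, size=(T, N))   # random phase shifts
powers = received_power(choices)            # the only data the method sees
best = csm_blind_beamforming(powers, choices, K)

print("average power over random configurations:", powers.mean())
print("power with the CSM configuration:", received_power(best))
```

The routine never touches `h0` or `h` directly; it uses only the pairs of phase configurations and measured powers, which mirrors the data-driven premise of the abstract.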
-
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Authors:
Kai Shen,
Zeqian Ju,
Xu Tan,
Yanqing Liu,
Yichong Leng,
Lei He,
Tao Qin,
Sheng Zhao,
Jiang Bian
Abstract:
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating issue, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to get the quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. Audio samples are available at https://speechresearch.github.io/naturalspeech2.
Submitted 30 May, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Coordinating Multiple Intelligent Reflecting Surfaces without Channel Information
Authors:
Fan Xu,
Jiawei Yao,
Wenhai Lai,
Kaiming Shen,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
Conventional beamforming methods for intelligent reflecting surfaces (IRSs) or reconfigurable intelligent surfaces (RISs) typically entail the full channel state information (CSI). However, the computational cost of channel acquisition soars exponentially with the number of IRSs. To bypass this difficulty, we propose a novel strategy called blind beamforming that coordinates multiple IRSs by means of statistics without knowing CSI. Blind beamforming only requires measuring the received signal power at the user terminal for a sequence of randomly generated phase shifts across all IRSs. The main idea is to extract the key statistical quantity for beamforming by exploring only a small portion of the whole solution space of phase shifts. We show that blind beamforming guarantees a signal-to-noise ratio (SNR) boost of $\Theta(N^{2L})$ under certain conditions, where $L$ is the number of IRSs and $N$ is the number of reflecting elements per IRS. The proposed conditions for achieving the optimal SNR boost of $\Theta(N^{4})$ in a double-IRS system are much easier to satisfy than the existing ones in the literature. Most importantly, the proposed conditions can be extended to a fully general $L$-IRS system. The above result significantly improves upon the state of the art in the area of multi-IRS-assisted communication. Moreover, blind beamforming is justified via field tests and simulations. In particular, as shown in our field tests at 2.6 GHz, our method yields up to 17 dB SNR boost; to the best of our knowledge, this is the first time that the use of multiple IRSs gets verified in the real world.
Submitted 8 January, 2024; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging
Authors:
Charlie Tran,
Kai Shen,
Kang Liu,
Akshay Ashok,
Adolfo Ramirez-Zamora,
Jinghua Chen,
Yulin Li,
Ruogu Fang
Abstract:
Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations.
Submitted 18 February, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
A Linear Time Algorithm for the Optimal Discrete IRS Beamforming
Authors:
Shuyi Ren,
Kaiming Shen,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for intelligent reflecting surface (IRS) in polynomial time. The above problem is widely believed to be difficult because it is not linked to any known combinatorial problems that can be solved efficiently. The branch-and-bound algorithms and the approximation algorithms constitute the best results in this area. Nevertheless, this work shows that the global optimum can actually be reached in linear time on average in terms of the number of reflective elements (REs) of IRS. The main idea is to geometrically interpret the discrete beamforming problem as choosing the optimal point on the unit circle. Although the number of possible combinations of phase shifts grows exponentially with the number of REs, it turns out that there are only a linear number of circular arcs that possibly contain the optimal point. Furthermore, the proposed algorithm can be viewed as a novel approach to a special case of the discrete quadratic program (QP).
Submitted 7 September, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Joint Device Selection and Power Control for Wireless Federated Learning
Authors:
Wei Guo,
Ran Li,
Chuan Huang,
Xiaoqi Qin,
Kaiming Shen,
Wei Zhang
Abstract:
This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training and upload the updated model parameters to the PS via over-the-air computation (AirComp). First, we propose an AirComp-based adaptive reweighting scheme for the aggregation of locally updated models, in which the model aggregation weights are directly determined by the uplink transmit power values of the selected devices, so that learning and communication can be jointly optimized simply through device selection and power control. Furthermore, we provide a convergence analysis for the proposed wireless FL algorithm and derive an upper bound on the expected optimality gap between the expected and optimal global loss values. With instantaneous channel state information (CSI), we formulate the optimality gap minimization problems under the individual and the sum uplink transmit power constraints, respectively, which are shown to be solvable by the semidefinite relaxation (SDR) technique. Numerical results reveal that our proposed wireless FL algorithm performs close to the ideal FedAvg scheme with error-free model exchange and full device participation.
Submitted 18 May, 2022;
originally announced May 2022.
-
Configuring Intelligent Reflecting Surface with Performance Guarantees: Optimal Beamforming
Authors:
Yaowen Zhang,
Kaiming Shen,
Shuyi Ren,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
This work proposes linear time strategies to optimally configure the phase shifts for the reflective elements of an intelligent reflecting surface (IRS). Specifically, we show that the binary phase beamforming can be optimally solved in linear time to maximize the received signal-to-noise ratio (SNR). For the general $K$-ary phase beamforming, we develop a linear time approximation algorithm that guarantees performance within a constant fraction $(1+\cos(\pi/K))/2$ of the global optimum, e.g., it can attain over 85% of the optimal performance for the quadrature beamforming with $K=4$. According to the numerical results, the proposed approximation algorithm for discrete IRS beamforming outperforms the existing algorithms significantly in boosting the received SNR.
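As a quick numerical check of the guarantee quoted above, the following minimal sketch evaluates the fraction (1+cos(π/K))/2 for a few values of K. The function name `approximation_ratio` is hypothetical and chosen only for illustration; it does not come from the paper.

```python
import math


def approximation_ratio(K: int) -> float:
    """Evaluate (1 + cos(pi / K)) / 2, the constant fraction quoted in the abstract."""
    if K < 2:
        raise ValueError("K must be at least 2")
    return (1.0 + math.cos(math.pi / K)) / 2.0


if __name__ == "__main__":
    # Print the guaranteed fraction for a few phase resolutions.
    for K in (4, 8, 16):
        print(f"K={K:2d}: ratio = {approximation_ratio(K):.4f}")
```

For K=4 the fraction evaluates to roughly 0.854, which agrees with the "over 85%" figure, and it increases toward 1 as K grows.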
Submitted 4 December, 2021;
originally announced December 2021.
-
Numerical Energy Analysis of In-wheel Motor Driven Autonomous Electric Vehicles
Authors:
Kang Shen,
Fan Yang,
Xinyou Ke,
Cheng Zhang,
Chris Yuan
Abstract:
Autonomous electric vehicles are being widely studied as the future technology of ground transportation. However, conventional powertrain systems limit the energy and power transmission efficiencies of these vehicles and may hinder their broad application in the future. Here we report a study on the energy consumption and efficiency improvement of a mid-size autonomous electric vehicle driven by in-wheel motors, through the development of a numerical energy model, validated with the actual driving data and implemented in a case study. The energy analysis was conducted under three driving conditions (flat road, upslope, and downslope driving) to examine the energy consumption, with the energy-saving potential of the in-wheel-motor driven powertrain system systematically explored and discussed. Considering the energy recovery from regenerative braking, energy consumption and regenerated energy were calculated in specific driving cycles based on vehicle dynamics and autonomous driving patterns. A case study was conducted using the baseline electric vehicle driving data in West Los Angeles. It was found that an in-wheel motor driven autonomous electric vehicle can save up to 17.5% of energy compared with a conventional electric vehicle during slope driving. Using the efficiency maps of a commercial in-wheel motor, the numerical energy model and validated results obtained from this study are in line with actual situations, and can be used to support sustainable development of more energy-efficient autonomous electric vehicles in the future.
Submitted 10 April, 2021;
originally announced April 2021.
-
Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy
Authors:
Kyrollos Yanny,
Nick Antipa,
William Liberti,
Sam Dehaeck,
Kristina Monakhova,
Fanglin Linda Liu,
Konlin Shen,
Ren Ng,
Laura Waller
Abstract:
Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve the 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal phase mask at the objective's aperture stop. Placing the phase mask at the aperture stop significantly reduces the size of the device, and varying the focal lengths enables a uniform resolution across a wide depth range. The phase mask encodes the 3D fluorescence intensity into a single 2D measurement, and the 3D volume is recovered by solving a sparsity-constrained inverse problem. We provide methods for designing and fabricating the phase mask and an efficient forward model that accounts for the field-varying aberrations in miniature objectives. We demonstrate a prototype that is 17 mm tall and weighs 2.5 grams, achieving 2.76 $μ$m lateral, and 15 $μ$m axial resolution across most of the 900x700x390 $μm^3$ volume at 40 volumes per second. The performance is validated experimentally on resolution targets, dynamic biological samples, and mouse brain tissue. Compared with existing miniature single-shot volume-capture implementations, our system is smaller and lighter and achieves a more than 2x better lateral and axial resolution throughout a 10x larger usable depth range. Our microscope design provides single-shot 3D imaging for applications where a compact platform matters, such as volumetric neural imaging in freely moving animals and 3D motion studies of dynamic samples in incubators and lab-on-a-chip devices.
Submitted 11 October, 2020;
originally announced October 2020.
-
Stochastic Transceiver Optimization in Multi-Tags Symbiotic Radio Systems
Authors:
Xihan Chen,
Hei Victor Cheng,
Kaiming Shen,
An Liu,
Min-Jian Zhao
Abstract:
Symbiotic radio (SR) is emerging as a spectrum- and energy-efficient communication paradigm for future passive Internet-of-things (IoT), where some single-antenna backscatter devices, referred to as Tags, are parasitic in an active primary transmission. The primary transceiver is designed to assist both direct-link (DL) and backscatter-link (BL) communication. In multi-tags SR systems, the transceiver designs become much more complicated due to the presence of DL and inter-Tag interference, which further poses new challenges to the availability and reliability of DL and BL transmission. To overcome these challenges, we formulate the stochastic optimization of transceiver design as the general network utility maximization problem (GNUMP). The resultant problem is a stochastic multiple-ratio fractional non-convex problem, and consequently challenging to solve. By leveraging some fractional programming techniques, we tailor a surrogate function with the specific structure and subsequently develop a batch stochastic parallel decomposition (BSPD) algorithm, which is shown to converge to stationary solutions of the GNUMP. Simulation results verify the effectiveness of the proposed algorithm by numerical examples in terms of the achieved system throughput.
Submitted 24 June, 2020;
originally announced June 2020.
-
Joint Annotator-and-Spectrum Allocation in Wireless Networks for Crowd Labelling
Authors:
Xiaoyang Li,
Guangxu Zhu,
Kaiming Shen,
Wei Yu,
Yi Gong,
Kaibin Huang
Abstract:
The massive sensing data generated by Internet-of-Things will provide fuel for ubiquitous artificial intelligence (AI), automating the operations of our society ranging from transportation to healthcare. The realistic adoption of this technique however entails labelling of the enormous data prior to the training of AI models via supervised learning. To tackle this challenge, we explore a new perspective of wireless crowd labelling that is capable of downloading data to many imperfect mobile annotators for repetition labelling by exploiting multicasting in wireless networks. In this cross-disciplinary area, the integration of the rate-distortion theory and the principle of repetition labelling for accuracy improvement gives rise to a new tradeoff between radio-and-annotator resources under a constraint on labelling accuracy. Building on the tradeoff and aiming at maximizing the labelling throughput, this work focuses on the joint optimization of encoding rate, annotator clustering, and sub-channel allocation, which results in an NP-hard integer programming problem. To devise an efficient solution approach, we establish an optimal sequential annotator-clustering scheme based on the order of decreasing signal-to-noise ratios. Thereby, the optimal solution can be found by an efficient tree search. Next, the solution is simplified by applying truncated channel inversion. Alternatively, the optimization problem can be recognized as a knapsack problem, which can be efficiently solved in pseudo-polynomial time by means of dynamic programming. In addition, exact policies are derived for the annotator-constrained and spectrum-constrained cases. Last, simulation results demonstrate the significant throughput gains based on the optimal solution compared with decoupled allocation of the two types of resources.
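The pseudo-polynomial dynamic program mentioned above can be illustrated with the textbook 0/1 knapsack recursion. This is a generic sketch, not the paper's formulation: the abstract does not specify how annotators, encoding rates, or sub-channels map to items, values, and weights, so the function name `knapsack_max_value` and its arguments are hypothetical.

```python
from typing import List


def knapsack_max_value(values: List[int], weights: List[int], capacity: int) -> int:
    """Textbook 0/1 knapsack via dynamic programming.

    Runs in O(n * capacity) time, which is pseudo-polynomial because the
    cost depends on the numeric value of the capacity, not its bit length.
    """
    # best[c] = maximum total value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


if __name__ == "__main__":
    print(knapsack_max_value([60, 100, 120], [10, 20, 30], 50))
```

The one-dimensional table keeps memory proportional to the capacity; a two-dimensional table would be needed only if the selected items themselves had to be recovered.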
Submitted 25 December, 2019;
originally announced December 2019.
-
Fault Detection Using Nonlinear Low-Dimensional Representation of Sensor Data
Authors:
Kai Shen,
Anya Mcguirk,
Yuwei Liao,
Arin Chaudhuri,
Deovrat Kakde
Abstract:
Sensor data analysis plays a key role in health assessment of critical equipment. Such data are multivariate and exhibit nonlinear relationships. This paper describes how one can exploit nonlinear dimension reduction techniques, such as the t-distributed stochastic neighbor embedding (t-SNE) and kernel principal component analysis (KPCA), for fault detection. We show that using anomaly detection with low-dimensional representations provides better interpretability and is conducive to edge processing in IoT applications.
Submitted 2 October, 2019;
originally announced October 2019.
-
Mixed-Timescale Beamforming and Power Splitting for Massive MIMO Aided SWIPT IoT Network
Authors:
Xihan Chen,
Hei Victor Cheng,
An Liu,
Kaiming Shen,
Min-Jian Zhao
Abstract:
Traditional simultaneous wireless information and power transfer (SWIPT) with power splitting assumes perfect channel state information (CSI), which is difficult to obtain, especially in the massive multiple-input-multiple-output (MIMO) regime. In this letter, we consider a mixed-timescale joint beamforming and power splitting (MJBP) scheme to maximize general utility functions under a power constraint in the downlink of a massive MIMO SWIPT IoT network. In this scheme, the transmit digital beamformer is adapted to the imperfect CSI, while the receive power splitters are adapted only to the long-term channel statistics because of hardware limits and signaling overhead. The formulated optimization problem is solved using a mixed-timescale online stochastic successive convex approximation (MO-SSCA) algorithm. Simulation results reveal significant gain over the baselines.
Submitted 20 August, 2019;
originally announced August 2019.
-
A Sub-mm$^3$ Ultrasonic Free-floating Implant for Multi-mote Neural Recording
Authors:
Mohammad Meraj Ghanbari,
David K. Piech,
Konlin Shen,
Sina Faraji Alamouti,
Cem Yalcin,
Benjamin C. Johnson,
Jose M. Carmena,
Michel M. Maharbiz,
Rikky Muller
Abstract:
A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording implant is presented. The device is comprised only of a 0.25 mm$^2$ recording IC and a single piezoceramic resonator that is used for both power harvesting and data transmission. Uplink data transmission is performed by analog amplitude modulation of the ultrasound echo. Using a 1.78 MHz main carrier, >35 kbps/mote equivalent uplink data rate is achieved. A technique to linearize the echo amplitude modulation is introduced, resulting in <1.2% static nonlinearity of the received signal over a $\pm$10 mV input range. The IC dissipates 37.7 $μ$W, while the neural recording front-end consumes 4 $μ$W and achieves a noise floor of 5.3 $μ$V$_{rms}$ in a 5 kHz bandwidth. This work improves sub-mm recording mote depth by >2.5x, resulting in the highest measured depth/volume ratio by $\sim$3x. Orthogonal subcarrier modulation enables simultaneous operation of multiple implants, using a single-element ultrasound external transducer. Dual-mote simultaneous power up and data transmission is demonstrated at a rate of 7 kS/s at the depth of 50 mm.
Submitted 16 July, 2019; v1 submitted 18 May, 2019;
originally announced May 2019.
-
Spatial Deep Learning for Wireless Scheduling
Authors:
Wei Cui,
Kaiming Shen,
Wei Yu
Abstract:
The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths and then optimizing the scheduling based on the model. This model-based method is however resource intensive and computationally hard because channel estimation is expensive in dense networks; furthermore, finding even a locally optimal solution of the resulting optimization problem may be computationally complex. This paper shows that by using a deep learning approach, it is possible to bypass the channel estimation and to schedule links efficiently based solely on the geographic locations of the transmitters and the receivers, because in many propagation environments the wireless channel strength is largely a function of the distance-dependent path-loss. This is accomplished by unsupervised training over randomly deployed networks, and by using a novel neural network architecture that computes the geographic spatial convolutions of the interfering or interfered neighboring nodes along with subsequent multiple feedback stages to learn the optimum solution. The resulting neural network gives near-optimal performance for sum-rate maximization and is capable of generalizing to larger deployment areas and to deployments of different link densities. Moreover, to provide fairness, this paper proposes a novel scheduling approach that utilizes the sum-rate optimal scheduling algorithm over judiciously chosen subsets of links for maximizing a proportional fairness objective over the network. The proposed approach shows highly competitive and generalizable network utility maximization results.
Submitted 4 February, 2021; v1 submitted 4 August, 2018;
originally announced August 2018.