Search | arXiv e-print repository

Latency Minimization for IRS-enhanced Wideband MEC Networks with Practical Reflection Model

Authors: N. Li, W. Hao, X. Li, Z. Zhu, Z. Tang, S. Yang

Abstract: Intelligent reflecting surface (IRS) has been considered as an efficient way to boost the computation capability of mobile edge computing (MEC) system, especially when the communication links is blocked or the communication signal is weak. However, most existing works are restricted to narrow-band channel and ideal IRS reflection model, which is not practical and may lead to significant performanc… ▽ More Intelligent reflecting surface (IRS) has been considered as an efficient way to boost the computation capability of mobile edge computing (MEC) system, especially when the communication links is blocked or the communication signal is weak. However, most existing works are restricted to narrow-band channel and ideal IRS reflection model, which is not practical and may lead to significant performance degradation in realistic systems. To further exploit the benefits of IRS in MEC system, we consider an IRS-enhanced wideband MEC system with practical IRS reflection model. With the aim of minimizing the weighted latency of all devices, the offloading data volume, edge computing resource, BS's receiving vector, and IRS passive beamforming are jointly optimized. Since the formulated problem is non-convex, we employ the block coordinate descent (BCD) technique to decouple it into two subproblems for alternatively optimizing computing and communication settings. The effectiveness and convergence of the proposed algorithm are validate via numerical analyses. In addition, simulation results demonstrate that the proposed algorithm can achieve lower latency compared to that based on the ideal IRS reflection model, which confirms the necessary of considering practical model when designing an IRS-enhanced wideband MEC system. △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures

arXiv:2406.18069 [pdf, other]

Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press… ▽ More Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood pressure (BP) measurement, which is critical for the management of cardiovascular diseases. This paper presents the first work to explore the capacity of LLMs to perform cuffless BP estimation based on wearable biosignals. We extracted physiological features from electrocardiogram (ECG) and photoplethysmogram (PPG) signals and designed context-enhanced prompts by combining these features with BP domain knowledge and user information. Subsequently, we adapted LLMs to BP estimation tasks through fine-tuning. To evaluate the proposed approach, we conducted assessments of ten advanced LLMs using a comprehensive public dataset of wearable biosignals from 1,272 participants. The experimental results demonstrate that the optimally fine-tuned LLM significantly surpasses conventional task-specific baselines, achieving an estimation error of 0.00 $\pm$ 9.25 mmHg for systolic BP and 1.29 $\pm$ 6.37 mmHg for diastolic BP. Notably, the ablation studies highlight the benefits of our context enhancement strategy, leading to an 8.9% reduction in mean absolute error for systolic BP estimation. This paper pioneers the exploration of LLMs for cuffless BP measurement, providing a potential solution to enhance the accuracy of cuffless BP measurement. △ Less

Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.05325 [pdf, other]

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

Authors: Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai

Abstract: Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusi… ▽ More Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusion model for SVC (LDM-SVC) in this work, which attempts to perform SVC in the latent space using an LDM. We pretrain a variational autoencoder structure using the noted open-source So-VITS-SVC project based on the VITS framework, which is then used for the LDM training. Besides, we propose a singer guidance training method based on classifier-free guidance to further suppress the timbre of the original singer. Experimental results show the superiority of the proposed method over previous works in both subjective and objective evaluations of timbre similarity. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.04243 [pdf, other]

Policy Optimization in Control: Geometry and Algorithmic Implications

Authors: Shahriar Talebi, Yang Zheng, Spencer Kraisler, Na Li, Mehran Mesbahi

Abstract: This survey explores the geometric perspective on policy optimization within the realm of feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various ``complete parameterization'' -- referring to the policy parameters together with its Riemannian geometry -- of… ▽ More This survey explores the geometric perspective on policy optimization within the realm of feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various ``complete parameterization'' -- referring to the policy parameters together with its Riemannian geometry -- of control design problems, influence stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.12031 [pdf, other]

Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

Authors: Nian Li, Jianguo Wei

Abstract: Transformer-based architectures for speaker verification typically require more training data than ECAPA-TDNN. Therefore, recent work has generally been trained on VoxCeleb1&2. We propose a backbone network based on self-attention, which can achieve competitive results when trained on VoxCeleb2 alone. The network alternates between neighborhood attention and global attention to capture local and g… ▽ More Transformer-based architectures for speaker verification typically require more training data than ECAPA-TDNN. Therefore, recent work has generally been trained on VoxCeleb1&2. We propose a backbone network based on self-attention, which can achieve competitive results when trained on VoxCeleb2 alone. The network alternates between neighborhood attention and global attention to capture local and global features, then aggregates features of different hierarchical levels, and finally performs attentive statistics pooling. Additionally, we employ a progressive channel fusion strategy to expand the receptive field in the channel dimension as the network deepens. We trained the proposed PCF-NAT model on VoxCeleb2 and evaluated it on VoxCeleb1 and the validation sets of VoxSRC. The EER and minDCF of the shallow PCF-NAT are on average more than 20% lower than those of similarly sized ECAPA-TDNN. Deep PCF-NAT achieves an EER lower than 0.5% on VoxCeleb1-O. The code and models are publicly available at https://github.com/ChenNan1996/PCF-NAT. △ Less

Submitted 29 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, 3 tables; added github link

arXiv:2405.07281 [pdf, ps, other]

Movable Antennas Aided Multicast MISO Communication Systems

Authors: Zhenqiao Cheng, Nanxi Li, Ruizhe Long, Jianchi Zhu, Chongjun Ouyang, Peng Chen

Abstract: A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-input system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA element… ▽ More A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-input system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA elements as discrete movements. A low-complexity greedy search-based algorithm is proposed to tackle this non-convex inter-programming problem. A branch-and-bound (BAB)-based method is proposed to achieve the optimal multicast rate with a reduced time complexity than the brute-force search by assuming the two users suffer similar line-of-sight path losses. Numerical results reveal that the proposed MA systems significantly improve the multicast rate compared to conventional fixed-position antennas (FPAs)-based systems. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: 5 pages

arXiv:2405.06089 [pdf, other]

Learning Low-dimensional Latent Dynamics from High-dimensional Observations: Non-asymptotics and Lower Bounds

Authors: Yuyang Zhang, Shahriar Talebi, Na Li

Abstract: In this paper, we focus on learning a linear time-invariant (LTI) model with low-dimensional latent variables but high-dimensional observations. We provide an algorithm that recovers the high-dimensional features, i.e. column space of the observer, embeds the data into low dimensions and learns the low-dimensional model parameters. Our algorithm enjoys a sample complexity guarantee of order… ▽ More In this paper, we focus on learning a linear time-invariant (LTI) model with low-dimensional latent variables but high-dimensional observations. We provide an algorithm that recovers the high-dimensional features, i.e. column space of the observer, embeds the data into low dimensions and learns the low-dimensional model parameters. Our algorithm enjoys a sample complexity guarantee of order $\tilde{\mathcal{O}}(n/ε^2)$, where $n$ is the observation dimension. We further establish a fundamental lower bound indicating this complexity bound is optimal up to logarithmic factors and dimension-independent constants. We show that this inevitable linear factor of $n$ is due to the learning error of the observer's column space in the presence of high-dimensional noises. Extending our results, we consider a meta-learning problem inspired by various real-world applications, where the observer column space can be collectively learned from datasets of multiple LTI systems. An end-to-end algorithm is then proposed, facilitating learning LTI systems from a meta-dataset which breaks the sample complexity lower bound in certain scenarios. △ Less

Submitted 25 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05787 [pdf, other]

Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

Authors: Tianpeng Zhang, Sekeun Kim, Jerome Charton, Haitong Ma, Kyungsang Kim, Na Li, Quanzheng Li

Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) ta… ▽ More The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) target US scan. Utilizing 3D US-CT registration and deep learning-based segmentation networks, we can achieve precise imaging of 3D hepatic veins, facilitating accurate coordinate mapping between CT and the robot. This enables the automatic localization of follow-up targets within the CT image, allowing the robot to navigate precisely to the target's surface. Evaluation of the ultrasound phantom confirms the quality of the US-CT registration and shows the robot reliably locates the targets in repeated trials. The proposed framework holds the potential to significantly reduce time and costs for healthcare providers, clinicians, and follow-up patients, thereby addressing the increasing healthcare burden associated with chronic disease in local communities. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05353 [pdf, other]

Eco-driving Accounting for Interactive Cut-in Vehicles

Authors: Chaozhe R. He, Nan Li

Abstract: Automated vehicles can gather information about surrounding traffic and plan safe and energy-efficient driving behavior, which is known as eco-driving. Conventional eco-driving designs only consider preceding vehicles in the same lane as the ego vehicle. In heavy traffic, however, vehicles in adjacent lanes may cut into the ego vehicle's lane, influencing the ego vehicle's eco-driving behavior and… ▽ More Automated vehicles can gather information about surrounding traffic and plan safe and energy-efficient driving behavior, which is known as eco-driving. Conventional eco-driving designs only consider preceding vehicles in the same lane as the ego vehicle. In heavy traffic, however, vehicles in adjacent lanes may cut into the ego vehicle's lane, influencing the ego vehicle's eco-driving behavior and compromising the energy-saving performance. Therefore, in this paper, we propose an eco-driving design that accounts for neighbor vehicles that have cut-in intentions. Specifically, we integrate a leader-follower game to predict the interaction between the ego and the cut-in vehicles and a model-predictive controller for planning energy-efficient behavior for the automated ego vehicle. We show that the leader-follower game model can reasonably represent the interactive motion between the ego vehicle and the cut-in vehicle. More importantly, we show that the proposed design can predict and react to neighbor vehicles' cut-in behaviors properly, leading to improved energy efficiency in cut-in scenarios compared to baseline designs that consider preceding vehicles only. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted at 2024 IEEE International Conference on Mobility: Operations, Services, and Technologies (MOST)

arXiv:2404.15278 [pdf, other]

Security-Sensitive Task Offloading in Integrated Satellite-Terrestrial Networks

Authors: Wenjun Lan, Kongyang Chen, Jiannong Cao, Yikai Li, Ning Li, Qi Chen, Yuvraj Sahni

Abstract: With the rapid development of sixth-generation (6G) communication technology, global communication networks are moving towards the goal of comprehensive and seamless coverage. In particular, low earth orbit (LEO) satellites have become a critical component of satellite communication networks. The emergence of LEO satellites has brought about new computational resources known as the \textit{LEO sat… ▽ More With the rapid development of sixth-generation (6G) communication technology, global communication networks are moving towards the goal of comprehensive and seamless coverage. In particular, low earth orbit (LEO) satellites have become a critical component of satellite communication networks. The emergence of LEO satellites has brought about new computational resources known as the \textit{LEO satellite edge}, enabling ground users (GU) to offload computing tasks to the resource-rich LEO satellite edge. However, existing LEO satellite computational offloading solutions primarily focus on optimizing system performance, neglecting the potential issue of malicious satellite attacks during task offloading. In this paper, we propose the deployment of LEO satellite edge in an integrated satellite-terrestrial networks (ISTN) structure to support \textit{security-sensitive computing task offloading}. We model the task allocation and offloading order problem as a joint optimization problem to minimize task offloading delay, energy consumption, and the number of attacks while satisfying reliability constraints. To achieve this objective, we model the task offloading process as a Markov decision process (MDP) and propose a security-sensitive task offloading strategy optimization algorithm based on proximal policy optimization (PPO). Experimental results demonstrate that our algorithm significantly outperforms other benchmark methods in terms of performance. △ Less

Submitted 20 January, 2024; originally announced April 2024.

arXiv:2404.13605 [pdf, other]

Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence

Authors: Ripon Kumar Saha, Dehao Qin, Nianyi Li, Jinwei Ye, Suren Jayasuriya

Abstract: Tackling image degradation due to atmospheric turbulence, particularly in dynamic environment, remains a challenge for long-range imaging systems. Existing techniques have been primarily designed for static scenes or scenes with small motion. This paper presents the first segment-then-restore pipeline for restoring the videos of dynamic scenes in turbulent environment. We leverage mean optical flo… ▽ More Tackling image degradation due to atmospheric turbulence, particularly in dynamic environment, remains a challenge for long-range imaging systems. Existing techniques have been primarily designed for static scenes or scenes with small motion. This paper presents the first segment-then-restore pipeline for restoring the videos of dynamic scenes in turbulent environment. We leverage mean optical flow with an unsupervised motion segmentation method to separate dynamic and static scene components prior to restoration. After camera shake compensation and segmentation, we introduce foreground/background enhancement leveraging the statistics of turbulence strength and a transformer model trained on a novel noise-based procedural turbulence generator for fast dataset augmentation. Benchmarked against existing restoration methods, our approach restores most of the geometric distortion and enhances sharpness for videos. We make our code, simulator, and data publicly available to advance the field of video restoration from turbulence: riponcs.github.io/TurbSegRes △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Paper

arXiv:2404.12163 [pdf, other]

Unsupervised Microscopy Video Denoising

Authors: Mary Aiyetigbo, Alexander Korte, Ethan Anderson, Reda Chalhoub, Peter Kalivas, Feng Luo, Nianyi Li

Abstract: In this paper, we introduce a novel unsupervised network to denoise microscopy videos featured by image sequences captured by a fixed location microscopy camera. Specifically, we propose a DeepTemporal Interpolation method, leveraging a temporal signal filter integrated into the bottom CNN layers, to restore microscopy videos corrupted by unknown noise types. Our unsupervised denoising architectur… ▽ More In this paper, we introduce a novel unsupervised network to denoise microscopy videos featured by image sequences captured by a fixed location microscopy camera. Specifically, we propose a DeepTemporal Interpolation method, leveraging a temporal signal filter integrated into the bottom CNN layers, to restore microscopy videos corrupted by unknown noise types. Our unsupervised denoising architecture is distinguished by its ability to adapt to multiple noise conditions without the need for pre-existing noise distribution knowledge, addressing a significant challenge in real-world medical applications. Furthermore, we evaluate our denoising framework using both real microscopy recordings and simulated data, validating our outperforming video denoising performance across a broad spectrum of noise scenarios. Extensive experiments demonstrate that our unsupervised model consistently outperforms state-of-the-art supervised and unsupervised video denoising techniques, proving especially effective for microscopy videos. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted at CVPRW 2024

arXiv:2403.14935 [pdf, ps, other]

Data-Driven Predictive Control with Adaptive Disturbance Attenuation for Constrained Systems

Authors: Nan Li, Ilya Kolmanovsky, Hong Chen

Abstract: In this paper, we propose a novel data-driven predictive control approach for systems subject to time-domain constraints. The approach combines the strengths of H-infinity control for rejecting disturbances and MPC for handling constraints. In particular, the approach can dynamically adapt H-infinity disturbance attenuation performance depending on measured system state and forecasted disturbance… ▽ More In this paper, we propose a novel data-driven predictive control approach for systems subject to time-domain constraints. The approach combines the strengths of H-infinity control for rejecting disturbances and MPC for handling constraints. In particular, the approach can dynamically adapt H-infinity disturbance attenuation performance depending on measured system state and forecasted disturbance level to satisfy constraints. We establish theoretical properties of the approach including robust guarantees of closed-loop stability, disturbance attenuation, constraint satisfaction under noisy data, as well as sufficient conditions for recursive feasibility, and illustrate the approach with a numerical example. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 figures

arXiv:2403.12215 [pdf, other]

Aggregate Peak EV Charging Demand: The Impact of Segmented Network Tariffs

Authors: Nanda Kishor Panda, Na Li, Simon H. Tindemans

Abstract: Aggregate peak Electric Vehicle (EV) charging demand is a matter of growing concern for network operators as it severely limits the network's capacity, preventing its reliable operation. Various tariff schemes have been proposed to limit peak demand by incentivizing flexible asset users to shift their demand from peak periods. However, fewer studies quantify the effect of these tariff schemes on t… ▽ More Aggregate peak Electric Vehicle (EV) charging demand is a matter of growing concern for network operators as it severely limits the network's capacity, preventing its reliable operation. Various tariff schemes have been proposed to limit peak demand by incentivizing flexible asset users to shift their demand from peak periods. However, fewer studies quantify the effect of these tariff schemes on the aggregate level. In this paper, we compare the effect of a multi-level segmented network tariff with and without dynamic energy prices for individual EV users on the aggregate peak demand. Results based on real charging transactions from over 1200 public charging points in the Netherlands show that the segmented network tariff with flat energy prices results in more diverse load profiles with increasing aggregation, as compared to cost-optimized dispatch based on only dynamic day-ahead energy prices. When paired with dynamic energy prices, the segmented tariff still outperforms only dynamic energy price-based tariffs in reducing peaks. Results show that a balance between power thresholds and price per threshold is crucial in designing a suitable tariff, taking into account the needs of the power network. We also provide valuable insights to network operators by calculating the diversity factor for various peak demands per charging point. △ Less

Submitted 2 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures, 2 columns

arXiv:2403.09125 [pdf, other]

doi 10.1016/j.joule.2024.05.009

Exploring the Capabilities and Limitations of Large Language Models in the Electric Energy Sector

Authors: Subir Majumder, Lin Dong, Fatemeh Doudi, Yuting Cai, Chao Tian, Dileep Kalathi, Kevin Ding, Anupam A. Thatte, Na Li, Le Xie

Abstract: Large Language Models (LLMs) as chatbots have drawn remarkable attention thanks to their versatile capability in natural language processing as well as in a wide range of tasks. While there has been great enthusiasm towards adopting such foundational model-based artificial intelligence tools in all sectors possible, the capabilities and limitations of such LLMs in improving the operation of the el… ▽ More Large Language Models (LLMs) as chatbots have drawn remarkable attention thanks to their versatile capability in natural language processing as well as in a wide range of tasks. While there has been great enthusiasm towards adopting such foundational model-based artificial intelligence tools in all sectors possible, the capabilities and limitations of such LLMs in improving the operation of the electric energy sector need to be explored, and this article identifies fruitful directions in this regard. Key future research directions include data collection systems for fine-tuning LLMs, embedding power system-specific tools in the LLMs, and retrieval augmented generation (RAG)-based knowledge pool to improve the quality of LLM responses and LLMs in safety-critical use cases. △ Less

Submitted 20 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09030 [pdf]

An AI-Driven Approach to Wind Turbine Bearing Fault Diagnosis from Acoustic Signals

Authors: Zhao Wang, Xiaomeng Li, Na Li, Longlong Shu

Abstract: This study aimed to develop a deep learning model for the classification of bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully constructed and trained by using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signal data was collected and processed in frames to capture time and f… ▽ More This study aimed to develop a deep learning model for the classification of bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully constructed and trained by using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signal data was collected and processed in frames to capture time and frequency domain information. The model exhibited outstanding accuracy on training samples and demonstrated excellent generalization ability during validation, indicating its proficiency of generalization capability. On the test samples, the model achieved remarkable classification performance, with an overall accuracy exceeding 99.5%, and a false positive rate of less than 1% for normal status. The findings of this study provide essential support for the diagnosis and maintenance of bearing faults in wind turbine generators, with the potential to enhance the reliability and efficiency of wind power generation. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05772 [pdf, other]

sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

Authors: Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li

Abstract: Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a no… ▽ More Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a novel SNN-based VAD model, referred to as sVAD, which features an auditory encoder with an SNN-based attention mechanism. Particularly, it provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms. The classifier utilizes Spiking Recurrent Neural Networks (sRNN) to exploit temporal speech information. Experimental results demonstrate that our sVAD achieves remarkable noise robustness and meanwhile maintains low power consumption and a small footprint, making it a promising solution for real-world VAD applications. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted by ICASSP 2024

arXiv:2402.01808 [pdf, other]

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean… ▽ More This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a Word Accuracy Rate (WAcc) of 0.78 for the real-time track, as well as an overall P.804 MOS of 3.43 and a WAcc of 0.78 for the non-real-time track, ranking 1st in both tracks. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024; Rank 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

arXiv:2401.17575 [pdf, other]

Can We Improve Channel Reciprocity via Loop-back Compensation for RIS-assisted Physical Layer Key Generation

Authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao, Na Li, Pengxuan Mao, Tianyuan Yang

Abstract: Reconfigurable intelligent surface (RIS) facilitates the extraction of unpredictable channel features for physical layer key generation (PKG), securing communications among legitimate users with symmetric keys. Previous works have demonstrated that channel reciprocity plays a crucial role in generating symmetric keys in PKG systems, whereas, in reality, reciprocity is greatly affected by hardware… ▽ More Reconfigurable intelligent surface (RIS) facilitates the extraction of unpredictable channel features for physical layer key generation (PKG), securing communications among legitimate users with symmetric keys. Previous works have demonstrated that channel reciprocity plays a crucial role in generating symmetric keys in PKG systems, whereas, in reality, reciprocity is greatly affected by hardware interference and RIS-based jamming attacks. This motivates us to propose LoCKey, a novel approach that aims to improve channel reciprocity by mitigating interferences and attacks with a loop-back compensation scheme, thus maximizing the secrecy performance of the PKG system. Specifically, our proposed LoCKey is capable of effectively compensating for the CSI non-reciprocity by the combination of transmit-back signal value and error minimization module. Firstly, we introduce the entire flowchart of our method and provide an in-depth discussion of each step. Following that, we delve into a theoretical analysis of the performance optimizations when our LoCKey is applied for CSI reciprocity enhancement. Finally, we conduct experiments to verify the effectiveness of the proposed LoCKey in improving channel reciprocity under various interferences for RIS-assisted wireless communications. The results demonstrate a significant improvement in both the rate of key generation assisted by the RIS and the consistency of the generated keys, showing great potential for the practical deployment of our LoCKey in future wireless systems. △ Less

Submitted 30 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by ICC 2024

arXiv:2401.16183 [pdf, ps, other]

Scalable Reinforcement Learning for Linear-Quadratic Control of Networks

Authors: Johan Olsson, Runyu Zhang, Emma Tegling, Na Li

Abstract: Distributed optimal control is known to be challenging and can become intractable even for linear-quadratic regulator problems. In this work, we study a special class of such problems where distributed state feedback controllers can give near-optimal performance. More specifically, we consider networked linear-quadratic controllers with decoupled costs and spatially exponentially decaying dynamics… ▽ More Distributed optimal control is known to be challenging and can become intractable even for linear-quadratic regulator problems. In this work, we study a special class of such problems where distributed state feedback controllers can give near-optimal performance. More specifically, we consider networked linear-quadratic controllers with decoupled costs and spatially exponentially decaying dynamics. We aim to exploit the structure in the problem to design a scalable reinforcement learning algorithm for learning a distributed controller. Recent work has shown that the optimal controller can be well approximated only using information from a $κ$-neighborhood of each agent. Motivated by these results, we show that similar results hold for the agents' individual value and Q-functions. We continue by designing an algorithm, based on the actor-critic framework, to learn distributed controllers only using local information. Specifically, the Q-function is estimated by modifying the Least Squares Temporal Difference for Q-functions method to only use local information. The algorithm then updates the policy using gradient descent. Finally, we evaluate the algorithm through simulations that indeed suggest near-optimal performance. △ Less

Submitted 13 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 8 pages, 4 figures

arXiv:2401.12167 [pdf, other]

Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder

Authors: Nan Li, Alexandros Iosifidis, Qi Zhang

Abstract: This paper studies the computational offloading of CNN inference in dynamic multi-access edge computing (MEC) networks. To address the uncertainties in communication time and computation resource availability, we propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading. In the semantic encoder,… ▽ More This paper studies the computational offloading of CNN inference in dynamic multi-access edge computing (MEC) networks. To address the uncertainties in communication time and computation resource availability, we propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading. In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features. In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy. To effectively trade-off communication, computation, and inference accuracy, we design a reward function and formulate the offloading problem of CNN inference as a maximization problem with the goal of maximizing the average inference accuracy and throughput over the long term. To address this maximization problem, we propose a graph reinforcement learning-based AECNN (GRL-AECNN) method, which outperforms existing works DROO-AECNN, GRL-BottleNet++ and GRL-DeepJSCC under different dynamic scenarios. This highlights the advantages of GRL-AECNN in offloading decision-making in dynamic MEC. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2211.13745

arXiv:2312.14018 [pdf, ps, other]

Enabling Secure Wireless Communications via Movable Antennas

Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Xiaoming She, Chongjun Ouyang, Peng Chen

Abstract: A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitte… ▽ More A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitter are jointly optimized under two criteria: power consumption minimization and secrecy rate maximization. For each scenario, a novel suboptimal algorithm was proposed to tackle the resulting nonconvex optimization problem, capitalizing on the approaches of alternating optimization and gradient descent. Numerical results demonstrate that the proposed MA systems significantly improve physical layer security compared to various benchmark schemes relying on conventional fixed-position antennas (FPAs). △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by IEEE ICASSP 2024

arXiv:2312.13722 [pdf, other]

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

Authors: Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

Abstract: Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE researches primarily focus on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose… ▽ More Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE researches primarily focus on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose a novel streaming adaptive bandwidth extension solution dubbed BAE-Net, which is suitable to handle the low-resolution speech with unknown and varying effective bandwidth. To address the challenges of recovering both the high-frequency magnitude and phase speech content blindly, we devise a dual-stream architecture that incorporates the magnitude inpainting and phase refinement. For potential applications on edge devices, this paper also introduces BAE-NET-lite, which is a lightweight, streaming and efficient framework. Quantitative results demonstrate the superiority of BAE-Net in terms of both performance and computational efficiency when compared with existing state-of-the-art BWE methods. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2312.05724 [pdf, other]

Minimum-Time Trajectory Optimization With Data-Based Models: A Linear Programming Approach

Authors: Nan Li, Ehsan Taheri, Ilya Kolmanovsky, Dimitar Filev

Abstract: In this paper, we develop a computationally-efficient approach to minimum-time trajectory optimization using input-output data-based models, to produce an end-to-end data-to-control solution to time-optimal planning/control of dynamic systems and hence facilitate their autonomous operation. The approach integrates a non-parametric data-based model for trajectory prediction and a continuous optimiz… ▽ More In this paper, we develop a computationally-efficient approach to minimum-time trajectory optimization using input-output data-based models, to produce an end-to-end data-to-control solution to time-optimal planning/control of dynamic systems and hence facilitate their autonomous operation. The approach integrates a non-parametric data-based model for trajectory prediction and a continuous optimization formulation based on an exponential weighting scheme for minimum-time trajectory planning. The optimization problem in its final form is a linear program and is easy to solve. We validate the approach and illustrate its application with a spacecraft relative motion planning problem. △ Less

Submitted 9 December, 2023; originally announced December 2023.

Comments: 11 pages, 4 figures

arXiv:2312.05332 [pdf, other]

MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control

Authors: Yiwen Lu, Zishuo Li, Yihan Zhou, Na Li, Yilin Mo

Abstract: In this paper, we introduce a new class of parameterized controllers, drawing inspiration from Model Predictive Control (MPC). The controller resembles a Quadratic Programming (QP) solver of a linear MPC problem, with the parameters of the controller being trained via Deep Reinforcement Learning (DRL) rather than derived from system models. This approach addresses the limitations of common control… ▽ More In this paper, we introduce a new class of parameterized controllers, drawing inspiration from Model Predictive Control (MPC). The controller resembles a Quadratic Programming (QP) solver of a linear MPC problem, with the parameters of the controller being trained via Deep Reinforcement Learning (DRL) rather than derived from system models. This approach addresses the limitations of common controllers with Multi-Layer Perceptron (MLP) or other general neural network architecture used in DRL, in terms of verifiability and performance guarantees, and the learned controllers possess verifiable properties like persistent feasibility and asymptotic stability akin to MPC. On the other hand, numerical examples illustrate that the proposed controller empirically matches MPC and MLP controllers in terms of control performance and has superior robustness against modeling uncertainty and noises. Furthermore, the proposed controller is significantly more computationally efficient compared to MPC and requires fewer parameters to learn than MLP controllers. Real-world experiments on vehicle drift maneuvering task demonstrate the potential of these controllers for robotics and other demanding control tasks. △ Less

Submitted 9 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.02935 [pdf, ps, other]

doi 10.1109/LWC.2023.3297231

Channel Estimation and Training Design for Active RIS Aided Wireless Communications

Authors: Hao Chen, Nanxi Li, Ruizhe Long, Ying-Chang Liang

Abstract: Active reconfigurable intelligent surface (ARIS) is a newly emerging RIS technique that leverages radio frequency (RF) reflection amplifiers to empower phase-configurable reflection elements (REs) in amplifying the incident signal. Thereby, ARIS can enhance wireless communications with the strengthened ARIS-aided links. In this letter, we propose exploiting the signal amplification capability of A… ▽ More Active reconfigurable intelligent surface (ARIS) is a newly emerging RIS technique that leverages radio frequency (RF) reflection amplifiers to empower phase-configurable reflection elements (REs) in amplifying the incident signal. Thereby, ARIS can enhance wireless communications with the strengthened ARIS-aided links. In this letter, we propose exploiting the signal amplification capability of ARIS for channel estimation, aiming to improve the estimation precision. Nevertheless, the signal amplification inevitably introduces the thermal noise at the ARIS, which can hinder the acquisition of accurate channel state information (CSI) with conventional channel estimation methods based on passive RIS (PRIS). To address this issue, we further investigate this ARIS-specific channel estimation problem and propose a least-square (LS) based channel estimator, whose performance can be further improved with the design on ARIS reflection patterns at the channel training phase. Based on the proposed LS channel estimator, we optimize the training reflection patterns to minimize the channel estimation error variance. Extensive simulation results show that our proposed design can achieve accurate channel estimation in the presence of the ARIS noises. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

Journal ref: IEEE Wireless Communications Letters, early access, 2023

arXiv:2310.12507 [pdf, other]

Multi-granularity Backprojection Transformer for Remote Sensing Image Super-Resolution

Authors: Jinglei Hao, Wukai Li, Binglu Wang, Shunzhou Wang, Yuting Lu, Ning Li, Yongqiang Zhao

Abstract: Backprojection networks have achieved promising super-resolution performance for nature images but not well be explored in the remote sensing image super-resolution (RSISR) field due to the high computation costs. In this paper, we propose a Multi-granularity Backprojection Transformer termed MBT for RSISR. MBT incorporates the backprojection learning strategy into a Transformer framework. It cons… ▽ More Backprojection networks have achieved promising super-resolution performance for nature images but not well be explored in the remote sensing image super-resolution (RSISR) field due to the high computation costs. In this paper, we propose a Multi-granularity Backprojection Transformer termed MBT for RSISR. MBT incorporates the backprojection learning strategy into a Transformer framework. It consists of Scale-aware Backprojection-based Transformer Layers (SPTLs) for scale-aware low-resolution feature learning and Context-aware Backprojection-based Transformer Blocks (CPTBs) for hierarchical feature learning. A backprojection-based reconstruction module (PRM) is also introduced to enhance the hierarchical features for image reconstruction. MBT stands out by efficiently learning low-resolution features without excessive modules for high-resolution processing, resulting in lower computational resources. Experiment results on UCMerced and AID datasets demonstrate that MBT obtains state-of-the-art results compared to other leading methods. △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.05508 [pdf, other]

A Comparison between Markov Chain and Koopman Operator Based Data-Driven Modeling of Dynamical Systems

Authors: Saeid Tafazzol, Nan Li, Ilya Kolmanovsky, Dimitar Filev

Abstract: Markov chain-based modeling and Koopman operator-based modeling are two popular frameworks for data-driven modeling of dynamical systems. They share notable similarities from a computational and practitioner's perspective, especially for modeling autonomous systems. The first part of this paper aims to elucidate these similarities. For modeling systems with control inputs, the models produced by t… ▽ More Markov chain-based modeling and Koopman operator-based modeling are two popular frameworks for data-driven modeling of dynamical systems. They share notable similarities from a computational and practitioner's perspective, especially for modeling autonomous systems. The first part of this paper aims to elucidate these similarities. For modeling systems with control inputs, the models produced by the two approaches differ. The second part of this paper introduces these models and their corresponding control design methods. We illustrate the two approaches and compare them in terms of model accuracy and computational efficiency for both autonomous and controlled systems in numerical examples. △ Less

Submitted 1 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.12596 [pdf, ps, other]

Movable Antenna-Empowered AirComp

Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Xiaoming She, Chongjun Ouyang, Peng Chen

Abstract: A novel over-the-air computation (AirComp) framework, empowered by the incorporation of movable antennas (MAs), is proposed to significantly enhance computation accuracy. Within this framework, the joint optimization of transmit power control, antenna positioning, and receive combining is investigated. An efficient method is proposed to tackle the problem of computation mean-squared error (MSE) mi… ▽ More A novel over-the-air computation (AirComp) framework, empowered by the incorporation of movable antennas (MAs), is proposed to significantly enhance computation accuracy. Within this framework, the joint optimization of transmit power control, antenna positioning, and receive combining is investigated. An efficient method is proposed to tackle the problem of computation mean-squared error (MSE) minimization, capitalizing on the approach of alternating optimization. Numerical results are provided to substantiate the superior MSE performance of the proposed framework, which establish its clear advantage over benchmark systems employing conventional fixed-position antennas (FPAs). △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11135 [pdf, ps, other]

Sum-Rate Maximization for Movable Antenna Enabled Multiuser Communications

Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Chongjun Ouyang

Abstract: A novel multiuser communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the downlink sum-rate. The joint optimization of the transmit beamforming vector and transmit MA positions is studied for a multiuser multiple-input single-input system. An efficient algorithm is proposed to tackle the formulated non-convex problem via cap… ▽ More A novel multiuser communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the downlink sum-rate. The joint optimization of the transmit beamforming vector and transmit MA positions is studied for a multiuser multiple-input single-input system. An efficient algorithm is proposed to tackle the formulated non-convex problem via capitalizing on fractional programming, alternating optimization, and gradient descent methods. To strike a better performance-complexity trade-off, a zero-forcing beamforming-based design is also proposed as an alternative. Numerical investigations are presented to verify the efficiency of the proposed algorithms and their superior performance compared with the benchmark relying on conventional fixed-position antennas (FPAs). △ Less

Submitted 22 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: 11 pages

arXiv:2309.05929 [pdf]

Introducing Shape Prior Module in Diffusion Model for Medical Image Segmentation

Authors: Zhiqing Zhang, Guojia Fan, Tianyong Liu, Nan Li, Yuyang Liu, Ziyu Liu, Canwei Dong, Shoujun Zhou

Abstract: Medical image segmentation is critical for diagnosing and treating spinal disorders. However, the presence of high noise, ambiguity, and uncertainty makes this task highly challenging. Factors such as unclear anatomical boundaries, inter-class similarities, and irrational annotations contribute to this challenge. Achieving both accurate and diverse segmentation templates is essential to support ra… ▽ More Medical image segmentation is critical for diagnosing and treating spinal disorders. However, the presence of high noise, ambiguity, and uncertainty makes this task highly challenging. Factors such as unclear anatomical boundaries, inter-class similarities, and irrational annotations contribute to this challenge. Achieving both accurate and diverse segmentation templates is essential to support radiologists in clinical practice. In recent years, denoising diffusion probabilistic modeling (DDPM) has emerged as a prominent research topic in computer vision. It has demonstrated effectiveness in various vision tasks, including image deblurring, super-resolution, anomaly detection, and even semantic representation generation at the pixel level. Despite the robustness of existing diffusion models in visual generation tasks, they still struggle with discrete masks and their various effects. To address the need for accurate and diverse spine medical image segmentation templates, we propose an end-to-end framework called VerseDiff-UNet, which leverages the denoising diffusion probabilistic model (DDPM). Our approach integrates the diffusion model into a standard U-shaped architecture. At each step, we combine the noise-added image with the labeled mask to guide the diffusion direction accurately towards the target region. Furthermore, to capture specific anatomical a priori information in medical images, we incorporate a shape a priori module. This module efficiently extracts structural semantic information from the input spine images. We evaluate our method on a single dataset of spine images acquired through X-ray imaging. Our results demonstrate that VerseDiff-UNet significantly outperforms other state-of-the-art methods in terms of accuracy while preserving the natural features and variations of anatomy. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.03471 [pdf, other]

Resource Management for IRS-assisted WP-MEC Networks with Practical Phase Shift Model

Authors: Nana Li, Wanming Hao, Fuhui Zhou, Zheng Chu, Shouyi Yang, Pei Xiao

Abstract: Wireless powered mobile edge computing (WP-MEC) has been recognized as a promising solution to enhance the computational capability and sustainable energy supply for low-power wireless devices (WDs). However, when the communication links between the hybrid access point (HAP) and WDs are hostile, the energy transfer efficiency and task offloading rate are compromised. To tackle this problem, we pro… ▽ More Wireless powered mobile edge computing (WP-MEC) has been recognized as a promising solution to enhance the computational capability and sustainable energy supply for low-power wireless devices (WDs). However, when the communication links between the hybrid access point (HAP) and WDs are hostile, the energy transfer efficiency and task offloading rate are compromised. To tackle this problem, we propose to employ multiple intelligent reflecting surfaces (IRSs) to WP-MEC networks. Based on the practical IRS phase shift model, we formulate a total computation rate maximization problem by jointly optimizing downlink/uplink IRSs passive beamforming, downlink energy beamforming and uplink multi-user detection (MUD) vector at HAPs, task offloading power and local computing frequency of WDs, and the time slot allocation. Specifically, we first derive the optimal time allocation for downlink wireless energy transmission (WET) to IRSs and the corresponding energy beamforming. Next, with fixed time allocation for the downlink WET to WDs, the original optimization problem can be divided into two independent subproblems. For the WD charging subproblem, the optimal IRSs passive beamforming is derived by utilizing the successive convex approximation (SCA) method and the penalty-based optimization technique, and for the offloading computing subproblem, we propose a joint optimization framework based on the fractional programming (FP) method. Finally, simulation results validate that our proposed optimization method based on the practical phase shift model can achieve a higher total computation rate compared to the baseline schemes. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 15 pages, 14 figures

arXiv:2308.03268 [pdf, other]

Towards Carbon-Free Electricity: A Flow-Based Framework for Power Grid Carbon Accounting and Decarbonization

Authors: Xin Chen, Hungpo Chao, Wenbo Shi, Na Li

Abstract: This paper introduces a comprehensive framework aimed at advancing research and policy development in the realm of decarbonization within electric power systems. The framework focuses on three key aspects: carbon accounting, carbon-aware decision-making, and carbon-electricity market design. It addresses existing problems, methods, and proposes solutions. In contrast to traditional pool-based emis… ▽ More This paper introduces a comprehensive framework aimed at advancing research and policy development in the realm of decarbonization within electric power systems. The framework focuses on three key aspects: carbon accounting, carbon-aware decision-making, and carbon-electricity market design. It addresses existing problems, methods, and proposes solutions. In contrast to traditional pool-based emissions models, our framework proposes a novel flow-based emissions model. This model incorporates the underlying physical power grid and power flows, allowing for accurate carbon accounting at both temporal and spatial scales. This, in turn, facilitates informed decision-making to achieve grid decarbonization goals. The framework is built on a flow-based accounting methodology and utilizes the carbon-aware optimal power flow (C-OPF) technique as a theoretical foundation for decarbonization decision-making. Additionally, the paper explores the potential design of carbon-electricity markets and pricing mechanisms to incentivize decentralized decarbonization actions. The critical issues of data availability, infrastructure development, and considerations of fairness and equity are also discussed. This paper seeks to advance scholarly understanding and foster progress toward achieving sustainable and carbon-free electric power systems. △ Less

Submitted 28 November, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2308.03240 [pdf, other]

Carbon-Aware Optimal Power Flow

Authors: Xin Chen, Andy Sun, Wenbo Shi, Na Li

Abstract: To facilitate effective decarbonization of the electric power sector, this paper introduces the generic Carbon-aware Optimal Power Flow (C-OPF) method for power system decision-making that considers demand-side carbon accounting and emission management. Built upon the classic optimal power flow (OPF) model, the C-OPF method incorporates carbon emission flow equations and constraints, as well as ca… ▽ More To facilitate effective decarbonization of the electric power sector, this paper introduces the generic Carbon-aware Optimal Power Flow (C-OPF) method for power system decision-making that considers demand-side carbon accounting and emission management. Built upon the classic optimal power flow (OPF) model, the C-OPF method incorporates carbon emission flow equations and constraints, as well as carbon-related objectives, to jointly optimize power flow and carbon flow. In particular, this paper establishes the feasibility and solution uniqueness of the carbon emission flow equations, and proposes modeling and linearization techniques to address the issues of undetermined power flow directions and bilinear terms in the C-OPF model. Additionally, two novel carbon emission models, together with the carbon accounting schemes, for energy storage systems are developed and integrated into the C-OPF model. Numerical simulations demonstrate the characteristics and effectiveness of the C-OPF method, in comparison with OPF solutions. △ Less

Submitted 17 July, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2307.00179 [pdf, other]

Unsupervised Coordinate-Based Video Denoising

Authors: Mary Damilola Aiyetigbo, Dineshchandar Ravichandran, Reda Chalhoub, Peter Kalivas, Nianyi Li

Abstract: In this paper, we introduce a novel unsupervised video denoising deep learning approach that can help to mitigate data scarcity issues and shows robustness against different noise patterns, enhancing its broad applicability. Our method comprises three modules: a Feature generator creating features maps, a Denoise-Net generating denoised but slightly blurry reference frames, and a Refine-Net re-int… ▽ More In this paper, we introduce a novel unsupervised video denoising deep learning approach that can help to mitigate data scarcity issues and shows robustness against different noise patterns, enhancing its broad applicability. Our method comprises three modules: a Feature generator creating features maps, a Denoise-Net generating denoised but slightly blurry reference frames, and a Refine-Net re-introducing high-frequency details. By leveraging the coordinate-based network, we can greatly simplify the network structure while preserving high-frequency details in the denoised video frames. Extensive experiments on both simulated and real-captured demonstrate that our method can effectively denoise real-world calcium imaging video sequences without prior knowledge of noise models and data augmentation during training. △ Less

Submitted 30 June, 2023; originally announced July 2023.

arXiv:2306.10369 [pdf, other]

Non-asymptotic System Identification for Linear Systems with Nonlinear Policies

Authors: Yingying Li, Tianpeng Zhang, Subhro Das, Jeff Shamma, Na Li

Abstract: This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input const… ▽ More This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies during the learning process are usually nonlinear and time-varying for satisfying the state and input constraints. In this paper, we provide a non-asymptotic error bound for least square estimation when the data trajectory is generated by any nonlinear and/or time-varying policies as long as the generated state and action trajectories are bounded. This significantly generalizes the existing non-asymptotic guarantees for linear system identification, which usually consider i.i.d. random inputs or linear policies. Interestingly, our error bound is consistent with that for linear policies with respect to the dependence on the trajectory length, system dimensions, and excitation levels. Lastly, we demonstrate the applications of our results by safe learning with robust model predictive control and provide numerical analysis. △ Less

Submitted 17 June, 2023; originally announced June 2023.

arXiv:2305.17860 [pdf, other]

speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition

Authors: Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang

Abstract: In recent years, the joint training of speech enhancement front-end and automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods only use enhanced speech as input for the backend. However, it is difficult for speech enhancement systems to directly separate speech from input due to the diverse types of noise with d… ▽ More In recent years, the joint training of speech enhancement front-end and automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods only use enhanced speech as input for the backend. However, it is difficult for speech enhancement systems to directly separate speech from input due to the diverse types of noise with different intensities. Furthermore, speech distortion and residual noise are often observed in enhanced speech, and the distortion of speech and noise is different. Most existing methods focus on fusing enhanced and noisy features to address this issue. In this paper, we propose a dual-stream spectrogram refine network to simultaneously refine the speech and noise and decouple the noise from the noisy input. Our proposed method can achieve better performance with a relative 8.6% CER reduction. △ Less

Submitted 30 May, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.10821 [pdf, other]

Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

Authors: Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

Abstract: Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available. However, most of them neglect to utilize speaker's 2-dimensional (2D) location cues contained in mixture signal, which limits the performance when two sources come from close directions. In this paper, we propose an end-to-end beamforming network for… ▽ More Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available. However, most of them neglect to utilize speaker's 2-dimensional (2D) location cues contained in mixture signal, which limits the performance when two sources come from close directions. In this paper, we propose an end-to-end beamforming network for 2D location guided speech separation merely given mixture signal. It first estimates discriminable direction and 2D location cues, which imply directions the sources come from in multi views of microphones and their 2D coordinates. These cues are then integrated into location-aware neural beamformer, thus allowing accurate reconstruction of two sources' speech signals. Experiments show that our proposed model not only achieves a comprehensive decent improvement compared to baseline systems, but avoids inferior performance on spatial overlapping cases. △ Less

Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Accepted by Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2212.03401

arXiv:2305.09991 [pdf, ps, other]

Angle-based formation stabilization and maneuvers in port-Hamiltonian form with bearing and velocity measurements

Authors: Ningbo Li, Pablo Borja, Arjan van der Schaft, Jacquelien M. A. Scherpen

Abstract: This paper proposes a port-Hamiltonian framework for angle-based formation stabilization and maneuvers using bearing and velocity measurements with an underlying triangulated Laman graph. The corresponding port-Hamiltonian controller is designed using virtual couplings on the errors of angle constraints in angle space and then the angle constraints and agent actuators are mapped by the constraint… ▽ More This paper proposes a port-Hamiltonian framework for angle-based formation stabilization and maneuvers using bearing and velocity measurements with an underlying triangulated Laman graph. The corresponding port-Hamiltonian controller is designed using virtual couplings on the errors of angle constraints in angle space and then the angle constraints and agent actuators are mapped by the constraint Jacobian, which can be applied to other formation constraints. In addition, due to the fact that the port-Hamiltonian model allows for complex and heterogeneous agent dynamics, our framework can be extended to networks with different agent dynamics and formation constraints. To avoid unavailable distance terms in the control law, an estimator is designed based on port-Hamiltonian theory and the property that energy is coordinate-free for different sensor modalities using bearing and velocity measurements, which permits our framework to inject damping for the formation maneuvers. Furthermore, several maneuvers are analyzed under both considerations of stabilization and transient performance. Simulations are performed to illustrate the effectiveness of the approach. △ Less

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.09964 [pdf, ps, other]

A port-Hamiltonian framework for displacement-based and rigid formation tracking

Authors: Ningbo Li, Zhiyong Sun, Arjan van der Schaft, Jacquelien M. A. Scherpen

Abstract: This paper proposes a passivity-based port-Hamiltonian (pH) framework for multi-agent displacement-based and rigid formation control and velocity tracking. The control law consists of two parts, where the internal feedback is to track the velocity and the external feedback is to achieve formation stabilization by steering variables of neighboring agents that prescribe the desired geometric shape.… ▽ More This paper proposes a passivity-based port-Hamiltonian (pH) framework for multi-agent displacement-based and rigid formation control and velocity tracking. The control law consists of two parts, where the internal feedback is to track the velocity and the external feedback is to achieve formation stabilization by steering variables of neighboring agents that prescribe the desired geometric shape. Regarding the external feedback, a general framework is proposed for all kinds of formations by means of the advantage that the pH model is energy-based and coordinate-free. To solve the issue that the incidence matrix is not of full column rank over cyclic graphs, the matrix property is used to prove the convergence to the target sets for the displacement-based formation, while for rigid formations, the algebraic conditions of infinitesimal rigidity are investigated to achieve asymptotic local stability. Furthermore, the rigid formation with heterogeneous constraints is further investigated under this framework and the asymptotic local stability is proved under a mild assumption. Simulations are performed to illustrate the effectiveness of the framework. △ Less

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2304.07984 [pdf, other]

A Unified Safety Protection and Extension Governor

Authors: Nan Li, Yutong Li, Ilya Kolmanovsky

Abstract: In this paper, we propose a supervisory control scheme that unifies the abilities of safety protection and safety extension. It produces a control that is able to keep the system safe indefinitely when such a control exists. When such a control does not exist due to abnormal system states, it optimizes the control to maximize the time before any safety violation, which translates into more time to… ▽ More In this paper, we propose a supervisory control scheme that unifies the abilities of safety protection and safety extension. It produces a control that is able to keep the system safe indefinitely when such a control exists. When such a control does not exist due to abnormal system states, it optimizes the control to maximize the time before any safety violation, which translates into more time to seek recovery and/or mitigate any harm. We describe the scheme and develop an approach that integrates the two capabilities into a single constrained optimization problem with only continuous variables. For linear systems with convex constraints, the problem reduces to a convex quadratic program and is easy to solve. We illustrate the proposed safety supervisor with an automotive example. △ Less

Submitted 17 April, 2023; originally announced April 2023.

Comments: 8 pages, 4 figures

arXiv:2304.03677 [pdf, other]

Scheduling Dosage of Proton Pump Inhibitors Using Constrained Optimization With Gastric Acid Secretion Model

Authors: Yutong Li, Nan Li, Anouck Girard, Ilya Kolmanovsky

Abstract: Dosage schedule of the Proton Pump Inhibitors (PPIs) is critical for gastric acid disorder treatment. In this paper, we develop a constrained optimization based approach for scheduling the PPIs dosage. In particular, we exploit a mathematical prediction model describing the gastric acid secretion, and use it within the optimization algorithm to predict the acid level. The dosage of the PPIs which… ▽ More Dosage schedule of the Proton Pump Inhibitors (PPIs) is critical for gastric acid disorder treatment. In this paper, we develop a constrained optimization based approach for scheduling the PPIs dosage. In particular, we exploit a mathematical prediction model describing the gastric acid secretion, and use it within the optimization algorithm to predict the acid level. The dosage of the PPIs which is used to enforce acid level constraints is computed by solving a constrained optimization problem. Simulation results show that the proposed approach can successfully suppress the gastric acid level with less PPIs intake compared with the conventional fixed PPIs dosage regimen, which may reduce the long-term side effects of the PPIs. △ Less

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.06858 [pdf, other]

Continuous-Time Zeroth-Order Dynamics with Projection Maps: Model-Free Feedback Optimization with Safety Guarantees

Authors: Xin Chen, Jorge I. Poveda, Na Li

Abstract: This paper introduces a class of model-free feedback methods for solving generic constrained optimization problems where the specific mathematical forms of the objective and constraint functions are not available. The proposed methods, termed Projected Zeroth-Order (P-ZO) dynamics, incorporate projection maps into a class of continuous-time model-free dynamics that make use of periodic dithering f… ▽ More This paper introduces a class of model-free feedback methods for solving generic constrained optimization problems where the specific mathematical forms of the objective and constraint functions are not available. The proposed methods, termed Projected Zeroth-Order (P-ZO) dynamics, incorporate projection maps into a class of continuous-time model-free dynamics that make use of periodic dithering for the purpose of gradient learning. In particular, the proposed P-ZO algorithms can be interpreted as new extremum-seeking algorithms that autonomously drive an unknown system toward a neighborhood of the set of solutions of an optimization problem using only output feedback, while systematically guaranteeing that the input trajectories remain in a feasible set for all times. In this way, the P-ZO algorithms can properly handle hard and asymptotical constraints in model-free optimization problems without using penalty terms or barrier functions. Moreover, the proposed dynamics have suitable robustness properties with respect to small bounded additive disturbances on the states and dynamics, a property that is fundamental for practical real-world implementations. Additional tracking results for time-varying and switching cost functions are also derived under stronger convexity and smoothness assumptions and using tools from hybrid dynamical systems. Numerical examples are presented throughout the paper to illustrate the above results. △ Less

Submitted 13 March, 2023; originally announced March 2023.

Comments: 17 pages

arXiv:2301.08445 [pdf, other]

Online switching control with stability and regret guarantees

Authors: Yingying Li, James A. Preiss, Na Li, Yiheng Lin, Adam Wierman, Jeff Shamma

Abstract: This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We only require at least one candidate controller to satisfy certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stabi… ▽ More This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We only require at least one candidate controller to satisfy certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stability throughout the duration of its execution. We also provide a sublinear policy regret guarantee compared with the optimal stabilizing candidate controller. Lastly, we numerically test our algorithm on quadrotor planar flights and compare it with a classical switching control algorithm, falsification-based switching, and a classical multi-armed bandit algorithm, Exp3 with batches. △ Less

Submitted 23 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

arXiv:2301.05599 [pdf, other]

Short-length SSVEP data extension by a novel generative adversarial networks based framework

Authors: Yudong Pan, Ning Li, Yangsong Zhang, Peng Xu, Dezhong Yao

Abstract: Steady-state visual evoked potentials (SSVEPs) based brain-computer interface (BCI) has received considerable attention due to its high information transfer rate (ITR) and available quantity of targets. However, the performance of frequency identification methods heavily hinges on the amount of user calibration data and data length, which hinders the deployment in real-world applications. Recently… ▽ More Steady-state visual evoked potentials (SSVEPs) based brain-computer interface (BCI) has received considerable attention due to its high information transfer rate (ITR) and available quantity of targets. However, the performance of frequency identification methods heavily hinges on the amount of user calibration data and data length, which hinders the deployment in real-world applications. Recently, generative adversarial networks (GANs)-based data generation methods have been widely adopted to create synthetic electroencephalography (EEG) data, holds promise to address these issues. In this paper, we proposed a GAN-based end-to-end signal transformation network for Time-window length Extension, termed as TEGAN. TEGAN transforms short-length SSVEP signals into long-length artificial SSVEP signals. By incorporating a novel U-Net generator architecture and an auxiliary classifier into the network architecture, the TEGAN could produce conditioned features in the synthetic data. Additionally, we introduced a two-stage training strategy and the LeCam-divergence regularization term to regularize the training process of GAN during the network implementation. The proposed TEGAN was evaluated on two public SSVEP datasets (a 4-class dataset and a 12-class dataset). With the assistance of TEGAN, the performance of traditional frequency recognition methods and deep learning-based methods have been significantly improved under limited calibration data. And the classification performance gap of various frequency recognition methods has been narrowed. This study substantiates the feasibility of the proposed method to extend the data length for short-time SSVEP signals for developing a high-performance BCI system. The proposed GAN-based methods have the great potential of shortening the calibration time and cutting down the budget for various real-world BCI-based applications. △ Less

Submitted 2 October, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: 16 pages, 9 figures, 4 tables

arXiv:2211.16291 [pdf, other]

On Controller Reduction in Linear Quadratic Gaussian Control with Performance Bounds

Authors: Zhaolin Ren, Yang Zheng, Maryam Fazel, Na Li

Abstract: The problem of controller reduction has a rich history in control theory. Yet, many questions remain open. In particular, there exist very few results on the order reduction of general non-observer based controllers and the subsequent quantification of the closed-loop performance. Recent developments in model-free policy optimization for Linear Quadratic Gaussian (LQG) control have highlighted the… ▽ More The problem of controller reduction has a rich history in control theory. Yet, many questions remain open. In particular, there exist very few results on the order reduction of general non-observer based controllers and the subsequent quantification of the closed-loop performance. Recent developments in model-free policy optimization for Linear Quadratic Gaussian (LQG) control have highlighted the importance of this question. In this paper, we first propose a new set of sufficient conditions ensuring that a perturbed controller remains internally stabilizing. Based on this result, we illustrate how to perform order reduction of general non-observer based controllers using balanced truncation and modal truncation. We also provide explicit bounds on the LQG performance of the reduced-order controller. Furthermore, for single-input-single-output (SISO) systems, we introduce a new controller reduction technique by truncating unstable modes. We illustrate our theoretical results with numerical simulations. Our results will serve as valuable tools to design direct policy search algorithms for control problems with partial observations. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.12628 [pdf, other]

Safe Control and Learning Using Generalized Action Governor

Authors: Nan Li, Yutong Li, Ilya Kolmanovsky, Anouck Girard, H. Eric Tseng, Dimitar Filev

Abstract: This paper introduces the Generalized Action Governor, which is a supervisory scheme for augmenting a nominal closed-loop system with the capability of strictly handling constraints. After presenting its theory for general systems and introducing tailored design approaches for linear and discrete systems, we discuss its application to safe online learning, which aims to safely evolve control param… ▽ More This paper introduces the Generalized Action Governor, which is a supervisory scheme for augmenting a nominal closed-loop system with the capability of strictly handling constraints. After presenting its theory for general systems and introducing tailored design approaches for linear and discrete systems, we discuss its application to safe online learning, which aims to safely evolve control parameters using real-time data to improve performance for uncertain systems. In particular, we propose two safe learning algorithms based on integration of reinforcement learning/data-driven Koopman operator-based control with the generalized action governor. The developments are illustrated with a numerical example. △ Less

Submitted 22 November, 2022; originally announced November 2022.

Comments: 10 pages, 4 figures

arXiv:2211.04767 [pdf, other]

Multimodal Remote Sensing Image Registration Based on Adaptive Multi-scale PIIFD

Authors: Ning Li, Yuxuan Li, Jichao jiao

Abstract: In recent years, due to the wide application of multi-sensor vision systems, multimodal image acquisition technology has continued to develop, and the registration problem based on multimodal images has gradually emerged. Most of the existing multimodal image registration methods are only suitable for two modalities, and cannot uniformly register multiple modal image data. Therefore, this paper pr… ▽ More In recent years, due to the wide application of multi-sensor vision systems, multimodal image acquisition technology has continued to develop, and the registration problem based on multimodal images has gradually emerged. Most of the existing multimodal image registration methods are only suitable for two modalities, and cannot uniformly register multiple modal image data. Therefore, this paper proposes a multimodal remote sensing image registration method based on adaptive multi-scale PIIFD(AM-PIIFD). This method extracts KAZE features, which can effectively retain edge feature information while filtering noise. Then adaptive multi-scale PIIFD is calculated for matching. Finally, the mismatch is removed through the consistency of the feature main direction, and the image alignment transformation is realized. The qualitative and quantitative comparisons with other three advanced methods shows that our method can achieve excellent performance in multimodal remote sensing image registration. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.05254 [pdf, other]

doi 10.1145/3552466.3556527

Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

Authors: Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang

Abstract: The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech. With our submitted system, this paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection). In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding featur… ▽ More The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech. With our submitted system, this paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection). In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features. To address track 1, low-quality data augmentation, domain adaptation via finetuning, and various complementary feature information fusion were aggregated in our system. Furthermore, we analyzed the clustering characteristics of subsystems with different features by visualization method and explained the effectiveness of our proposed greedy fusion strategy. As for track 2, frame transition and smoothing were detected using self-supervised learning structure to capture the manipulation of PF attacks in the time domain. We ranked 4th and 5th in track 1 and track 2, respectively. △ Less

Submitted 11 October, 2022; originally announced October 2022.

Comments: 7 pages, 1 figures, Accecpted by Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia

arXiv:2210.05092 [pdf, other]

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022

Authors: Xiaoyi Qin, Na Li, Yuke Lin, Yiwei Ding, Chao Weng, Dan Su, Ming Li

Abstract: This paper is the system description of the DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC22). In this challenge, we focus on track1 and track3. For track1, multiple backbone networks are adopted to extract frame-level features. Since track1 focus on the cross-age scenarios, we adopt the cross-age trials and perform QMF to calibrate score. The magnitude-based qualit… ▽ More This paper is the system description of the DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC22). In this challenge, we focus on track1 and track3. For track1, multiple backbone networks are adopted to extract frame-level features. Since track1 focus on the cross-age scenarios, we adopt the cross-age trials and perform QMF to calibrate score. The magnitude-based quality measures achieve a large improvement. For track3, the semi-supervised domain adaptation task, the pseudo label method is adopted to make domain adaptation. Considering the noise labels in clustering, the ArcFace is replaced by Sub-center ArcFace. The final submission achieves 0.107 mDCF in task1 and 7.135% EER in task3. △ Less

Submitted 10 October, 2022; originally announced October 2022.

Showing 1–50 of 135 results for author: Li, N