Search | arXiv e-print repository

Uncertainty-Aware Mean Opinion Score Prediction

Authors: Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin

Abstract: Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these systems. In this paper, we point out that the absence of uncertainty modeling is a significant limitation hindering MOS prediction systems from being applied in the real and open world. We analyze the sources of uncertainty in the MOS prediction task and propose to establish an uncertainty-aware MOS prediction system that models aleatory uncertainty and epistemic uncertainty by heteroscedastic regression and Monte Carlo dropout, respectively. The experimental results show that the system captures uncertainty well and is capable of performing selective prediction and out-of-domain detection. Such capabilities significantly enhance the practical utility of MOS systems in diverse real and open-world environments.

Submitted 23 August, 2024; originally announced August 2024.

Comments: Accepted by Interspeech 2024, oral
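
The abstract above pairs heteroscedastic regression (for aleatory uncertainty) with Monte Carlo dropout (for epistemic uncertainty). The following is a minimal sketch of how such a pairing is commonly computed, not the authors' implementation: it assumes a model that outputs a mean and a log-variance per sample, and that `T` stochastic forward passes with dropout enabled have already been collected. The function names `gaussian_nll` and `decompose_uncertainty` are hypothetical.

```python
import numpy as np


def gaussian_nll(mu, log_var, target):
    """Per-sample Gaussian negative log-likelihood (constant term dropped).

    This is the usual training loss for heteroscedastic regression: the
    model predicts both a mean and a log-variance for each input.
    """
    mu, log_var, target = (np.asarray(x, dtype=float) for x in (mu, log_var, target))
    return 0.5 * (log_var + (target - mu) ** 2 / np.exp(log_var))


def decompose_uncertainty(mc_means, mc_log_vars):
    """Summarise T stochastic (dropout-enabled) forward passes.

    mc_means, mc_log_vars: arrays of shape (T, N), one row per pass.
    Returns (prediction, aleatoric, epistemic), each of shape (N,).
    """
    mc_means = np.asarray(mc_means, dtype=float)
    mc_vars = np.exp(np.asarray(mc_log_vars, dtype=float))
    prediction = mc_means.mean(axis=0)  # averaged predicted MOS
    aleatoric = mc_vars.mean(axis=0)    # mean of the predicted variances
    epistemic = mc_means.var(axis=0)    # spread of the means across passes
    return prediction, aleatoric, epistemic
```

Under these assumptions, a selective-prediction rule could abstain whenever `aleatoric + epistemic` exceeds a chosen threshold; the threshold and the abstention rule are illustrative and are not taken from the paper.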

arXiv:2408.10235 [pdf, other]

Multi-Source EEG Emotion Recognition via Dynamic Contrastive Domain Adaptation

Authors: Yun Xiao, Yimeng Zhang, Xiaopeng Peng, Shuzheng Han, Xia Zheng, Dingyi Fang, Xiaojiang Chen

Abstract: Electroencephalography (EEG) provides reliable indications of human cognition and mental states. Accurate emotion recognition from EEG remains challenging due to signal variations among individuals and across measurement sessions. To address these challenges, we introduce a multi-source dynamic contrastive domain adaptation method (MS-DCDA), which models coarse-grained inter-domain and fine-grained intra-class adaptations through a multi-branch contrastive neural network and contrastive sub-domain discrepancy learning. Our model leverages domain knowledge from each individual source and a complementary source ensemble and uses dynamically weighted learning to achieve an optimal tradeoff between domain transferability and discriminability. The proposed MS-DCDA model was evaluated using the SEED and SEED-IV datasets, achieving respectively the highest mean accuracies of $90.84\%$ and $78.49\%$ in cross-subject experiments as well as $95.82\%$ and $82.25\%$ in cross-session experiments. Our model outperforms several alternative domain adaptation methods in recognition accuracy, inter-class margin, and intra-class compactness. Our study also suggests greater emotional sensitivity in the frontal and parietal brain lobes, providing insights for mental health interventions, personalized medicine, and development of preventive strategies.

Submitted 3 August, 2024; originally announced August 2024.

arXiv:2408.03124 [pdf, other]

Closed-loop Diffusion Control of Complex Physical Systems

Authors: Long Wei, Haodong Feng, Peiyan Hu, Tao Zhang, Yuchen Yang, Xiang Zheng, Ruiqi Feng, Dixia Fan, Tailin Wu

Abstract: The control problems of complex physical systems have wide applications in science and engineering. Several previous works have demonstrated that generative control methods based on diffusion models have significant advantages for solving these problems. However, existing generative control methods face challenges in handling closed-loop control, which is an inherent constraint for effective control of complex physical systems. In this paper, we propose a Closed-Loop Diffusion method for Physical systems Control (CL-DiffPhyCon). By adopting an asynchronous denoising schedule for different time steps, CL-DiffPhyCon generates control signals conditioned on real-time feedback from the environment. Thus, CL-DiffPhyCon is able to speed up diffusion control methods in a closed-loop framework. We evaluate CL-DiffPhyCon on the 1D Burgers' equation control and 2D incompressible fluid control tasks. The results demonstrate that CL-DiffPhyCon achieves notable control performance with significant sampling acceleration.

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2408.02943 [pdf, other]

Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey

Authors: Wei Huo, Huiwen Yang, Nachuan Yang, Zhaohua Yang, Jiuzhou Zhang, Fuhai Nan, Xingzhou Chen, Yifan Mao, Suyang Hu, Pengyu Wang, Xuanyu Zheng, Mingming Zhao, Ling Shi

Abstract: The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex demands of these emerging systems. As the volume of data continues to escalate, the integration of data-driven methods has become indispensable for enabling adaptive and intelligent control mechanisms in future wireless communication systems. This comprehensive survey explores recent advancements in data-driven methodologies applied to wireless communication networks. It focuses on developments over the past five years and their application to various control objectives within wireless cyber-physical systems. It encompasses critical areas such as link adaptation, user scheduling, spectrum allocation, beam management, power control, and the co-design of communication and control systems. We provide an in-depth exploration of the technical underpinnings that support these data-driven approaches, including the algorithms, models, and frameworks developed to enhance network performance and efficiency. We also examine the challenges that current data-driven algorithms face, particularly in the context of the dynamic and heterogeneous nature of next-generation wireless networks. The paper provides a critical analysis of these challenges and offers insights into potential solutions and future research directions. This includes discussing the adaptability, integration with 6G, and security of data-driven methods in the face of increasing network complexity and data volume.

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2407.02007 [pdf, other]

SOT Triggered Neural Clustering for Speaker Attributed ASR

Authors: Xianrui Zheng, Guangzhi Sun, Chao Zhang, Philip C. Woodland

Abstract: This paper introduces a novel approach to speaker-attributed ASR transcription using a neural clustering method. With a parallel processing mechanism, diarisation and ASR can be applied simultaneously, helping to prevent the accumulation of errors from one sub-system to the next in a cascaded system. This is achieved by the use of ASR, trained using a serialised output training method, together with segment-level discriminative neural clustering (SDNC) to assign speaker labels. With SDNC, our system does not require an extra non-neural clustering method to assign speaker labels, thus allowing the entire system to be based on neural networks. Experimental results on the AMI meeting dataset demonstrate that SDNC outperforms spectral clustering (SC) by a 19% relative diarisation error rate (DER) reduction on the AMI Eval set. When compared with the cascaded system with SC, the parallel system with SDNC gives a 7%/4% relative improvement in cpWER on the Dev/Eval set.

Submitted 2 July, 2024; originally announced July 2024.

Comments: To appear in Interspeech 2024

arXiv:2406.03011 [pdf, other]

Huygens-Fresnel Model Based Position-Aided Phase Configuration for 1-Bit RIS Assisted Wireless Communication

Authors: Xiao Zheng, Wenchi Cheng, Jiangzhou Wang

Abstract: Reconfigurable intelligent surface (RIS), composed of nearly passive elements, is regarded as one of the potential paradigms to support multi-gigabit data in real-time. However, in traditional CSI (channel state information) driven frame, the training overhead of channel estimation greatly increases as the number of RIS elements increases to intelligently manipulate the reflected signals. To conveniently use the reflected signal without complex CSI feedback, in this paper we propose a position-aided phase configuration scheme based on the property of Fresnel zone. In particular, we design the impedance based discrete RIS elements with joint absorption mode and reflection mode considering the fabrication complexities, which integrated the property of the Fresnel zone to resist the impact of position error. Then, with joint absorption and 1-bit reflection mode elements, we develop the two-step position-aided ON/OFF states judgement (TPOSJ) scheme and the frame structure to control the ON/OFF state of RIS, followed by analyzing the impacts of mobility and position error on our proposed scheme. Also, we derive the Helmholtz-Kirchhoff integral theorem based power flow. Simulations show that the proposed scheme can manipulate the ON/OFF state intelligently without complex CSI, thus verifying the practical application of our proposed scheme.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 15 pages, accepted by IEEE TCOM (early access)

ACM Class: H.1.1

arXiv:2405.06125 [pdf]

Cooperative Route Guidance and Flow Control for Mixed Road Networks Comprising Expressway and Arterial Network

Authors: Yunran Di, Haotian Shi, Weihua Zhang, Heng Ding, Xiaoyan Zheng, Bin Ran

Abstract: Facing the congestion challenges of mixed road networks comprising expressways and arterial road networks, traditional control solutions fall short. To effectively alleviate traffic congestion in mixed road networks, it is crucial to clarify the interaction between expressways and arterial networks and achieve orderly coordination between them. This study employs the multi-class cell transmission model (CTM) combined with the macroscopic fundamental diagram (MFD) to model the traffic dynamics of expressway systems and arterial subregions, enabling vehicle path tracking across these two systems. Consequently, a comprehensive traffic transmission model suitable for mixed road networks has been integrated. Utilizing the SUMO software, a simulation platform for the mixed road network is established, and the average trip lengths within the model have been calibrated. Based on the proposed traffic model, this study constructs a route guidance model for mixed road networks and develops an integrated model predictive control (MPC) strategy that merges route guidance, perimeter control, and ramp metering to address the challenges of mixed road networks' traffic flow control. A case study of a scenario in which a bidirectional expressway connects two subregions is conducted, and the results validate the effectiveness of the proposed cooperative guidance and control (CGC) method in reducing overall congestion in mixed road networks.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.19242 [pdf, other]

A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial and decentering distortions of the lens to improve the accuracy of stereo vision systems and simplify their calibration process. In addition, we present an easy and flexible calibration method for the MDM of stereo vision systems with a commonly used planar pattern, which requires cameras to observe the planar pattern in different orientations. The proposed technique is easy to use and flexible compared with classical calibration techniques for depth-dependent distortion models in which the lens must be perpendicular to the planar pattern. The experimental validation of the MDM and its calibration method showed that the MDM improved the calibration accuracy by 56.55% and 74.15% compared with Li's distortion model and the traditional Brown's distortion model, respectively. Besides, an iteration-based reconstruction method is proposed to iteratively estimate the depth information in the MDM during three-dimensional reconstruction. The results showed that the accuracy of the iteration-based reconstruction method was improved by 9.08% compared with that of the non-iteration reconstruction method.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

arXiv:2403.13346 [pdf, other]

A Control-Recoverable Added-Noise-based Privacy Scheme for LQ Control in Networked Control Systems

Authors: Xuening Tang, Xianghui Cao, Wei Xing Zheng

Abstract: As networked control systems continue to evolve, ensuring the privacy of sensitive data becomes an increasingly pressing concern, especially in situations where the controller is physically separated from the plant. In this paper, we propose a secure control scheme for computing linear quadratic control in a networked control system utilizing two networked controllers, a privacy encoder and a control restorer. Specifically, the encoder generates two state signals blurred with random noise and sends them to the controllers, while the restorer reconstructs the correct control signal. The proposed design effectively preserves the privacy of the control system's state without sacrificing the control performance. We theoretically quantify the privacy-preserving performance in terms of the state estimation error of the controllers and the disclosure probability. Additionally, the proposed privacy-preserving scheme is also proven to satisfy differential privacy. Moreover, we extend the proposed privacy-preserving scheme and evaluation method to cases where collusion between two controllers occurs. Finally, we verify the validity of our proposed scheme through simulations.

Submitted 22 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.01808 [pdf, other]

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a Word Accuracy Rate (WAcc) of 0.78 for the real-time track, as well as an overall P.804 MOS of 3.43 and a WAcc of 0.78 for the non-real-time track, ranking 1st in both tracks.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024; Rank 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

arXiv:2401.11349 [pdf, other]

Asynchronous Parallel Reinforcement Learning for Optimizing Propulsive Performance in Fin Ray Control

Authors: Xin-Yang Liu, Dariush Bodaghi, Qian Xue, Xudong Zheng, Jian-Xun Wang

Abstract: Fish fin rays constitute a sophisticated control system for ray-finned fish, facilitating versatile locomotion within complex fluid environments. Despite extensive research on the kinematics and hydrodynamics of fish locomotion, the intricate control strategies in fin-ray actuation remain largely unexplored. While deep reinforcement learning (DRL) has demonstrated potential in managing complex nonlinear dynamics, its trial-and-error nature limits its application to problems involving computationally demanding environmental interactions. This study introduces a cutting-edge off-policy DRL algorithm, interacting with a fluid-structure interaction (FSI) environment to acquire intricate fin-ray control strategies tailored for various propulsive performance objectives. To enhance training efficiency and enable scalable parallelism, an innovative asynchronous parallel training (APT) strategy is proposed, which fully decouples FSI environment interactions and policy/value network optimization. The results demonstrated the success of the proposed method in discovering optimal complex policies for fin-ray actuation control, resulting in a superior propulsive performance compared to the optimal sinusoidal actuation function identified through a parametric grid search. The merit and effectiveness of the APT approach are also showcased through comprehensive comparison with conventional DRL training strategies in numerical experiments of controlling nonlinear dynamics.

Submitted 20 January, 2024; originally announced January 2024.

Comments: 37 pages, 12 figures

arXiv:2312.13722 [pdf, other]

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

Authors: Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

Abstract: Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE research focuses primarily on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose a novel streaming adaptive bandwidth extension solution dubbed BAE-Net, which is suitable for handling low-resolution speech with unknown and varying effective bandwidth. To address the challenges of recovering both the high-frequency magnitude and phase speech content blindly, we devise a dual-stream architecture that incorporates magnitude inpainting and phase refinement. For potential applications on edge devices, this paper also introduces BAE-NET-lite, which is a lightweight, streaming and efficient framework. Quantitative results demonstrate the superiority of BAE-Net in terms of both performance and computational efficiency when compared with existing state-of-the-art BWE methods.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2311.12840 [pdf, other]

Wafer Map Defect Patterns Semi-Supervised Classification Using Latent Vector Representation

Authors: Qiyu Wei, Wei Zhao, Xiaoyan Zheng, Zeng Zeng

Abstract: As the globalization of semiconductor design and manufacturing processes continues, the demand for defect detection during integrated circuit fabrication stages is becoming increasingly critical, playing a significant role in enhancing the yield of semiconductor products. Traditional wafer map defect pattern detection methods involve manual inspection using electron microscopes to collect sample images, which are then assessed by experts for defects. This approach is labor-intensive and inefficient. Consequently, there is a pressing need to develop a model capable of automatically detecting defects as an alternative to manual operations. In this paper, we propose a method that initially employs a pre-trained VAE model to obtain the fault distribution information of the wafer map. This information serves as guidance, combined with the original image set for semi-supervised model training. During the semi-supervised training, we utilize a teacher-student network for iterative learning. The model presented in this paper is validated on the WM-811K benchmark wafer dataset. The experimental results demonstrate superior classification accuracy and detection performance compared to state-of-the-art models, fulfilling the requirements for industrial applications. Compared to the original architecture, we have achieved significant performance improvement.

Submitted 6 October, 2023; originally announced November 2023.

Comments: 6 pages, 2 figures, CIS conference

arXiv:2310.15417 [pdf, other]

A Semantic-driven Approach for Maintenance Digitalization in the Pharmaceutical Industry

Authors: Ju Wu, Xiaochen Zheng, Marco Madlena, Dimitrios Kyritsis

Abstract: The digital transformation of the pharmaceutical industry is a challenging task due to the high complexity of the elements involved and the strict regulatory compliance requirements. Maintenance activities in the pharmaceutical industry play an essential role in ensuring product quality and integral functioning of equipment and premises. This paper first identifies the key challenges of digitalization in the pharmaceutical industry and creates the corresponding problem space for the key elements involved. A literature review is conducted to investigate the mainstream maintenance strategies, digitalization models, tools and official guidance from authorities in the pharmaceutical industry. Based on the review result, a semantic-driven digitalization framework is proposed aiming to improve the digital continuity and cohesion of digital resources and technologies for maintenance activities in the pharmaceutical industry. A case study is conducted to verify the feasibility of the proposed framework based on the water sampling activities in the Merck Serono facility in Switzerland. A tool-chain is presented to enable the functional modules of the framework. Some of the key functional modules within the framework are implemented and have demonstrated satisfactory performance. As one of the outcomes, a digital sampling assistant with web-based services is created to support the automated workflow of water sampling activities. The implementation result proves the potential of the proposed framework to solve the identified problems of maintenance digitalization in the pharmaceutical industry.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.04791 [pdf, other]

Conditional Diffusion Model for Target Speaker Extraction

Authors: Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C Woodland

Abstract: We propose DiffSpEx, a generative target speaker extraction method based on score-based generative modelling through stochastic differential equations. DiffSpEx deploys a continuous-time stochastic diffusion process in the complex short-time Fourier transform domain, starting from the target speaker source and converging to a Gaussian distribution centred on the mixture of sources. For the reverse-time process, a parametrised score function is conditioned on a target speaker embedding to extract the target speaker from the mixture of sources. We utilise ECAPA-TDNN target speaker embeddings and condition the score function alternately on the SDE time embedding and the target speaker embedding. The potential of DiffSpEx is demonstrated with the WSJ0-2mix dataset, achieving an SI-SDR of 12.9 dB and a NISQA score of 3.56. Moreover, we show that fine-tuning a pre-trained DiffSpEx model to a specific speaker further improves performance, enabling personalisation in target speaker extraction.

Submitted 7 October, 2023; originally announced October 2023.

Comments: 5 pages, 4 figures, submitted to ICASSP 2024
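
The DiffSpEx abstract above reports performance as SI-SDR in dB. For readers unfamiliar with the metric, the following is a minimal sketch of the standard scale-invariant signal-to-distortion ratio for a single pair of one-dimensional signals. It is not taken from the paper's evaluation code; the function name `si_sdr` and the `eps` stabiliser are assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever the projection does not explain is treated as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by a nonzero constant leaves the score essentially unchanged, which is what "scale-invariant" refers to.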

arXiv:2308.16488 [pdf, other]

doi 10.21437/Interspeech.2023-851

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

Authors: Hui Wang, Shiwan Zhao, Xiguang Zheng, Yong Qin

Abstract: Automatic Mean Opinion Score (MOS) prediction is crucial to evaluate the perceptual quality of synthetic speech. While recent approaches using pre-trained self-supervised learning (SSL) models have shown promising results, they only partly address the data scarcity issue for the feature extractor. This leaves the data scarcity issue for the decoder unresolved and leads to suboptimal performance. To address this challenge, we propose a retrieval-augmented MOS prediction method, dubbed RAMP, to enhance the decoder's ability against the data scarcity issue. A fusing network is also proposed to dynamically adjust the retrieval scope for each instance and the fusion weights based on the predictive confidence. Experimental results show that our proposed method outperforms the existing methods in multiple scenarios.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by Interspeech 2023, oral

Journal ref: INTERSPEECH 2023, 1095-1099

arXiv:2308.07127 [pdf, other]

A Lightweight Sensor Scheduler Based on AoI Function for Remote State Estimation over Lossy Wireless Channels

Authors: Taige Chang, Xianghui Cao, Wei Xing Zheng

Abstract: This paper investigates the problem of sensor scheduling for remotely estimating the states of heterogeneous dynamical systems over resource-limited and lossy wireless channels. Considering the low time complexity and high versatility requirements of schedulers deployed on the transport layer, we propose a lightweight scheduler based on an Age of Information (AoI) function built with the tight scalar upper bound of the remote estimation error. We show that the proposed scheduler is indexable and sub-optimal. We derive an upper and a lower bound of the proposed scheduler and give stability conditions for estimation error. Numerical simulations demonstrate that, compared to existing policies, the proposed scheduler achieves estimation performance very close to the optimal at a much lower computation time.

Submitted 30 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2306.14143 [pdf, other]

Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines

Authors: Xiang Cheng, Haotian Zhang, Jianan Zhang, Shijian Gao, Sijiang Li, Ziwei Huang, Lu Bai, Zonghui Yang, Xinhu Zheng, Liuqing Yang

Abstract: In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrade the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency (RF) sensors being the primary participants, thus lacking a comprehensive environment feature characterization and facing a severe performance bottleneck in dynamic environments. To date, extensive surveys on ISAC have been conducted but are limited to summarizing RF-based radar sensing. Currently, some research efforts have been devoted to exploring multi-modal sensing-communication integration but still lack a comprehensive review. Therefore, we generalize the concept of ISAC inspired by human synesthesia to establish a unified framework of intelligent multi-modal sensing-communication integration and provide a comprehensive review under such a framework in this paper. The so-termed Synesthesia of Machines (SoM) gives the clearest cognition of such intelligent integration and details its paradigm for the first time. We commence by justifying the necessity of the new paradigm. Subsequently, we offer a definition of SoM and zoom into the detailed paradigm, which is summarized as three operation modes. To facilitate SoM research, we overview the prerequisite of SoM research, i.e., mixed multi-modal (MMM) datasets. Then, we introduce the mapping relationships between multi-modal sensing and communications. Afterward, we cover the technological review on SoM-enhance-based and SoM-concert-based applications. To corroborate the superiority of SoM, we also present simulation results related to dual-function waveform and predictive beamforming design. Finally, we propose some potential directions to inspire future research efforts.

Submitted 20 November, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials

arXiv:2306.05358 [pdf, other]

Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System

Authors: Jiwei Guan, Lei Pan, Chen Wang, Shui Yu, Longxiang Gao, Xi Zheng

Abstract: There are increasing concerns about malicious attacks on autonomous vehicles. In particular, inaudible voice command attacks pose a significant threat as voice commands become available in autonomous driving systems. How to empirically defend against these inaudible attacks remains an open question. Previous research investigates utilizing deep learning-based multimodal fusion for defense, without considering the model uncertainty in trustworthiness. As deep learning has been applied to increasingly sensitive tasks, uncertainty measurement is crucial in helping improve model robustness, especially in mission-critical scenarios. In this paper, we propose the Multimodal Fusion Framework (MFF) as an intelligent security system to defend against inaudible voice command attacks. MFF fuses heterogeneous audio-vision modalities using VGG family neural networks and achieves a detection accuracy of 92.25% in the comparative fusion method empirical study. Additionally, extensive experiments on audio-vision tasks reveal the model's uncertainty. We measure calibration errors using the Expected Calibration Error and use Monte-Carlo Dropout to estimate the predictive distribution for the proposed models. Our findings show empirically how to train robust multimodal models, improve standard accuracy, and provide a further step toward interpretability. Finally, we discuss the pros and cons of our approach and its applicability for Advanced Driver Assistance Systems.

Submitted 29 May, 2023; originally announced June 2023.
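
Note on the Expected Calibration Error mentioned in the abstract above: the following is a minimal sketch of the commonly used equal-width-bin definition, not the authors' implementation. The function name `expected_calibration_error`, the default of 10 bins, and the binning convention (a confidence of exactly 1.0 falls in the last bin) are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted average of |accuracy - mean confidence| over bins.

    confidences: predicted probabilities of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction was right
    n_bins:      number of equal-width confidence bins (assumed default)
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that conf == 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A model that reports 0.9 confidence on four samples but gets only three right would have a gap of about 0.15 in that bin, which is also its ECE if all samples fall there.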

arXiv:2306.01942 [pdf, other]

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

Authors: Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland

Abstract: End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite the large amount of training data, infrequent content words that occur in a particular task may still exhibit poor ASR performance, with contextual biasing a possible remedy. This paper investigates the effectiveness of neural contextual biasing for Whisper combined with GPT-2. Specifically, this paper proposes integrating an adapted tree-constrained pointer generator (TCPGen) component for Whisper and a dedicated training scheme to dynamically adjust the final output without modifying any Whisper model parameters. Experiments across three datasets show a considerable reduction in errors on biasing words with a biasing list of 1000 words. Contextual biasing was more effective when applied to domain-specific data and can boost the performance of Whisper and GPT-2 without losing their generality.

Submitted 2 June, 2023; originally announced June 2023.

Comments: To appear in Interspeech 2023

arXiv:2305.04929 [pdf, other]

Impact of Climate Simulation Resolutions on Future Energy System Reliability Assessment: A Texas Case Study

Authors: Xiangtian Zheng, Le Xie, Kiyeob Lee, Dan Fu, Jiahan Wu, Ping Chang

Abstract: The reliability of energy systems is strongly influenced by the prevailing climate conditions. With the increasing prevalence of renewable energy sources, the interdependence between energy and climate systems has become even stronger. This study examines the impact of different spatial resolutions in climate modeling on energy grid reliability assessment, with the Texas interconnection between 2033 and 2043 serving as a pilot case study. Our preliminary findings indicate that while low-resolution climate simulations can provide a rough estimate of system reliability, high-resolution simulations can provide more informative assessment of low-adequacy extreme events. Furthermore, both high and low-resolution assessments suggest the need to prepare for severe blackout events in winter due to extremely low temperatures.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.04952 [pdf, other]

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

Authors: Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang

Abstract: Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision that remains unresolved due to complex distortion conditions and diversified image contents. To confront this challenge, we propose in this paper a novel BIQA pipeline based on the Transformer architecture, which achieves an efficient quality-aware feature representation with much less data. More specifically, we consider the traditional fine-tuning in BIQA as an interpretation of the pre-trained model. In this way, we further introduce a Transformer decoder to refine the perceptual information of the CLS token from different perspectives. This enables our model to establish the quality-aware feature manifold efficiently while attaining a strong generalization capability. Meanwhile, inspired by the subjective evaluation behaviors of humans, we introduce a novel attention panel mechanism, which improves the model performance and reduces the prediction uncertainty simultaneously. The proposed BIQA method maintains a lightweight design with only one layer of the decoder, yet extensive experiments on eight standard BIQA datasets (both synthetic and authentic) demonstrate its superior performance to the state-of-the-art BIQA methods, i.e., achieving the SRCC values of 0.875 (vs. 0.859 in LIVEC) and 0.980 (vs. 0.969 in LIVE).

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted by AAAI 2023

arXiv:2304.00871 [pdf, other]

Self-Supervised Learning-Based Source Separation for Meeting Data

Authors: Yuang Li, Xianrui Zheng, Philip C. Woodland

Abstract: Source separation can improve automatic speech recognition (ASR) under multi-party meeting scenarios by extracting single-speaker signals from overlapped speech. Despite the success of self-supervised learning models in single-channel source separation, most studies have focused on simulated setups. In this paper, seven SSL models were compared on both simulated and real-world corpora. Then, we propose to integrate the best-performing model WavLM into an automatic transcription system through a novel iterative source selection method. To improve real-world performance, time-domain unsupervised mixture invariant training was adapted to the time-frequency domain. Experiments showed that in the transcription system when source separation was inserted before an ASR model fine-tuned on separated speech, absolute reductions of 1.9% and 1.5% in concatenated minimum-permutation word error rate for an unknown number of speakers (cpWER-us) were observed on the AMI dev and test sets.

Submitted 3 April, 2023; originally announced April 2023.

Comments: To appear in Proc. ICASSP2023

arXiv:2303.02308 [pdf, other]

A Physics-based and Data-driven Approach for Localized Statistical Channel Modeling

Authors: Shutao Zhang, Xinzhi Ning, Xi Zheng, Qingjiang Shi, Tsung-Hui Chang, Zhi-Quan Luo

Abstract: Localized channel modeling is crucial for offline performance optimization of 5G cellular networks, but the existing channel models are for general scenarios and do not capture local geographical structures. In this paper, we propose a novel physics-based and data-driven localized statistical channel modeling (LSCM), which is capable of sensing the physical geographical structures of the targeted cellular environment. The proposed channel modeling solely relies on the reference signal receiving power (RSRP) of the user equipment, unlike the traditional methods which use full channel impulse response matrices. The key is to build the relationship between the RSRP and the channel's angular power spectrum. Based on it, we formulate the task of channel modeling as a sparse recovery problem where the non-zero entries of the sparse vector indicate the channel paths' powers and angles of departure. A computationally efficient weighted non-negative orthogonal matching pursuit (WNOMP) algorithm is devised for solving the formulated problem. Finally, experiments based on synthetic and real RSRP measurements are presented to examine the performance of the proposed method.

Submitted 3 March, 2023; originally announced March 2023.

Comments: the 34th International Teletraffic Congress (ITC), Shenzhen, China, 2022

arXiv:2212.14189 [pdf, other]

High Resolution Modeling and Analysis of Cryptocurrency Mining's Impact on Power Grids: Carbon Footprint, Reliability, and Electricity Price

Authors: Ali Menati, Xiangtian Zheng, Kiyeob Lee, Ranyu Shi, Pengwei Du, Chanan Singh, Le Xie

Abstract: Blockchain technologies are considered one of the most disruptive innovations of the last decade, enabling secure decentralized trust-building. However, in recent years, with the rapid increase in the energy consumption of blockchain-based computations for cryptocurrency mining, there have been growing concerns about their sustainable operation in electric grids. This paper investigates the tri-factor impact of such large loads on carbon footprint, grid reliability, and electricity market price in the Texas grid. We release open-source high-resolution data to enable high-resolution modeling of influencing factors such as location and flexibility. We reveal that the per-megawatt-hour carbon footprint of cryptocurrency mining loads across locations can vary by as much as 50% of the crude system average estimate. We show that the flexibility of mining loads can significantly mitigate power shortages and market disruptions that can result from the deployment of mining loads. These findings suggest that policymakers should facilitate the participation of large mining facilities in wholesale markets and require them to provide mandatory demand response.

Submitted 14 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: This paper has been accepted for publication in the journal of "Advances in Applied Energy"

arXiv:2212.04250 [pdf, other]

Adaptive Neural Network Backstepping Control Method for Aerial Manipulator Based on Variable Inertia Parameter Modeling

Authors: Hai Li, Zhan Li, Xiaolong Zheng, Jinhui Liu

Abstract: For the aerial manipulator that performs aerial work tasks, the actual operating environment it faces is very complex, and it is affected by internal and external multi-source disturbances. In this paper, to effectively improve the anti-disturbance control performance of the aerial manipulator, an adaptive neural network backstepping control method based on variable inertia parameter modeling is proposed. Firstly, for the intense internal coupling disturbance, we analyze and model it from the perspective of the generation mechanism of the coupling disturbance, and derive the dynamics model of the aerial manipulator system and the coupling disturbance model based on the variable inertia parameters. Through the proposed coupling disturbance model, we can compensate for the strong coupling disturbance in a feedforward manner. Then, the adaptive neural network is proposed and applied to estimate and compensate for the additional disturbances, and the closed-loop controller is designed based on the backstepping control method. Finally, we verify the correctness of the proposed coupling disturbance model through physical experiment under a large range motion of the manipulator. Two sets of comparative simulation results also prove the accurate estimation of the proposed adaptive neural network for additional disturbances and the effectiveness and superiority of the proposed control method.

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2211.15127 [pdf]

Safety-quantifiable Line Feature-based Monocular Visual Localization with 3D Prior Map

Authors: Xi Zheng, Weisong Wen, Li-Ta Hsu

Abstract: Accurate and safety-quantifiable localization is of great significance for safety-critical autonomous systems, such as unmanned ground vehicles (UGV) and unmanned aerial vehicles (UAV). The visual odometry-based method can provide accurate positioning in a short period but is subject to drift over time. Moreover, the quantification of the safety of the localization solution (the error is bounded by a certain value) is still a challenge. To fill the gaps, this paper proposes a safety-quantifiable line feature-based visual localization method with a prior map. The visual-inertial odometry provides a high-frequency local pose estimation which serves as the initial guess for the visual localization. By obtaining a visual line feature pair association, a foot point-based constraint is proposed to construct the cost function between the 2D lines extracted from the real-time image and the 3D lines extracted from the high-precision prior 3D point cloud map. Moreover, a global navigation satellite systems (GNSS) receiver autonomous integrity monitoring (RAIM) inspired method is employed to quantify the safety of the derived localization solution. Within this method, an outlier rejection (also known as fault detection and exclusion) strategy is employed via the weighted sum of squares residual with a Chi-squared probability distribution. A protection level (PL) scheme considering multiple outliers is derived and utilized to quantify the potential error bound of the localization solution in both position and rotation domains. The effectiveness of the proposed safety-quantifiable localization system is verified using the datasets collected in the UAV indoor and UGV outdoor environments.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.13440 [pdf, other]

Iterative Data Refinement for Self-Supervised MR Image Reconstruction

Authors: Xue Liu, Juan Zou, Xiawu Zheng, Cheng Li, Hairong Zheng, Shanshan Wang

Abstract: Magnetic Resonance Imaging (MRI) has become an important technique in the clinic for the visualization, detection, and diagnosis of various diseases. However, one bottleneck limitation of MRI is the relatively slow data acquisition process. Fast MRI based on k-space undersampling and high-quality image reconstruction has been widely utilized, and many deep learning-based methods have been developed in recent years. Although promising results have been achieved, most existing methods require fully-sampled reference data for training the deep learning models. Unfortunately, fully-sampled MRI data are difficult if not impossible to obtain in real-world applications. To address this issue, we propose a data refinement framework for self-supervised MR image reconstruction. Specifically, we first analyze the reason for the performance gap between self-supervised and supervised methods and identify that the bias in the training datasets between the two is one major factor. Then, we design an effective self-supervised training data refinement method to reduce this data bias. With the data refinement, an enhanced self-supervised MR image reconstruction framework is developed to prompt accurate MR imaging. We evaluate our method on an in-vivo MRI dataset. Experimental results show that without utilizing any fully sampled MRI data, our self-supervised framework possesses strong capabilities in capturing image details and structures at high acceleration factors.

Submitted 24 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures, 1 table

MSC Class: 68T10 ACM Class: I.4.5

arXiv:2211.07993 [pdf, other]

DIGEST: Deeply supervIsed knowledGE tranSfer neTwork learning for brain tumor segmentation with incomplete multi-modal MRI scans

Authors: Haoran Li, Cheng Li, Weijian Huang, Xiawu Zheng, Yan Xi, Shanshan Wang

Abstract: Brain tumor segmentation based on multi-modal magnetic resonance imaging (MRI) plays a pivotal role in assisting brain cancer diagnosis, treatment, and postoperative evaluations. Despite the inspiring performance achieved by existing automatic segmentation methods, multi-modal MRI data are still unavailable in real-world clinical applications due to quite a few uncontrollable factors (e.g. different imaging protocols, data corruption, and patient condition limitations), which lead to a large performance drop during practical applications. In this work, we propose a Deeply supervIsed knowledGE tranSfer neTwork (DIGEST), which achieves accurate brain tumor segmentation under different modality-missing scenarios. Specifically, a knowledge transfer learning framework is constructed, enabling a student model to learn modality-shared semantic information from a teacher model pretrained with the complete multi-modal MRI data. To simulate all the possible modality-missing conditions under the given multi-modal data, we generate incomplete multi-modal MRI samples based on Bernoulli sampling. Finally, a deeply supervised knowledge transfer loss is designed to ensure the consistency of the teacher-student structure at different decoding stages, which helps the extraction of inherent and effective modality representations. Experiments on the BraTS 2020 dataset demonstrate that our method achieves promising results for the incomplete multi-modal MR image segmentation task.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 4 pages,2 figures,2 tables
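
Note on the Bernoulli sampling mentioned in the DIGEST abstract above: one plausible reading is that each MRI modality is kept or dropped by an independent coin flip. The sketch below illustrates that idea only; it is not the authors' code. The helper name `sample_modality_mask`, the keep probability `p_keep=0.5`, and the choice to resample when every modality is dropped are all assumptions. The four modality names are the standard BraTS sequences.

```python
import random

# The four MRI sequences provided in BraTS.
MODALITIES = ("T1", "T1ce", "T2", "FLAIR")


def sample_modality_mask(p_keep=0.5, rng=random):
    """Draw an independent Bernoulli(p_keep) keep/drop flag per modality.

    Resamples until at least one modality is kept, so the model never
    receives an empty input (an assumption, not stated in the abstract).
    """
    while True:
        mask = {name: rng.random() < p_keep for name in MODALITIES}
        if any(mask.values()):
            return mask


def apply_mask(sample, mask):
    """Replace dropped modalities with None; `sample` maps name -> data."""
    return {name: (data if mask[name] else None) for name, data in sample.items()}
```

With four modalities and the all-missing case excluded, repeated draws cover the 15 possible non-empty modality subsets.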

arXiv:2211.07966 [pdf, other]

Adaptive PromptNet For Auxiliary Glioma Diagnosis without Contrast-Enhanced MRI

Authors: Yeqi Wang, Weijian Huang, Cheng Li, Xiawu Zheng, Yusong Lin, Shanshan Wang

Abstract: Multi-contrast magnetic resonance imaging (MRI)-based automatic auxiliary glioma diagnosis plays an important role in the clinic. Contrast-enhanced MRI sequences (e.g., contrast-enhanced T1-weighted imaging) were utilized in most of the existing relevant studies, in which remarkable diagnosis results have been reported. Nevertheless, acquiring contrast-enhanced MRI data is sometimes not feasible due to the patients' physiological limitations. Furthermore, it is more time-consuming and costly to collect contrast-enhanced MRI data in the clinic. In this paper, we propose an adaptive PromptNet to address these issues. Specifically, a PromptNet for glioma grading utilizing only non-enhanced MRI data has been constructed. PromptNet receives constraints from features of contrast-enhanced MR data during training through a designed prompt loss. To further boost the performance, an adaptive strategy is designed to dynamically weight the prompt loss in a sample-based manner. As a result, PromptNet is capable of dealing with more difficult samples. The effectiveness of our method is evaluated on the widely used BraTS2020 dataset, and competitive glioma grading performance on NE-MRI data is achieved.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures, 2 tables

MSC Class: 68T10 ACM Class: I.4.9

arXiv:2211.04584 [pdf, other]

Energy System Digitization in the Era of AI: A Three-Layered Approach towards Carbon Neutrality

Authors: Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui

Abstract: The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change since it addresses the dual challenges of removing carbon emissions from the two largest sectors of emitters: electricity and transportation. The transition to a carbon-neutral electric grid poses significant challenges to conventional paradigms of modern grid planning and operation. Much of the challenge arises from the scale of the decision making and the uncertainty associated with the energy supply and demand. Artificial Intelligence (AI) could potentially have a transformative impact on accelerating the speed and scale of carbon-neutral transition, as many decision making processes in the power grid can be cast as classic, though challenging, machine learning tasks. We point out that to amplify AI's impact on carbon-neutral transition of the electric energy systems, the AI algorithms originally developed for other applications should be tailored in three layers of technology, markets, and policy.

Submitted 2 November, 2022; originally announced November 2022.

Comments: To be published in Patterns (Cell Press)

arXiv:2210.11212 [pdf, ps, other]

Robust prescribed-time coordination control of cooperative-antagonistic networks with disturbances

Authors: Zhen-Hua Zhu, Huaiyu Wu, Zhi-Hong Guan, Zhi-Wei Liu, Yang Chen, Xiujuan Zheng

Abstract: This article aims to address the robust prescribed-time coordination control (PTCC) problems for single-integrator cooperative-antagonistic networks (CANs) with external disturbances under arbitrary fixed signed digraphs without any structural constraints. Toward this end, the PTCC problems for nominal single-integrator CANs without disturbances are first investigated and a fully distributed control protocol with a time-varying gain, which grows to infinity as the time approaches the settling time, is proposed utilizing the relative states of neighboring agents. Then, based on the proposed control protocol for the nominal single-integrator CANs, a new second-order prescribed-time sliding mode control protocol is constructed to achieve accurate PTCC for single-integrator CANs in the presence of external disturbances. Using Lyapunov based analysis, sufficient conditions to guarantee the prescribed-time stability, bipartite consensus, interval bipartite consensus, and bipartite containment of single-integrator CANs without or with disturbances are, respectively, derived. In the end, numerical simulations are given to confirm the derived results.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 16 pages, 12 figures

arXiv:2210.01337 [pdf, ps, other]

Compressed CPD-Based Channel Estimation and Joint Beamforming for RIS-Assisted Millimeter Wave Communications

Authors: Xi Zheng, Jun Fang, Hongwei Wang, Peilan Wang, Hongbin Li

Abstract: We consider the problem of channel estimation and joint active and passive beamforming for reconfigurable intelligent surface (RIS) assisted millimeter wave (mmWave) multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. We show that, with a well-designed frame-based training protocol, the received pilot signal can be organized into a low-rank third-order tensor that admits a canonical polyadic decomposition (CPD). Based on this observation, we propose two CPD-based methods for estimating the cascade channels associated with different subcarriers. The proposed methods exploit the intrinsic low-rankness of the CPD formulation, which is a result of the sparse scattering characteristics of mmWave channels, and thus have the potential to achieve a significant training overhead reduction. Specifically, our analysis shows that the proposed methods have a sample complexity that scales quadratically with the sparsity of the cascade channel. Also, by utilizing the singular value decomposition-like structure of the effective channel, this paper develops a joint active and passive beamforming method based on the estimated cascade channels. Simulation results show that the proposed CPD-based channel estimation methods attain mean square errors that are close to the Cramer-Rao bound (CRB) and present a clear advantage over the compressed sensing-based method. In addition, the proposed joint beamforming method can effectively utilize the estimated channel parameters to achieve superior beamforming performance.

Submitted 3 October, 2022; originally announced October 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.16164

arXiv:2210.00902 [pdf]

AdaComm: Tracing Channel Dynamics for Reliable Cross-Technology Communication

Authors: Weiguo Wang, Xiaolong Zheng, Yuan He, Xiuzhen Guo

Abstract: Cross-Technology Communication (CTC) is an emerging technology to support direct communication between wireless devices that follow different standards. Although the community has put forward many proposals to enable CTC, the performance of CTC is an equally important problem that has seldom been studied. We find this problem extremely challenging for the following reasons. On one hand, a CTC link is essentially different from a conventional wireless link: conventional link indicators like RSSI (received signal strength indicator) and SNR (signal to noise ratio) cannot be used to directly characterize a CTC link. On the other hand, indirect indicators like PER (packet error rate), which is adopted by many existing CTC proposals, cannot capture short-term link behavior. As a result, the existing CTC proposals fail to maintain reliable performance under dynamic channel conditions. To address this challenge, we propose AdaComm, a generic framework to achieve self-adaptive CTC in dynamic channels. Instead of reactively adjusting the CTC sender, AdaComm adopts an online learning mechanism to adaptively adjust the decoding model at the CTC receiver. The self-adaptive decoding model automatically learns the effective features directly from the raw received signals, which are embedded with the current channel state. With the lossless channel information, AdaComm further adopts fine-tuning and full-training modes to cope with continuous and abrupt channel dynamics.
We implement AdaComm and integrate it with two existing CTC approaches that respectively employ CSI (channel state information) and RSSI as the information carrier. The evaluation results demonstrate that AdaComm can significantly reduce the SER (symbol error rate) by 72.9% and 49.2%, respectively, compared with the existing approaches.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.13645 [pdf, other]

PearNet: A Pearson Correlation-based Graph Attention Network for Sleep Stage Recognition

Authors: Jianchao Lu, Yuzhe Tian, Shuang Wang, Michael Sheng, Xi Zheng

Abstract: Sleep stage recognition is crucial for assessing sleep and diagnosing chronic diseases. Deep learning models, such as Convolutional Neural Networks and Recurrent Neural Networks, are trained using grid data as input, making them not capable of learning relationships in non-Euclidean spaces. Graph-based deep models have been developed to address this issue when investigating the external relationship of electrode signals across different brain regions. However, the models cannot solve problems related to the internal relationships between segments of electrode signals within a specific brain region. In this study, we propose a Pearson correlation-based graph attention network, called PearNet, as a solution to this problem. Graph nodes are generated based on the spatial-temporal features extracted by a hierarchical feature extraction method, and then the graph structure is learned adaptively to build node connections. Based on our experiments on the Sleep-EDF-20 and Sleep-EDF-78 datasets, PearNet performs better than the state-of-the-art baselines.

Submitted 16 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.00805 [pdf, other]

Multi-scale temporal-frequency attention for music source separation

Authors: Lianwu Chen, Xiguang Zheng, Chen Zhang, Liang Guo, Bing Yu

Abstract: In recent years, deep neural network (DNN) based approaches have achieved state-of-the-art performance for music source separation (MSS). Although previous methods have addressed large receptive field modeling in various ways, the temporal and frequency correlations of the music spectrogram with repeated patterns have not been explicitly explored for the MSS task. In this paper, a temporal-frequency attention module is proposed to model the spectrogram correlations along both temporal and frequency dimensions. Moreover, a multi-scale attention is proposed to effectively capture the correlations in music signals. The experimental results on the MUSDB18 dataset show that the proposed method outperforms the existing state-of-the-art systems with 9.51 dB signal-to-distortion ratio (SDR) on separating the vocal stems, which is the primary practical application of MSS.

Submitted 1 September, 2022; originally announced September 2022.

arXiv:2208.04661 [pdf, other]

OL-DN: Online learning based dual-domain network for HEVC intra frame quality enhancement

Authors: Renwei Yang, Shuyuan Zhu, Xiaozhen Zheng, Bing Zeng

Abstract: Convolution neural network (CNN) based methods offer effective solutions for enhancing the quality of compressed image and video. However, these methods do not use the raw data to enhance the quality. In this paper, we adopt the raw data in the quality enhancement for the HEVC intra-coded image by proposing an online learning-based method. When quality enhancement is demanded, we train our proposed model online at the encoder side and then use the parameters to update the model at the decoder side. This method not only improves model performance, but also makes one model adaptable to multiple coding scenarios. Besides, quantization error in discrete cosine transform (DCT) coefficients is the root cause of various HEVC compression artifacts. Thus, we combine frequency domain priors to assist image reconstruction. We design a DCT-based convolution layer to produce DCT coefficients that are suitable for CNN learning. Experimental results show that our proposed online learning based dual-domain network (OL-DN) achieves superior performance compared with the state-of-the-art methods.

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2208.04130 [pdf, other]

Reliability Analysis of Complex Multi-State System Based on Universal Generating Function and Bayesian Network

Authors: Xu Liu, Wen Yao, Xiaohu Zheng, Yingchun Xu

Abstract: For the complex multi-state system (MSS), reliability analysis is a significant research topic for equipment design, manufacturing, usage, and maintenance. Universal Generating Function (UGF) is an important method in reliability analysis, which efficiently obtains the system reliability by a fast algebraic procedure. However, when structural relationships between subsystems or components are not clear or lack explicit expressions, the UGF method is difficult to use or not applicable at all. Bayesian Network (BN) has a natural advantage in terms of uncertainty inference for relationships without explicit expressions. When the number of components is extremely large, though, it suffers from low efficiency. To overcome the respective defects of UGF and BN, a novel reliability analysis method called UGF-BN is proposed for the complex MSS. In the UGF-BN framework, the UGF method is first used to analyze the large number of bottom components. Then the probability distributions obtained are taken as the input of BN. Finally, the reliability of the complex MSS is modeled by the BN method. The proposed method improves the computational efficiency, especially for an MSS with a large number of bottom components. Besides, the aircraft reliability-based design optimization based on the UGF-BN method is further studied with budget constraints on mass, power, and cost. Finally, two cases are used to demonstrate and verify the proposed method.

Submitted 15 June, 2022; originally announced August 2022.

arXiv:2207.03852 [pdf, other]

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription

Authors: Xianrui Zheng, Chao Zhang, Philip C. Woodland

Abstract: Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0 (W2V2), have become the backbone of many speech tasks. In this paper, to achieve speaker diarisation and speech recognition using a single model, a tandem multitask training (TMT) method is proposed to fine-tune W2V2. For speaker diarisation, the tasks of voice activity detection (VAD) and speaker classification (SC) are required, and connectionist temporal classification (CTC) is used for ASR. The multitask framework implements VAD, SC, and ASR using an early layer, middle layer, and late layer of W2V2, which coincides with the order of segmenting the audio with VAD, clustering the segments based on speaker embeddings, and transcribing each segment with ASR. Experimental results on the augmented multi-party (AMI) dataset showed that using different W2V2 layers for VAD, SC, and ASR from the earlier to later layers for TMT not only saves computational cost, but also reduces diarisation error rates (DERs). Joint fine-tuning of VAD, SC, and ASR yielded 16%/17% relative reductions of DER with manual/automatic segmentation respectively, and consistent reductions in speaker attributed word error rate, compared to the baseline with separately fine-tuned models.

Submitted 8 July, 2022; originally announced July 2022.

Comments: To appear in Interspeech 2022

arXiv:2207.02464 [pdf, other]

A Learning System for Motion Planning of Free-Float Dual-Arm Space Manipulator towards Non-Cooperative Object

Authors: Shengjie Wang, Yuxue Cao, Xiang Zheng, Tao Zhang

Abstract: Recent years have seen the emergence of non-cooperative objects in space, like failed satellites and space junk. These objects are usually operated or collected by free-float dual-arm space manipulators. Because they eliminate the difficulties of modeling and manual parameter-tuning, reinforcement learning (RL) methods have shown promise in the trajectory planning of space manipulators. Although previous studies demonstrate their effectiveness, they cannot be applied in tracking dynamic targets with unknown rotation (non-cooperative objects). In this paper, we propose a learning system for motion planning of a free-float dual-arm space manipulator (FFDASM) towards non-cooperative objects. Specifically, our method consists of two modules. Module I realizes the multi-target trajectory planning for two end-effectors within a large target space. Next, Module II takes as input the point clouds of the non-cooperative object to estimate its motion properties, and can then predict the position of target points on a non-cooperative object. We leverage the combination of Module I and Module II to successfully track target points on a spinning object with unknown regularity. Furthermore, the experiments also demonstrate the scalability and generalization of our learning system.

Submitted 6 July, 2022; originally announced July 2022.

Comments: 15 pages, 6 figures

arXiv:2206.00184 [pdf, other]

How Much Demand Flexibility Could Have Spared Texas from the 2021 Outage?

Authors: Dongqi Wu, Xiangtian Zheng, Ali Menati, Lane Smith, Bainan Xia, Yixing Xu, Chanan Singh, Le Xie

Abstract: The February 2021 Texas winter power outage has led to hundreds of deaths and billions of dollars in economic losses, largely due to the generation failure and record-breaking electric demand. In this paper, we study the scaling-up of demand flexibility as a means to avoid load shedding during such an extreme weather event. The three mechanisms considered are interruptible load, residential load rationing, and incentive-based demand response. By simulating on a synthetic but realistic large-scale Texas grid model along with demand flexibility modeling and electricity outage data, we identify portfolios of mixed mechanisms that exactly avoid outages, which a single mechanism may fail to do because of decaying marginal effects. We also reveal a complementary relationship between interruptible load and residential load rationing and find nonlinear impacts of incentive-based demand response on the efficacy of other mechanisms.

Submitted 31 May, 2022; originally announced June 2022.

Comments: This paper has been submitted to a journal for review

arXiv:2205.05180 [pdf, other]

Massively Digitized Power Grid: Opportunities and Challenges of Use-inspired AI

Authors: Le Xie, Xiangtian Zheng, Yannan Sun, Tong Huang, Tony Bruton

Abstract: This article presents a use-inspired perspective of the opportunities and challenges in a massively digitized power grid. It argues that data availability, computing capability, and artificial intelligence (AI) algorithm development, together with their intricate interplay, are the three key factors driving the adoption of digitized solutions in the power grid. The impact of these three factors on critical functions of power system operation and planning practices is reviewed and illustrated with industrial practice case studies. Open challenges and research opportunities for data, computing, and AI algorithms are articulated within the context of the power industry's tremendous decarbonization efforts.

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2205.04821 [pdf, other]

Self-supervised regression learning using domain knowledge: Applications to improving self-supervised denoising in imaging

Authors: Il Yong Chun, Dongwon Park, Xuehang Zheng, Se Young Chun, Yong Long

Abstract: Regression that predicts continuous quantity is a central part of applications using computational imaging and computer vision technologies. Yet, studying and understanding self-supervised learning for regression tasks - except for a particular regression task, image denoising - have lagged behind. This paper proposes a general self-supervised regression learning (SSRL) framework that enables learning regression neural networks with only input data (but without ground-truth target data), by using a designable pseudo-predictor that encapsulates domain knowledge of a specific application. The paper underlines the importance of using domain knowledge by showing that under different settings, the better pseudo-predictor can lead properties of SSRL closer to those of ordinary supervised learning. Numerical experiments for low-dose computational tomography denoising and camera image denoising demonstrate that proposed SSRL significantly improves the denoising quality over several existing self-supervised denoising methods.

Submitted 10 May, 2022; originally announced May 2022.

Comments: 17 pages, 16 figures, 2 tables, submitted to IEEE T-IP

arXiv:2204.10636 [pdf]

Ontology-based system to support industrial system design for aircraft assembly

Authors: Xiaodu Hu, Rebeca Arista, Xiaochen Zheng, Joachim Lentes, Jyri Sorvari, Jinzhi Lu, Fernando Ubis, Dimitris Kiritsis

Abstract: The development of an aircraft industrial system is a complex process which faces the challenge of digital discontinuity in multidisciplinary engineering due to various interfaces between different digital tools, leading to extra development time and costs. This paper proposes an ontology-based system, aiming at functionality integration and design process automation, by Models for Manufacturing methodology principles. A tool-agnostic modelling, simulation and validation platform with Discrete Event Simulation and 3D simulation is enabled and demonstrated in a real case study. An ontology layer collecting the domain knowledge enables integration of the proposed system, accelerating the design process and enhancing design quality.

Submitted 22 April, 2022; originally announced April 2022.

Comments: 6 pages, 9 figures, IFAC IMS 2022

arXiv:2204.01327 [pdf]

Algorithms for Bayesian network modeling and reliability inference of complex multistate systems: Part II-Dependent systems

Authors: Xiaohu Zheng, Wen Yao, Xiaoqian Chen

Abstract: In using the Bayesian network (BN) to construct the complex multistate system's reliability model as described in Part I, the memory storage requirements of the node probability table (NPT) will exceed the random access memory (RAM) of the computer. However, the proposed inference algorithm of Part I is not suitable for the dependent system. This Part II proposes a novel method for BN reliability modeling and analysis to apply the compression idea to the complex multistate dependent system. In this Part II, the dependent nodes and their parent nodes are treated as a block, based on which the multistate joint probability inference algorithm is proposed to calculate the joint probability distribution of all nodes in a block. Then, based on the proposed multistate compression algorithm of Part I, the dependent multistate inference algorithm is proposed for the complex multistate dependent system. The use and accuracy of the proposed algorithms are demonstrated in case 1. Finally, the proposed algorithms are applied to the reliability modeling and analysis of the satellite attitude control system. The results show that the algorithms proposed in Part I and Part II make the reliability modeling and analysis of the complex multistate system feasible.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2203.16164 [pdf, ps, other]

Compressed Channel Estimation for IRS-Assisted Millimeter Wave OFDM Systems: A Low-Rank Tensor Decomposition-Based Approach

Authors: Xi Zheng, Peilan Wang, Jun Fang, Hongbin Li

Abstract: We consider the problem of downlink channel estimation for intelligent reflecting surface (IRS)-assisted millimeter wave (mmWave) orthogonal frequency division multiplexing (OFDM) systems. By exploring the inherent sparse scattering characteristics of mmWave channels, we show that the received signals can be expressed as a low-rank third-order tensor that admits a tensor rank decomposition, also known as canonical polyadic decomposition (CPD). A structured CPD-based method is then developed to estimate the channel parameters. Our analysis reveals that the training overhead required by our proposed method is as low as O(U^2), where U denotes the sparsity of the cascade channel. Simulation results are provided to illustrate the efficiency of the proposed method.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE Wireless Communications Letters

arXiv:2203.15655 [pdf]

Consistency regularization-based Deep Polynomial Chaos Neural Network Method for Reliability Analysis

Authors: Xiaohu Zheng, Wen Yao, Yunyang Zhang, Xiaoya Zhang

Abstract: Polynomial chaos expansion (PCE) is a powerful surrogate model-based reliability analysis method. A PCE model with a higher expansion order is usually required to obtain an accurate surrogate model for some complex non-linear stochastic systems. However, the high-order PCE increases the amount of labeled data required for solving the expansion coefficients. To alleviate this problem, this paper proposes a consistency regularization-based deep polynomial chaos neural network (Deep PCNN) method, including the low-order adaptive PCE model (the auxiliary model) and the high-order polynomial chaos neural network (the main model). The expansion coefficients of the main model are parameterized into the learnable weights of the polynomial chaos neural network, realizing iterative learning of expansion coefficients to obtain more accurate high-order PCE models. The auxiliary model uses a proposed consistency regularization loss function to assist in training the main model. The consistency regularization-based Deep PCNN method can significantly reduce the amount of labeled data needed to construct a high-order PCE model without losing accuracy by using few labeled data and abundant unlabeled data. A numerical example validates the effectiveness of the consistency regularization-based Deep PCNN method, and then this method is applied to analyze the reliability of two aerospace engineering systems.

Submitted 4 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.14033 [pdf, other]

Aggressive Quadrotor Flight Using Curiosity-Driven Reinforcement Learning

Authors: Qiyu Sun, Jinbao Fang, Wei Xing Zheng, Yang Tang

Abstract: The ability to perform aggressive movements, which are called aggressive flights, is important for quadrotors during navigation. However, aggressive quadrotor flights are still a great challenge to practical applications. The existing solutions to aggressive flights heavily rely on a predefined trajectory, which is a time-consuming preprocessing step. To avoid such path planning, we propose a curiosity-driven reinforcement learning method for aggressive flight missions, and a similarity-based curiosity module is introduced to speed up the training procedure. A branch structure exploration (BSE) strategy is also applied to guarantee the robustness of the policy and to ensure the policy trained in simulations can be performed in real-world experiments directly. The experimental results in simulations demonstrate that our reinforcement learning algorithm performs well in aggressive flight tasks, speeds up the convergence process and improves the robustness of the policy. Besides, our algorithm shows satisfactory simulation-to-real transferability and performs well in real-world experiments.

Submitted 26 March, 2022; originally announced March 2022.

arXiv:2203.03634 [pdf, other]

Remote blood pressure measurement via spatiotemporal mapping of a short-time facial video

Authors: Jialiang Zhuang, Bin Li, Yun Zhang, Yuheng Chen, Xiujuan Zheng

Abstract: Blood pressure (BP) monitoring is vital in daily healthcare, especially for cardiovascular diseases. However, BP values are mainly acquired through the contact sensing method, which is inconvenient and unfriendly to continuous BP measurement. Hence, we propose an efficient end-to-end network to estimate the BP values from a facial video to achieve remote BP measurement in daily life. In this study, we first derived a spatial-temporal map of a short-time (~15s) facial video. According to the spatial-temporal map, we then regressed the BP ranges by a designed blood pressure classifier and simultaneously calculated the specific value by a blood pressure calculator in each BP range. In addition, we also developed an innovative oversampling training strategy to handle the unbalanced data distribution problem. Finally, we trained the proposed network on a private dataset ASPD and tested it on the popular dataset MMSE-HR. As a result, the proposed network achieved a state-of-the-art MAE of 12.35 mmHg and 9.5 mmHg on systolic and diastolic BP measurements, respectively, which is better than recent works. We conclude that the proposed method has excellent potential for camera-based BP monitoring in real-world scenarios.

Submitted 23 June, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 7 pages, 7 figures

arXiv:2202.10372 [pdf, other]

doi 10.1109/ICASSP43922.2022.9746872

L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment

Authors: Eric Guizzo, Christian Marinoni, Marco Pennese, Xinlei Ren, Xiguang Zheng, Chen Zhang, Bruno Masiero, Aurelio Uncini, Danilo Comminiello

Abstract: The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics of the L3DAS21 datasets, but with an extended number of data points and added constraints that improve the baseline model's efficiency and overcome the major difficulties encountered by the participants of the previous challenge. We updated the baseline model of Task 1, using the architecture that ranked first in the previous challenge edition. We wrote a new supporting API, improving its clarity and ease-of-use. In the end, we present and discuss the results submitted by all participants. L3DAS22 Challenge website: www.l3das.com/icassp2022.

Submitted 21 February, 2022; originally announced February 2022.

Comments: Accepted to 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022). arXiv admin note: substantial text overlap with arXiv:2104.05499

Journal ref: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 9186-9190

Showing 1–50 of 95 results for author: Zheng, X