Search | arXiv e-print repository

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

Authors: Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Abstract: In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper encoder blocks to derive highly discriminative speaker embeddings.Experimental results demonstrate that using the middle to later blocks of the Whisper encoder keeps… ▽ More In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper encoder blocks to derive highly discriminative speaker embeddings.Experimental results demonstrate that using the middle to later blocks of the Whisper encoder keeps more speaker information. On the VoxCeleb1 and CN-Celeb1 datasets, our system achieves 1.42% and 8.23% equal error rates (EERs) respectively, receiving 0.58% and 1.81% absolute EER reductions over the ECAPA-TDNN baseline, and 0.46% and 0.97% over the ResNet34 baseline. Furthermore, our results indicate that using Whisper models trained on multilingual data can effectively enhance the model's robustness across languages. Finally, the low-rank adaptation approach is evaluated, which reduces the trainable model parameters by approximately 45 times while only slightly increasing EER by 0.2%. △ Less

Submitted 28 August, 2024; originally announced August 2024.

Comments: Accepted by Interspeech 2024

arXiv:2408.11562 [pdf, other]

A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification

Authors: Xujiang Xing, Mingxing Xu, Thomas Fang Zheng

Abstract: Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise-disentanglement to establish a noise-independent speaker invariant embedding space. Specifically, the disentanglement module includes two encoders for separating speaker related and irrelevant information, re… ▽ More Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise-disentanglement to establish a noise-independent speaker invariant embedding space. Specifically, the disentanglement module includes two encoders for separating speaker related and irrelevant information, respectively. The reconstruction module serves as a regularization term to constrain the noise. A feature-robust loss is also used to supervise the speaker encoder to learn noise-independent speaker embeddings without losing speaker information. In addition, adversarial training is introduced to discourage the speaker encoder from encoding acoustic condition information for achieving a speaker-invariant embedding space. Experiments on VoxCeleb1 indicate that the proposed method improves the performance of the speaker verification system under both clean and noisy conditions. △ Less

Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: 5 pages, accepted by Interspeech2024

arXiv:2408.09851 [pdf, other]

ISAC-Fi: Enabling Full-fledged Monostatic Sensing over Wi-Fi Communication

Authors: Zhe Chen, Chao Hu, Tianyue Zheng, Hangcheng Cao, Yanbing Yang, Yen Chu, Hongbo Jiang, Jun Luo

Abstract: Whereas Wi-Fi communications have been exploited for sensing purpose for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within Wi-Fi framework. In this paper, we aim to re-design WiFi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communicati… ▽ More Whereas Wi-Fi communications have been exploited for sensing purpose for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within Wi-Fi framework. In this paper, we aim to re-design WiFi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communication infrastructure. Specifically, we propose, design, and implement ISAC-Fi as an ISAC-ready Wi-Fi prototype. We first present a novel self-interference cancellation scheme, in order to extract reflected (radio frequency) signals for sensing purpose in the face of transmissions. We then subtly revise existing Wi-Fi framework so as to seamlessly operate monostatic sensing under Wi-Fi communication standard. Finally, we offer two ISAC-Fi designs: while a USRP-based one emulates a totally re-designed ISAC-Fi device, another plug-andplay design allows for backward compatibility by attaching an extra module to an arbitrary Wi-Fi device. We perform extensive experiments to validate the efficacy of ISAC-Fi and also to demonstrate its superiority over existing Wi-Fi sensing proposals. △ Less

Submitted 19 August, 2024; originally announced August 2024.

Comments: 14 pages, 22 figures

arXiv:2408.06911 [pdf, other]

Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement

Authors: Tao Zheng, Liejun Wang, Yinfeng Yu

Abstract: Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Considering the aforementioned issues, our study introd… ▽ More Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Considering the aforementioned issues, our study introduces a novel speech enhancement framework, HFSDA, which skillfully integrates heterogeneous spatial features and incorporates a dual-dimension attention mechanism to significantly enhance speech clarity and quality in noisy environments. By leveraging self-supervised learning embeddings in tandem with Short-Time Fourier Transform (STFT) spectrogram features, our model excels at capturing both high-level semantic information and detailed spectral data, enabling a more thorough analysis and refinement of speech signals. Furthermore, we employ the innovative Omni-dimensional Dynamic Convolution (ODConv) technology within the spectrogram input branch, enabling enhanced extraction and integration of crucial information across multiple dimensions. Additionally, we refine the Conformer model by enhancing its feature extraction capabilities not only in the temporal dimension but also across the spectral domain. Extensive experiments on the VCTK-DEMAND dataset show that HFSDA is comparable to existing state-of-the-art models, confirming the validity of our approach. △ Less

Submitted 13 August, 2024; originally announced August 2024.

Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

arXiv:2408.06185 [pdf, other]

Hi-SAM: A high-scalable authentication model for satellite-ground Zero-Trust system using mean field game

Authors: Xuesong Wu, Tianshuai Zheng, Runfang Wu, Jie Ren, Junyan Guo, Ye Du

Abstract: As more and more Internet of Thing (IoT) devices are connected to satellite networks, the Zero-Trust Architecture brings dynamic security to the satellite-ground system, while frequent authentication creates challenges for system availability. To make the system's accommodate more IoT devices, this paper proposes a high-scalable authentication model (Hi-SAM). Hi-SAM introduces the Proof-of-Work id… ▽ More As more and more Internet of Thing (IoT) devices are connected to satellite networks, the Zero-Trust Architecture brings dynamic security to the satellite-ground system, while frequent authentication creates challenges for system availability. To make the system's accommodate more IoT devices, this paper proposes a high-scalable authentication model (Hi-SAM). Hi-SAM introduces the Proof-of-Work idea to authentication, which allows device to obtain the network resource based on frequency. To optimize the frequency, mean field game is used for competition among devices, which can reduce the decision space of large-scale population games. And a dynamic time-range message authentication code is designed for security. From the test at large population scales, Hi-SAM is superior in the optimization of authentication workload and the anomaly detection efficiency. △ Less

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.03979 [pdf, ps, other]

Speaker Adaptation for Quantised End-to-End ASR Models

Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Abstract: End-to-end models have shown superior performance for automatic speech recognition (ASR). However, such models are often very large in size and thus challenging to deploy on resource-constrained edge devices. While quantisation can reduce model sizes, it can lead to increased word error rates (WERs). Although improved quantisation methods were proposed to address the issue of performance degradati… ▽ More End-to-end models have shown superior performance for automatic speech recognition (ASR). However, such models are often very large in size and thus challenging to deploy on resource-constrained edge devices. While quantisation can reduce model sizes, it can lead to increased word error rates (WERs). Although improved quantisation methods were proposed to address the issue of performance degradation, the fact that quantised models deployed on edge devices often target only on a small group of users is under-explored. To this end, we propose personalisation for quantised models (P4Q), a novel strategy that uses speaker adaptation (SA) to improve quantised end-to-end ASR models by fitting them to the characteristics of the target speakers. In this paper, we study the P4Q strategy based on Whisper and Conformer attention-based encoder-decoder (AED) end-to-end ASR models, which leverages a 4-bit block-wise NormalFloat4 (NF4) approach for quantisation and the low-rank adaptation (LoRA) approach for SA. Experimental results on the LibriSpeech and the TED-LIUM 3 corpora show that, with a 7-time reduction in model size and 1% extra speaker-specific parameters, 15.1% and 23.3% relative WER reductions were achieved on quantised Whisper and Conformer AED models respectively, comparing to the full precision models. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: submitted to ASRU 2023 Workshop

arXiv:2407.06514 [pdf, other]

Asymmetric Mask Scheme for Self-Supervised Real Image Denoising

Authors: Xiangyu Liao, Tianheng Zheng, Jiayu Zhong, Pingping Zhang, Chao Ren

Abstract: In recent years, self-supervised denoising methods have gained significant success and become critically important in the field of image restoration. Among them, the blind spot network based methods are the most typical type and have attracted the attentions of a large number of researchers. Although the introduction of blind spot operations can prevent identity mapping from noise to noise, it imp… ▽ More In recent years, self-supervised denoising methods have gained significant success and become critically important in the field of image restoration. Among them, the blind spot network based methods are the most typical type and have attracted the attentions of a large number of researchers. Although the introduction of blind spot operations can prevent identity mapping from noise to noise, it imposes stringent requirements on the receptive fields in the network design, thereby limiting overall performance. To address this challenge, we propose a single mask scheme for self-supervised denoising training, which eliminates the need for blind spot operation and thereby removes constraints on the network structure design. Furthermore, to achieve denoising across entire image during inference, we propose a multi-mask scheme. Our method, featuring the asymmetric mask scheme in training and inference, achieves state-of-the-art performance on existing real noisy image datasets. All the source code will be made available to the public. △ Less

Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2406.19706 [pdf, other]

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Abstract: Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters… ▽ More Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters in MoE. Specifically, SAML is applied to the quantised and personalised end-to-end automatic speech recognition models, which combines test-time speaker adaptation to improve the performance of heavily compressed models in speaker-specific scenarios. Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size, 29.1% and 31.1% relative word error rate reductions were achieved on the quantised Whisper model and Conformer-based attention-based encoder-decoder ASR model respectively, comparing to the original full precision models. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 5 pages, accepted by Interspeech 2024. arXiv admin note: substantial text overlap with arXiv:2309.09136

arXiv:2406.17286 [pdf]

Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi… ▽ More Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it with the prioritized experience replay method to solve the problem that traditional path planning algorithms often fall into dead zones. A series of simulation experiment results prove that the path planning algorithm based on DDQN is significantly better than other methods in terms of speed and accuracy, especially the ability to break through dead zones in extreme environments. Research shows that the path planning algorithm based on DDQN performs well in terms of path quality and safety. These research results provide an important reference for the research on automatic navigation of autonomous vehicles. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

arXiv:2406.07989 [pdf, other]

Near-Field Wideband Beam Training Based on Distance-Dependent Beam Split

Authors: Tianyue Zheng, Mingyao Cui, Zidong Wu, Linglong Dai

Abstract: Near-field beam training is essential for acquiring channel state information in 6G extremely large-scale multiple input multiple output (XL-MIMO) systems. To achieve low-overhead beam training, existing method has been proposed to leverage the near-field beam split effect, which deploys true-time-delay arrays to simultaneously search multiple angles of the entire angular range in a distance ring… ▽ More Near-field beam training is essential for acquiring channel state information in 6G extremely large-scale multiple input multiple output (XL-MIMO) systems. To achieve low-overhead beam training, existing method has been proposed to leverage the near-field beam split effect, which deploys true-time-delay arrays to simultaneously search multiple angles of the entire angular range in a distance ring with a single pilot. However, the method still requires exhaustive search in the distance domain, which limits its efficiency. To address the problem, we propose a distance-dependent beam-split-based beam training method to further reduce the training overheads. Specifically, we first reveal the new phenomenon of distance-dependent beam split, where by manipulating the configurations of time-delay and phase-shift, beams at different frequencies can simultaneously scan the angular domain in multiple distance rings. Leveraging the phenomenon, we propose a near-field beam training method where both different angles and distances can simultaneously be searched in one time slot. Thus, a few pilots are capable of covering the whole angle-distance space for wideband XL-MIMO. Theoretical analysis and numerical simulations are also displayed to verify the superiority of the proposed method on beamforming gain and training overhead. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2404.06393 [pdf, other]

MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan , et al. (4 additional authors not shown)

Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal… ▽ More In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions. △ Less

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2402.16153 [pdf, other]

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu , et al. (10 additional authors not shown)

Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the… ▽ More While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

arXiv:2401.05850 [pdf, other]

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection

Authors: Yadong Guan, Jiqing Han, Hongwei Song, Wenjie Song, Guibin Zheng, Tieran Zheng, Yongjun He

Abstract: Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades the feature discrimination. To solve the problem, we propose a disentangled feature learning fram… ▽ More Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades the feature discrimination. To solve the problem, we propose a disentangled feature learning framework to learn a category-specific representation. Specifically, we employ different projectors to learn the frame-wise features for each category. To ensure that these feature does not contain information of other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, considering that the labeled data used by the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method. △ Less

Submitted 11 January, 2024; originally announced January 2024.

Comments: accepted by icassp2024

arXiv:2401.01673 [pdf, other]

Coded Beam Training

Authors: Tianyue Zheng, Jieao Zhu, Qiumo Yu, Yongli Yan, Linglong Dai

Abstract: In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users wi… ▽ More In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users with low signal-to-noise ratio (SNR). To tackle this challenge, leveraging the error-correcting capability of channel codes, we introduce channel coding theory into hierarchical beam training to extend the coverage area. Specifically, we establish the duality between hierarchical beam training and channel coding, and the proposed coded beam training scheme serves as a general framework. Then, we present two specific implementations exemplified by coded beam training methods based on Hamming codes and convolutional codes, during which the beam encoding and decoding processes are refined respectively to better accommodate the beam training problem. Simulation results have demonstrated that the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR while keeping training overhead low. △ Less

Submitted 6 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: In this paper, we introduce channel coding theory into hierarchical beam training and propose a beam training scheme called coded beam training. By leveraging the error-correcting capability of channel codes, the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR, while keeping training overhead low

arXiv:2310.13090 [pdf]

Closed-Loop Motion Planning for Differentially Flat Systems: A Time-Varying Optimization Framework

Authors: Tianqi Zheng, John W. Simpson-Porco, Enrique Mallada

Abstract: Motion planning and control are two core components of the robotic systems autonomy stack. The standard approach to combine these methodologies comprises an offline/open-loop stage, planning, that designs a feasible and safe trajectory to follow, and an online/closed-loop stage, tracking, that corrects for unmodeled dynamics and disturbances. Such an approach generally introduces conservativeness… ▽ More Motion planning and control are two core components of the robotic systems autonomy stack. The standard approach to combine these methodologies comprises an offline/open-loop stage, planning, that designs a feasible and safe trajectory to follow, and an online/closed-loop stage, tracking, that corrects for unmodeled dynamics and disturbances. Such an approach generally introduces conservativeness into the planning stage, which becomes difficult to overcome as the model complexity increases and real-time decisions need to be made in a changing environment. This work addresses these challenges for the class of differentially flat nonlinear systems by integrating planning and control into a cohesive closed-loop task. Precisely, we develop an optimization-based framework that aims to steer a differentially flat system to a trajectory implicitly defined via a constrained time-varying optimization problem. To that end, we generalize the notion of feedback linearization, which makes non-linear systems behave as linear systems, and develop controllers that effectively transform a differentially flat system into an optimization algorithm that seeks to find the optimal solution of a (possibly time-varying) optimization problem. Under sufficient regularity assumptions, we prove global asymptotic convergence for the optimization dynamics to the minimizer of the time-varying optimization problem. We illustrate the effectiveness of our method with two numerical examples: a multi-robot tracking problem and a robot obstacle avoidance problem. △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.06336 [pdf, other]

HoloFed: Environment-Adaptive Positioning via Multi-band Reconfigurable Holographic Surfaces and Federated Learning

Authors: Jingzhi Hu, Zhe Chen, Tianyue Zheng, Robert Schober, Jun Luo

Abstract: Positioning is an essential service for various applications and is expected to be integrated with existing communication infrastructures in 5G and 6G. Though current Wi-Fi and cellular base stations (BSs) can be used to support this integration, the resulting precision is unsatisfactory due to the lack of precise control of the wireless signals. Recently, BSs adopting reconfigurable holographic s… ▽ More Positioning is an essential service for various applications and is expected to be integrated with existing communication infrastructures in 5G and 6G. Though current Wi-Fi and cellular base stations (BSs) can be used to support this integration, the resulting precision is unsatisfactory due to the lack of precise control of the wireless signals. Recently, BSs adopting reconfigurable holographic surfaces (RHSs) have been advocated for positioning as RHSs' large number of antenna elements enable generation of arbitrary and highly-focused signal beam patterns. However, existing designs face two major challenges: i) RHSs only have limited operating bandwidth, and ii) the positioning methods cannot adapt to the diverse environments encountered in practice. To overcome these challenges, we present HoloFed, a system providing high-precision environment-adaptive user positioning services by exploiting multi-band(MB)-RHS and federated learning (FL). For improving the positioning performance, a lower bound on the error variance is obtained and utilized for guiding MB-RHS's digital and analog beamforming design. For better adaptability while preserving privacy, an FL framework is proposed for users to collaboratively train a position estimator, where we exploit the transfer learning technique to handle the lack of position labels of the users. Moreover, a scheduling algorithm for the BS to select which users train the position estimator is designed, jointly considering the convergence and efficiency of FL. Our simulation results confirm that HoloFed achieves a 57% lower positioning error variance compared to a beam-scanning baseline and can effectively adapt to diverse environments. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.09136 [pdf, other]

Enhancing Quantised End-to-End ASR Models via Personalisation

Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Abstract: Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes causes the word error rate (WER) to increase. In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker ad… ▽ More Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes causes the word error rate (WER) to increase. In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker adaptive training (SAT) with model quantisation to improve the performance of heavily compressed models. Specifically, PQM uses a 4-bit NormalFloat Quantisation (NF4) approach for model quantisation and low-rank adaptation (LoRA) for SAT. Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size and 1% additional speaker-specific parameters, 15.1% and 23.3% relative WER reductions were achieved on quantised Whisper and Conformer-based attention-based encoder-decoder ASR models respectively, comparing to the original full precision models. △ Less

Submitted 16 September, 2023; originally announced September 2023.

Comments: 5 pages, submitted to ICASSP 2024

arXiv:2306.17434 [pdf]

A Motion Assessment Method for Reference Stack Selection in Fetal Brain MRI Reconstruction Based on Tensor Rank Approximation

Authors: Haoan Xu, Wen Shi, Jiwei Sun, Tianshu Zheng, Cong Sun, Sun Yi, Guangbin Wang, Dan Wu

Abstract: Purpose: Slice-to-volume registration and super-resolution reconstruction (SVR-SRR) is commonly used to generate 3D volumes of the fetal brain from 2D stacks of slices acquired in multiple orientations. A critical initial step in this pipeline is to select one stack with the minimum motion as a reference for registration. An accurate and unbiased motion assessment (MA) is thus crucial for successf… ▽ More Purpose: Slice-to-volume registration and super-resolution reconstruction (SVR-SRR) is commonly used to generate 3D volumes of the fetal brain from 2D stacks of slices acquired in multiple orientations. A critical initial step in this pipeline is to select one stack with the minimum motion as a reference for registration. An accurate and unbiased motion assessment (MA) is thus crucial for successful selection. Methods: We presented a MA method that determines the minimum motion stack based on 3D low-rank approximation using CANDECOMP/PARAFAC (CP) decomposition. Compared to the current 2D singular value decomposition (SVD) based method that requires flattening stacks into matrices to obtain ranks, in which the spatial information is lost, the CP-based method can factorize 3D stack into low-rank and sparse components in a computationally efficient manner. The difference between the original stack and its low-rank approximation was proposed as the motion indicator. Results: Compared to SVD-based methods, our proposed CP-based MA demonstrated higher sensitivity in detecting small motion with a lower baseline bias. Experiments on randomly simulated motion illustrated that the proposed CP method achieved a higher success rate of 95.45% in identifying the minimum motion stack, compared to SVD-based method with a success rate of 58.18%. We further demonstrated that combining CP-based MA with existing SRR-SVR pipeline significantly improved 3D volume reconstruction. Conclusion: The proposed CP-based MA method showed superior performance compared to SVD-based methods with higher sensitivity to motion, success rate, and lower baseline bias, and can be used as a prior step to improve fetal brain reconstruction. △ Less

Submitted 30 June, 2023; originally announced June 2023.

Comments: 6 figures. Correspondence to: Dan Wu, Ph.D. E-mail: [email protected]

arXiv:2306.07105 [pdf, ps, other]

STAR-RIS Assisted Covert Communications in NOMA Systems

Authors: Han Xiao, Xiaoyan Hu, Tong-Xing Zheng, Kai-Kit Wong

Abstract: Covert communications assisted by simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) in non-orthogonal multiple access (NOMA) systems have been explored in this paper. In particular, the access point (AP) transmitter adopts NOMA to serve a downlink covert user and a public user. The minimum detection error probability (DEP) at the warden is derived considering… ▽ More Covert communications assisted by simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) in non-orthogonal multiple access (NOMA) systems have been explored in this paper. In particular, the access point (AP) transmitter adopts NOMA to serve a downlink covert user and a public user. The minimum detection error probability (DEP) at the warden is derived considering the uncertainty of its background noise, which is used as a covertness constraint. We aim at maximizing the covert rate of the system by jointly optimizing APs transmit power and passive beamforming of STAR-RIS, under the covertness and quality of service (QoS) constraints. An iterative algorithm is proposed to effectively solve the non-convex optimization problem. Simulation results show that the proposed scheme significantly outperforms the conventional RIS-based scheme in ensuring system covert performance. △ Less

Submitted 12 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2305.04930, arXiv:2305.03991

arXiv:2305.03991 [pdf, ps, other]

STAR-RIS Aided Covert Communication

Authors: Han Xiao, Xiaoyan Hu, Pengcheng Mu, Wenjie Wang, Tong-Xing Zheng, Kai-Kit Wong, Kun Yang

Abstract: This paper investigates the multi-antenna covert communications assisted by a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). In particular, to shelter the existence of communications between transmitter and receiver from a warden, a friendly full-duplex receiver with two antennas is leveraged to make contributions to confuse the warden. Considering the wo… ▽ More This paper investigates the multi-antenna covert communications assisted by a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). In particular, to shelter the existence of communications between transmitter and receiver from a warden, a friendly full-duplex receiver with two antennas is leveraged to make contributions to confuse the warden. Considering the worst case, the closed-form expression of the minimum detection error probability (DEP) at the warden is derived and utilized as a covert constraint. Then, we formulate an optimization problem maximizing the covert rate of the system under the covertness constraint and quality of service (QoS) constraint with communication outage analysis. To jointly design the active and passive beamforming of the transmitter and STAR-RIS, an iterative algorithm based on globally convergent version of method of moving asymptotes (GCMMA) is proposed to effectively solve the non-convex optimization problem. Simulation results show that the proposed STAR-RIS-assisted scheme highly outperforms the case with conventional RIS. △ Less

Submitted 30 August, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2305.03328 [pdf, other]

Time-weighted Frequency Domain Audio Representation with GMM Estimator for Anomalous Sound Detection

Authors: Jian Guan, Youde Liu, Qiaoxi Zhu, Tieran Zheng, Jiqing Han, Wenwu Wang

Abstract: Although deep learning is the mainstream method in unsupervised anomalous sound detection, Gaussian Mixture Model (GMM) with statistical audio frequency representation as input can achieve comparable results with much lower model complexity and fewer parameters. Existing statistical frequency representations, e.g, the log-Mel spectrogram's average or maximum over time, do not always work well for… ▽ More Although deep learning is the mainstream method in unsupervised anomalous sound detection, Gaussian Mixture Model (GMM) with statistical audio frequency representation as input can achieve comparable results with much lower model complexity and fewer parameters. Existing statistical frequency representations, e.g, the log-Mel spectrogram's average or maximum over time, do not always work well for different machines. This paper presents Time-Weighted Frequency Domain Representation (TWFR) with the GMM method (TWFR-GMM) for anomalous sound detection. The TWFR is a generalized statistical frequency domain representation that can adapt to different machine types, using the global weighted ranking pooling over time-domain. This allows GMM estimator to recognize anomalies, even under domain-shift conditions, as visualized with a Mahalanobis distance-based metric. Experiments on DCASE 2022 Challenge Task2 dataset show that our method has better detection performance than recent deep learning methods. TWFR-GMM is the core of our submission that achieved the 3rd place in DCASE 2022 Challenge Task2. △ Less

Submitted 5 May, 2023; originally announced May 2023.

Comments: To appear at ICASSP 2023

arXiv:2304.07990 [pdf, other]

Novel Quality Measure and Efficient Resolution of Convex Hull Pricing for Unit Commitment

Authors: Mikhail A. Bragin, Farhan Hyder, Bing Yan, Peter B. Luh, Jinye Zhao, Feng Zhao, Dane A. Schiro, Tongxin Zheng

Abstract: Electricity prices determined by economic dispatch that do not consider fixed costs may lead to significant uplift payments. However, when fixed costs are included, prices become non-monotonic with respect to demand, which can adversely impact market transparency. To overcome this issue, convex hull (CH) pricing has been introduced for unit commitment with fixed costs. Several CH pricing methods h… ▽ More Electricity prices determined by economic dispatch that do not consider fixed costs may lead to significant uplift payments. However, when fixed costs are included, prices become non-monotonic with respect to demand, which can adversely impact market transparency. To overcome this issue, convex hull (CH) pricing has been introduced for unit commitment with fixed costs. Several CH pricing methods have been presented, and a feasible cost has been used as a quality measure for the CH price. However, obtaining a feasible cost requires a computationally intensive optimization procedure, and the associated duality gap may not provide an accurate quality measure. This paper presents a new approach for quantifying the quality of the CH price by establishing an upper bound on the optimal dual value. The proposed approach uses Surrogate Lagrangian Relaxation (SLR) to efficiently obtain near-optimal CH prices, while the upper bound decreases rapidly due to the convergence of SLR. Testing results on the IEEE 118-bus system demonstrate that the novel quality measure is more accurate than the measure provided by a feasible cost, indicating the high quality of the upper bound and the efficiency of SLR. △ Less

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.06130 [pdf, other]

Full State Estimation of Continuum Robots from Tip Velocities: A Cosserat-Theoretic Boundary Observer

Authors: Tongjia Zheng, Qing Han, Hai Lin

Abstract: State estimation of robotic systems is essential to implementing feedback controllers, which usually provide better robustness to modeling uncertainties than open-loop controllers. However, state estimation of soft robots is very challenging because soft robots have theoretically infinite degrees of freedom while existing sensors only provide a limited number of discrete measurements. This work fo… ▽ More State estimation of robotic systems is essential to implementing feedback controllers, which usually provide better robustness to modeling uncertainties than open-loop controllers. However, state estimation of soft robots is very challenging because soft robots have theoretically infinite degrees of freedom while existing sensors only provide a limited number of discrete measurements. This work focuses on soft robotic manipulators, also known as continuum robots. We design an observer algorithm based on the well-known Cosserat rod theory, which models continuum robots by nonlinear partial differential equations (PDEs) evolving in geometric Lie groups. The observer can estimate all infinite-dimensional continuum robot states, including poses, strains, and velocities, by only sensing the tip velocity of the continuum robot, and hence it is called a ``boundary'' observer. More importantly, the estimation error dynamics is formally proven to be locally input-to-state stable. The key idea is to inject sequential tip velocity measurements into the observer in a way that dissipates the energy of the estimation errors through the boundary. The distinct advantage of this PDE-based design is that it can be implemented using any existing numerical implementation for Cosserat rod models. All theoretical convergence guarantees will be preserved, regardless of the discretization method. We call this property ``one design for any discretization''. Extensive numerical studies are included and suggest that the domain of attraction is large and the observer is robust to uncertainties of tip velocity measurements and model parameters. △ Less

Submitted 31 July, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.14752 [pdf, other]

Multi-Robot-Guided Crowd Evacuation: Two-Scale Modeling and Control

Authors: Tongjia Zheng, Zhenyuan Yuan, Mollik Nayyar, Alan R. Wagner, Minghui Zhu, Hai Lin

Abstract: Emergency evacuation describes a complex situation involving time-critical decision-making by evacuees. Mobile robots are being actively explored as a potential solution to provide timely guidance. In this work, we study a robot-guided crowd evacuation problem where a small group of robots is used to guide a large human crowd to safe locations. The challenge lies in how to use micro-level human-ro… ▽ More Emergency evacuation describes a complex situation involving time-critical decision-making by evacuees. Mobile robots are being actively explored as a potential solution to provide timely guidance. In this work, we study a robot-guided crowd evacuation problem where a small group of robots is used to guide a large human crowd to safe locations. The challenge lies in how to use micro-level human-robot interactions to indirectly influence a population that significantly outnumbers the robots to achieve the collective evacuation objective. To address the challenge, we follow a two-scale modeling strategy and explore hydrodynamic models, which consist of a family of microscopic social force models that describe how human movements are locally affected by other humans, the environment, and robots, and associated macroscopic equations for the temporal and spatial evolution of the crowd density and flow velocity. We design controllers for the robots such that they not only automatically explore the environment (with unknown dynamic obstacles) to cover it as much as possible, but also dynamically adjust the directions of their local navigation force fields based on the real-time macrostates of the crowd to guide the crowd to a safe location. We prove the stability of the proposed evacuation algorithm and conduct extensive simulations to investigate the performance of the algorithm with different combinations of human numbers, robot numbers, and obstacle settings. △ Less

Submitted 11 January, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

arXiv:2212.01505 [pdf, other]

Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics

Authors: Tianqi Zheng, Pengcheng You, Enrique Mallada

Abstract: In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements in secondary cumulative reward constraints. Several algorithms rooted in sampled-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods are based on stochas… ▽ More In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements in secondary cumulative reward constraints. Several algorithms rooted in sampled-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm's history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this work, we propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories converge to the optimal policy almost surely. △ Less

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2210.05066 [pdf, ps, other]

A Linearly Convergent Algorithm for Rotationally Invariant $\ell_1$-Norm Principal Component Analysis

Authors: Taoli Zheng, Peng Wang, Anthony Man-Cho So

Abstract: To do dimensionality reduction on the datasets with outliers, the $\ell_1$-norm principal component analysis (L1-PCA) as a typical robust alternative of the conventional PCA has enjoyed great popularity over the past years. In this work, we consider a rotationally invariant L1-PCA, which is hardly studied in the literature. To tackle it, we propose a proximal alternating linearized minimization me… ▽ More To do dimensionality reduction on the datasets with outliers, the $\ell_1$-norm principal component analysis (L1-PCA) as a typical robust alternative of the conventional PCA has enjoyed great popularity over the past years. In this work, we consider a rotationally invariant L1-PCA, which is hardly studied in the literature. To tackle it, we propose a proximal alternating linearized minimization method with a nonlinear extrapolation for solving its two-block reformulation. Moreover, we show that the proposed method converges at least linearly to a limiting critical point of the reformulated problem. Such a point is proved to be a critical point of the original problem under a condition imposed on the step size. Finally, we conduct numerical experiments on both synthetic and real datasets to support our theoretical developments and demonstrate the efficacy of our approach. △ Less

Submitted 26 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: 11 pages, 3 figures

arXiv:2210.00976 [pdf, other]

Task Space Tracking of Soft Manipulators: Inner-Outer Loop Control Based on Cosserat-Rod Models

Authors: Tongjia Zheng, Qing Han, Hai Lin

Abstract: Soft robots are robotic systems made of deformable materials and exhibit unique flexibility that can be exploited for complex environments and tasks. However, their control problem has been considered a challenging subject because they are of infinite degrees of freedom and highly under-actuated. Existing studies have mainly relied on simplified and approximated finite-dimensional models. In this… ▽ More Soft robots are robotic systems made of deformable materials and exhibit unique flexibility that can be exploited for complex environments and tasks. However, their control problem has been considered a challenging subject because they are of infinite degrees of freedom and highly under-actuated. Existing studies have mainly relied on simplified and approximated finite-dimensional models. In this work, we exploit infinite-dimensional nonlinear control for soft robots. We adopt the Cosserat-rod theory and employ nonlinear partial differential equations (PDEs) to model the kinematics and dynamics of soft manipulators, including their translational motions (for shear and elongation) and rotational motions (for bending and torsion). The objective is to achieve position tracking of the whole manipulator in a planar task space by controlling the moments (generated by actuators). The control design is inspired by the energy decay property of damped wave equations and has an inner-outer loop structure. In the outer loop, we design desired rotational motions that rotate the translational component into a direction that asymptotically dissipates the energy associated with position tracking errors. In the inner loop, we design inputs for the rotational components to track their desired motions, again by dissipating the rotational energy. We prove that the closed-loop system is exponentially stable and evaluate its performance through simulations. △ Less

Submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.09795 [pdf, other]

Multi-Robot-Assisted Human Crowd Evacuation using Navigation Velocity Fields

Authors: Tongjia Zheng, Zhenyuan Yuan, Mollik Nayyar, Alan R. Wagner, Minghui Zhu, Hai Lin

Abstract: This work studies a robot-assisted crowd evacuation problem where we control a small group of robots to guide a large human crowd to safe locations. The challenge lies in how to model human-robot interactions and design robot controls to indirectly control a human population that significantly outnumbers the robots. To address the challenge, we treat the crowd as a continuum and formulate the evac… ▽ More This work studies a robot-assisted crowd evacuation problem where we control a small group of robots to guide a large human crowd to safe locations. The challenge lies in how to model human-robot interactions and design robot controls to indirectly control a human population that significantly outnumbers the robots. To address the challenge, we treat the crowd as a continuum and formulate the evacuation objective as driving the crowd density to target locations. We propose a novel mean-field model which consists of a family of microscopic equations that explicitly model how human motions are locally guided by the robots and an associated macroscopic equation that describes how the crowd density is controlled by the navigation velocity fields generated by all robots. Then, we design density feedback controllers for the robots to dynamically adjust their states such that the generated navigation velocity fields drive the crowd density to a target density. Stability guarantees of the proposed controllers are proven. Agent-based simulations are included to evaluate the proposed evacuation algorithms. △ Less

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2207.05896 [pdf, other]

Safe Human-Robot Collaborative Transportation via Trust-Driven Role Adaptation

Authors: Tony Zheng, Monimoy Bujarbaruah, Yvonne R. Stürz, Francesco Borrelli

Abstract: We study a human-robot collaborative transportation task in presence of obstacles. The task for each agent is to carry a rigid object to a common target position, while safely avoiding obstacles and satisfying the compliance and actuation constraints of the other agent. Human and robot do not share the local view of the environment. The human policy either assists the robot when they deem the robo… ▽ More We study a human-robot collaborative transportation task in presence of obstacles. The task for each agent is to carry a rigid object to a common target position, while safely avoiding obstacles and satisfying the compliance and actuation constraints of the other agent. Human and robot do not share the local view of the environment. The human policy either assists the robot when they deem the robot actions safe based on their perception of the environment, or actively leads the task. Using estimated human inputs, the robot plans a trajectory for the transported object by solving a constrained finite time optimal control problem. Sensors on the robot measure the inputs applied by the human. The robot then appropriately applies a weighted combination of the human's applied and its own planned inputs, where the weights are chosen based on the robot's trust value on its estimates of the human's inputs. This allows for a dynamic leader-follower role adaptation of the robot throughout the task. Furthermore, under a low value of trust, if the robot approaches any obstacle potentially unknown to the human, it triggers a safe stopping policy, maintaining safety of the system and signaling a required change in the human's intent. With experimental results, we demonstrate the efficacy of the proposed approach. △ Less

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2205.09048 [pdf, other]

Global Contrast Masked Autoencoders Are Powerful Pathological Representation Learners

Authors: Hao Quan, Xingyu Li, Weixing Chen, Qun Bai, Mingchen Zou, Ruijie Yang, Tingting Zheng, Ruiqun Qi, Xinghua Gao, Xiaoyu Cui

Abstract: Based on digital pathology slice scanning technology, artificial intelligence algorithms represented by deep learning have achieved remarkable results in the field of computational pathology. Compared to other medical images, pathology images are more difficult to annotate, and thus, there is an extreme lack of available datasets for conducting supervised learning to train robust deep learning mod… ▽ More Based on digital pathology slice scanning technology, artificial intelligence algorithms represented by deep learning have achieved remarkable results in the field of computational pathology. Compared to other medical images, pathology images are more difficult to annotate, and thus, there is an extreme lack of available datasets for conducting supervised learning to train robust deep learning models. In this paper, we propose a self-supervised learning (SSL) model, the global contrast-masked autoencoder (GCMAE), which can train the encoder to have the ability to represent local-global features of pathological images, also significantly improve the performance of transfer learning across data sets. In this study, the ability of the GCMAE to learn migratable representations was demonstrated through extensive experiments using a total of three different disease-specific hematoxylin and eosin (HE)-stained pathology datasets: Camelyon16, NCTCRC and BreakHis. In addition, this study designed an effective automated pathology diagnosis process based on the GCMAE for clinical applications. The source code of this paper is publicly available at https://github.com/StarUniversus/gcmae. △ Less

Submitted 15 November, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

arXiv:2205.06450 [pdf, other]

A microstructure estimation Transformer inspired by sparse representation for diffusion MRI

Authors: Tianshu Zheng, Cong Sun, Weihao Zheng, Wen Shi, Haotian Li, Yi Sun, Yi Zhang, Guangbin Wang, Chuyang Ye, Dan Wu

Abstract: Diffusion magnetic resonance imaging (dMRI) is an important tool in characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by th… ▽ More Diffusion magnetic resonance imaging (dMRI) is an important tool in characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by the superior performance of the Transformer, in this work, we present a learning-based framework based on Transformer, namely, a Microstructure Estimation Transformer with Sparse Coding (METSC) for dMRI-based microstructure estimation with downsampled q-space data. To take advantage of the Transformer while addressing its limitation in large training data requirements, we explicitly introduce an inductive bias - model bias into the Transformer using a sparse coding technique to facilitate the training process. Thus, the METSC is composed with three stages, an embedding stage, a sparse representation stage, and a mapping stage. The embedding stage is a Transformer-based structure that encodes the signal to ensure the voxel is represented effectively. In the sparse representation stage, a dictionary is constructed by solving a sparse reconstruction problem that unfolds the Iterative Hard Thresholding (IHT) process. The mapping stage is essentially a decoder that computes the microstructural parameters from the output of the second stage, based on the weighted sum of normalized dictionary coefficients where the weights are also learned. We tested our framework on two dMRI models with downsampled q-space data, including the intravoxel incoherent motion (IVIM) model and the neurite orientation dispersion and density imaging (NODDI) model. The proposed method achieved up to 11.25 folds of acceleration in scan time and outperformed the other state-of-the-art learning-based methods. △ Less

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2205.05851 [pdf, other]

doi 10.1109/TMI.2022.3208277

AFFIRM: Affinity Fusion-based Framework for Iteratively Random Motion correction of multi-slice fetal brain MRI

Authors: Wen Shi, Haoan Xu, Cong Sun, Jiwei Sun, Yamin Li, Xinyi Xu, Tianshu Zheng, Yi Zhang, Guangbin Wang, Dan Wu

Abstract: Multi-slice magnetic resonance images of the fetal brain are usually contaminated by severe and arbitrary fetal and maternal motion. Hence, stable and robust motion correction is necessary to reconstruct high-resolution 3D fetal brain volume for clinical diagnosis and quantitative analysis. However, the conventional registration-based correction has a limited capture range and is insufficient for… ▽ More Multi-slice magnetic resonance images of the fetal brain are usually contaminated by severe and arbitrary fetal and maternal motion. Hence, stable and robust motion correction is necessary to reconstruct high-resolution 3D fetal brain volume for clinical diagnosis and quantitative analysis. However, the conventional registration-based correction has a limited capture range and is insufficient for detecting relatively large motions. Here, we present a novel Affinity Fusion-based Framework for Iteratively Random Motion (AFFIRM) correction of the multi-slice fetal brain MRI. It learns the sequential motion from multiple stacks of slices and integrates the features between 2D slices and reconstructed 3D volume using affinity fusion, which resembles the iterations between slice-to-volume registration and volumetric reconstruction in the regular pipeline. The method accurately estimates the motion regardless of brain orientations and outperforms other state-of-the-art learning-based methods on the simulated motion-corrupted data, with a 48.4% reduction of mean absolute error for rotation and 61.3% for displacement. We then incorporated AFFIRM into the multi-resolution slice-to-volume registration and tested it on the real-world fetal MRI scans at different gestation stages. The results indicated that adding AFFIRM to the conventional pipeline improved the success rate of fetal brain super-resolution reconstruction from 77.2% to 91.9%. △ Less

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2205.02850 [pdf]

A Deep Reinforcement Learning Framework for Rapid Diagnosis of Whole Slide Pathological Images

Authors: Tingting Zheng, Weixing chen, Shuqin Li, Hao Quan, Qun Bai, Tianhang Nan, Song Zheng, Xinghua Gao, Yue Zhao, Xiaoyu Cui

Abstract: The deep neural network is a research hotspot for histopathological image analysis, which can improve the efficiency and accuracy of diagnosis for pathologists or be used for disease screening. The whole slide pathological image can reach one gigapixel and contains abundant tissue feature information, which needs to be divided into a lot of patches in the training and inference stages. This will l… ▽ More The deep neural network is a research hotspot for histopathological image analysis, which can improve the efficiency and accuracy of diagnosis for pathologists or be used for disease screening. The whole slide pathological image can reach one gigapixel and contains abundant tissue feature information, which needs to be divided into a lot of patches in the training and inference stages. This will lead to a long convergence time and large memory consumption. Furthermore, well-annotated data sets are also in short supply in the field of digital pathology. Inspired by the pathologist's clinical diagnosis process, we propose a weakly supervised deep reinforcement learning framework, which can greatly reduce the time required for network inference. We use neural network to construct the search model and decision model of reinforcement learning agent respectively. The search model predicts the next action through the image features of different magnifications in the current field of view, and the decision model is used to return the predicted probability of the current field of view image. In addition, an expert-guided model is constructed by multi-instance learning, which not only provides rewards for search model, but also guides decision model learning by the knowledge distillation method. Experimental results show that our proposed method can achieve fast inference and accurate prediction of whole slide images without any pixel-level annotations. △ Less

Submitted 5 May, 2022; originally announced May 2022.

arXiv:2204.06257 [pdf, ps, other]

Physical layer security in large-scale random multiple access wireless sensor networks: a stochastic geometry approach

Authors: Tong-Xing Zheng, Xin Chen, Chao Wang, Kai-Kit Wong, Jinhong Yuan

Abstract: This paper investigates physical layer security for a large-scale WSN with random multiple access, where each fusion center in the network randomly schedules a number of sensors to upload their sensed data subject to the overhearing of randomly distributed eavesdroppers. We propose an uncoordinated random jamming scheme in which those unscheduled sensors send jamming signals with a certain probabi… ▽ More This paper investigates physical layer security for a large-scale WSN with random multiple access, where each fusion center in the network randomly schedules a number of sensors to upload their sensed data subject to the overhearing of randomly distributed eavesdroppers. We propose an uncoordinated random jamming scheme in which those unscheduled sensors send jamming signals with a certain probability to defeat the eavesdroppers. With the aid of stochastic geometry theory and order statistics, we derive analytical expressions for the connection outage probability and secrecy outage probability to characterize transmission reliability and secrecy, respectively. Based on the obtained analytical results, we formulate an optimization problem for maximizing the sum secrecy throughput subject to both reliability and secrecy constraints, considering a joint design of the wiretap code rates for each scheduled sensor and the jamming probability for the unscheduled sensors. We provide both optimal and low-complexity sub-optimal algorithms to tackle the above problem, and further reveal various properties on the optimal parameters which are useful to guide practical designs. In particular, we demonstrate that the proposed random jamming scheme is beneficial for improving the sum secrecy throughput, and the optimal jamming probability is the result of trade-off between secrecy and throughput. We also show that the throughput performance of the sub-optimal scheme approaches that of the optimal one when facing a stringent reliability constraint or a loose secrecy constraint. △ Less

Submitted 13 April, 2022; originally announced April 2022.

Comments: accepted by the IEEE Transactions on Communications

arXiv:2203.13724 [pdf, other]

PDE-based Dynamic Control and Estimation of Soft Robotic Arms

Authors: Tongjia Zheng, Hai Lin

Abstract: Compared with traditional rigid-body robots, soft robots not only exhibit unprecedented adaptation and flexibility but also present novel challenges in their modeling and control because of their infinite degrees of freedom. Most of the existing approaches have mainly relied on approximated models so that the well-developed finite-dimensional control theory can be exploited. However, this may brin… ▽ More Compared with traditional rigid-body robots, soft robots not only exhibit unprecedented adaptation and flexibility but also present novel challenges in their modeling and control because of their infinite degrees of freedom. Most of the existing approaches have mainly relied on approximated models so that the well-developed finite-dimensional control theory can be exploited. However, this may bring in modeling uncertainty and performance degradation. Hence, we propose to exploit infinite-dimensional analysis for soft robotic systems. Our control design is based on the increasingly adopted Cosserat rod model, which describes the kinematics and dynamics of soft robotic arms using nonlinear partial differential equations (PDE). We design infinite-dimensional state feedback control laws for the Cosserat PDE model to achieve trajectory tracking (consisting of position, rotation, linear and angular velocities) and prove their uniform tracking convergence. We also design an infinite-dimensional extended Kalman filter on Lie groups for the PDE system to estimate all the state variables (including position, rotation, strains, curvature, linear and angular velocities) using only position measurements. The proposed algorithms are evaluated using simulations. △ Less

Submitted 20 September, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2201.03053 [pdf, other]

doi 10.1007/978-3-030-90874-4_9

COVID-19 Infection Segmentation from Chest CT Images Based on Scale Uncertainty

Authors: Masahiro Oda, Tong Zheng, Yuichiro Hayashi, Yoshito Otake, Masahiro Hashimoto, Toshiaki Akashi, Shigeki Aoki, Kensaku Mori

Abstract: This paper proposes a segmentation method of infection regions in the lung from CT volumes of COVID-19 patients. COVID-19 spread worldwide, causing many infected patients and deaths. CT image-based diagnosis of COVID-19 can provide quick and accurate diagnosis results. An automated segmentation method of infection regions in the lung provides a quantitative criterion for diagnosis. Previous method… ▽ More This paper proposes a segmentation method of infection regions in the lung from CT volumes of COVID-19 patients. COVID-19 spread worldwide, causing many infected patients and deaths. CT image-based diagnosis of COVID-19 can provide quick and accurate diagnosis results. An automated segmentation method of infection regions in the lung provides a quantitative criterion for diagnosis. Previous methods employ whole 2D image or 3D volume-based processes. Infection regions have a considerable variation in their sizes. Such processes easily miss small infection regions. Patch-based process is effective for segmenting small targets. However, selecting the appropriate patch size is difficult in infection region segmentation. We utilize the scale uncertainty among various receptive field sizes of a segmentation FCN to obtain infection regions. The receptive field sizes can be defined as the patch size and the resolution of volumes where patches are clipped from. This paper proposes an infection segmentation network (ISNet) that performs patch-based segmentation and a scale uncertainty-aware prediction aggregation method that refines the segmentation result. We design ISNet to segment infection regions that have various intensity values. ISNet has multiple encoding paths to process patch volumes normalized by multiple intensity ranges. We collect prediction results generated by ISNets having various receptive field sizes. Scale uncertainty among the prediction results is extracted by the prediction aggregation method. We use an aggregation FCN to generate a refined segmentation result considering scale uncertainty among the predictions. In our experiments using 199 chest CT volumes of COVID-19 cases, the prediction aggregation method improved the dice similarity score from 47.6% to 62.1%. △ Less

Submitted 9 January, 2022; originally announced January 2022.

Comments: Accepted paper as a oral presentation at CILP2021, 10th MICCAI CLIP Workshop

Journal ref: DCL 2021, PPML 2021, LL-COVID19 2021, CLIP 2021, Lecture Notes in Computer Science (LNCS) 12969, pp.88-97

arXiv:2112.10894 [pdf]

doi 10.1109/CW52790.2021.00041

Subject-Independent Drowsiness Recognition from Single-Channel EEG with an Interpretable CNN-LSTM model

Authors: Jian Cui, Zirui Lan, Tianhu Zheng, Yisi Liu, Olga Sourina, Lipo Wang, Wolfgang Müller-Wittig

Abstract: For EEG-based drowsiness recognition, it is desirable to use subject-independent recognition since conducting calibration on each subject is time-consuming. In this paper, we propose a novel Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) model for subject-independent drowsiness recognition from single-channel EEG signals. Different from existing deep learning models that are most… ▽ More For EEG-based drowsiness recognition, it is desirable to use subject-independent recognition since conducting calibration on each subject is time-consuming. In this paper, we propose a novel Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) model for subject-independent drowsiness recognition from single-channel EEG signals. Different from existing deep learning models that are mostly treated as black-box classifiers, the proposed model can explain its decisions for each input sample by revealing which parts of the sample contain important features identified by the model for classification. This is achieved by a visualization technique by taking advantage of the hidden states output by the LSTM layer. Results show that the model achieves an average accuracy of 72.97% on 11 subjects for leave-one-out subject-independent drowsiness recognition on a public dataset, which is higher than the conventional baseline methods of 55.42%-69.27%, and state-of-the-art deep learning methods. Visualization results show that the model has discovered meaningful patterns of EEG signals related to different mental states across different subjects. △ Less

Submitted 21 November, 2021; originally announced December 2021.

Journal ref: 2021 International Conference on Cyberworlds (CW), 2021, pp. 201-208

arXiv:2111.12324 [pdf, other]

How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

Authors: Haoran Sun, Lantian Li, Thomas Fang Zheng, Dong Wang

Abstract: The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the SpeechFlow model presented recently, by which we… ▽ More The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the SpeechFlow model presented recently, by which we are able to decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we carefully studied the performance of each information component and their combinations. We conducted the study on three different speech emotion corpora and chose an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very bad (even worse than guess), demonstrating that the present speech emotion recognition model is rather weak. Interestingly, by removing one or several unimportant components, the cross-corpus results can be improved. This demonstrates the potential of the decomposition approach towards a generalizable emotion recognition. △ Less

Submitted 24 November, 2021; originally announced November 2021.

arXiv:2111.08195 [pdf, other]

doi 10.1145/3485730.3485932

MoRe-Fi: Motion-robust and Fine-grained Respiration Monitoring via Deep-Learning UWB Radar

Authors: Tianyue Zheng, Zhe Chen, Shujie Zhang, Chao Cai, Jun Luo

Abstract: Crucial for healthcare and biomedical applications, respiration monitoring often employs wearable sensors in practice, causing inconvenience due to their direct contact with human bodies. Therefore, researchers have been constantly searching for contact-free alternatives. Nonetheless, existing contact-free designs mostly require human subjects to remain static, largely confining their adoptions in… ▽ More Crucial for healthcare and biomedical applications, respiration monitoring often employs wearable sensors in practice, causing inconvenience due to their direct contact with human bodies. Therefore, researchers have been constantly searching for contact-free alternatives. Nonetheless, existing contact-free designs mostly require human subjects to remain static, largely confining their adoptions in everyday environments where body movements are inevitable. Fortunately, radio-frequency (RF) enabled contact-free sensing, though suffering motion interference inseparable by conventional filtering, may offer a potential to distill respiratory waveform with the help of deep learning. To realize this potential, we introduce MoRe-Fi to conduct fine-grained respiration monitoring under body movements. MoRe-Fi leverages an IR-UWB radar to achieve contact-free sensing, and it fully exploits the complex radar signal for data augmentation. The core of MoRe-Fi is a novel variational encoder-decoder network; it aims to single out the respiratory waveforms that are modulated by body movements in a non-linear manner. Our experiments with 12 subjects and 66-hour data demonstrate that MoRe-Fi accurately recovers respiratory waveform despite the interference caused by body movements. We also discuss potential applications of MoRe-Fi for pulmonary disease diagnoses. △ Less

Submitted 15 November, 2021; originally announced November 2021.

Comments: 14 pages

Journal ref: SenSys '21: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems November 2021

arXiv:2111.04566 [pdf, other]

doi 10.1145/3384419.3430735

RF-Net: a Unified Meta-learning Framework for RF-enabled One-shot Human Activity Recognition

Authors: Shuya Ding, Zhe Chen, Tianyue Zheng, Jun Luo

Abstract: Radio-Frequency (RF) based device-free Human Activity Recognition (HAR) rises as a promising solution for many applications. However, device-free (or contactless) sensing is often more sensitive to environment changes than device-based (or wearable) sensing. Also, RF datasets strictly require on-line labeling during collection, starkly different from image and text data collections where human int… ▽ More Radio-Frequency (RF) based device-free Human Activity Recognition (HAR) rises as a promising solution for many applications. However, device-free (or contactless) sensing is often more sensitive to environment changes than device-based (or wearable) sensing. Also, RF datasets strictly require on-line labeling during collection, starkly different from image and text data collections where human interpretations can be leveraged to perform off-line labeling. Therefore, existing solutions to RF-HAR entail a laborious data collection process for adapting to new environments. To this end, we propose RF-Net as a meta-learning based approach to one-shot RF-HAR; it reduces the labeling efforts for environment adaptation to the minimum level. In particular, we first examine three representative RF sensing techniques and two major meta-learning approaches. The results motivate us to innovate in two designs: i) a dual-path base HAR network, where both time and frequency domains are dedicated to learning powerful RF features including spatial and attention-based temporal ones, and ii) a metric-based meta-learning framework to enhance the fast adaption capability of the base network, including an RF-specific metric module along with a residual classification module. We conduct extensive experiments based on all three RF sensing techniques in multiple real-world indoor environments; all results strongly demonstrate the efficacy of RF-Net compared with state-of-the-art baselines. △ Less

Submitted 28 October, 2021; originally announced November 2021.

Comments: 14 pages

Journal ref: SenSys '20: Proceedings of the 18th Conference on Embedded Networked Sensor Systems, November 2020

arXiv:2110.14849 [pdf, other]

doi 10.1109/MCOM.001.2000288

Enhancing RF Sensing with Deep Learning: A Layered Approach

Authors: Tianyue Zheng, Zhe Chen, Shuya Ding, Jun Luo

Abstract: In recent years, radio frequency (RF) sensing has gained increasing popularity due to its pervasiveness, low cost, non-intrusiveness, and privacy preservation. However, realizing the promises of RF sensing is highly nontrivial, given typical challenges such as multipath and interference. One potential solution leverages deep learning to build direct mappings from the RF domain to target domains, h… ▽ More In recent years, radio frequency (RF) sensing has gained increasing popularity due to its pervasiveness, low cost, non-intrusiveness, and privacy preservation. However, realizing the promises of RF sensing is highly nontrivial, given typical challenges such as multipath and interference. One potential solution leverages deep learning to build direct mappings from the RF domain to target domains, hence avoiding complex RF physical modeling. While earlier solutions exploit only simple feature extraction and classification modules, an emerging trend adds functional layers on top of elementary modules for more powerful generalizability and flexible applicability. To better understand this potential, this article takes a layered approach to summarize RF sensing enabled by deep learning. Essentially, we present a four-layer framework: physical, backbone, generalization, and application. While this layered framework provides readers a systematic methodology for designing deep interpreted RF sensing, it also facilitates making improvement proposals and hints at future research opportunities. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Comments: 7 pages

Journal ref: IEEE Communications Magazine ( Volume: 59, Issue: 2, February 2021)

arXiv:2110.14848 [pdf, other]

doi 10.1145/3397321

V2iFi: in-Vehicle Vital Sign Monitoring via Compact RF Sensing

Authors: Tianyue Zheng, Zhe Chen, Chao Cai, Jun Luo, Xu Zhang

Abstract: Given the significant amount of time people spend in vehicles, health issues under driving condition have become a major concern. Such issues may vary from fatigue, asthma, stroke, to even heart attack, yet they can be adequately indicated by vital signs and abnormal activities. Therefore, in-vehicle vital sign monitoring can help us predict and hence prevent these issues. Whereas existing sensor-… ▽ More Given the significant amount of time people spend in vehicles, health issues under driving condition have become a major concern. Such issues may vary from fatigue, asthma, stroke, to even heart attack, yet they can be adequately indicated by vital signs and abnormal activities. Therefore, in-vehicle vital sign monitoring can help us predict and hence prevent these issues. Whereas existing sensor-based (including camera) methods could be used to detect these indicators, privacy concern and system complexity both call for a convenient yet effective and robust alternative. This paper aims to develop V2iFi, an intelligent system performing monitoring tasks using a COTS impulse radio mounted on the windshield. V2iFi is capable of reliably detecting driver's vital signs under driving condition and with the presence of passengers, thus allowing for potentially inferring corresponding health issues. Compared with prior work based on Wi-Fi CSI, V2iFi is able to distinguish reflected signals from multiple users, and hence provide finer-grained measurements under more realistic settings. We evaluate V2iFi both in lab environments and during real-life road tests; the results demonstrate that respiratory rate, heart rate, and heart rate variability can all be estimated accurately. Based on these estimation results, we further discuss how machine learning models can be applied on top of V2iFi so as to improve both physiological and psychological wellbeing in driving environments. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Comments: 27 pages

Journal ref: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 4, Issue 2, June 2020

arXiv:2110.14307 [pdf, other]

doi 10.1109/TMC.2021.3073969

RF-Based Human Activity Recognition Using Signal Adapted Convolutional Neural Network

Authors: Zhe Chen, Chao Cai, Tianyue Zheng, Jun Luo, Jie Xiong, Xin Wang

Abstract: Human Activity Recognition (HAR) plays a critical role in a wide range of real-world applications, and it is traditionally achieved via wearable sensing. Recently, to avoid the burden and discomfort caused by wearable devices, device-free approaches exploiting RF signals arise as a promising alternative for HAR. Most of the latest device-free approaches require training a large deep neural network… ▽ More Human Activity Recognition (HAR) plays a critical role in a wide range of real-world applications, and it is traditionally achieved via wearable sensing. Recently, to avoid the burden and discomfort caused by wearable devices, device-free approaches exploiting RF signals arise as a promising alternative for HAR. Most of the latest device-free approaches require training a large deep neural network model in either time or frequency domain, entailing extensive storage to contain the model and intensive computations to infer activities. Consequently, even with some major advances on device-free HAR, current device-free approaches are still far from practical in real-world scenarios where the computation and storage resources possessed by, for example, edge devices, are limited. Therefore, we introduce HAR-SAnet which is a novel RF-based HAR framework. It adopts an original signal adapted convolutional neural network architecture: instead of feeding the handcraft features of RF signals into a classifier, HAR-SAnet fuses them adaptively from both time and frequency domains to design an end-to-end neural network model. We apply point-wise grouped convolution and depth-wise separable convolutions to confine the model scale and to speed up the inference execution time. The experiment results show that the recognition accuracy of HAR-SAnet outperforms state-of-the-art algorithms and systems. △ Less

Submitted 27 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 13 pages

Journal ref: IEEE Transactions on Mobile Computing, 19 April 2021

arXiv:2110.14279 [pdf, other]

doi 10.1145/3447993.3483258

SiWa: See into Walls via Deep UWB Radar

Authors: Tianyue Zheng, Zhe Chen, Jun Luo, Lin Ke, Chaoyang Zhao, Yaowen Yang

Abstract: Being able to see into walls is crucial for diagnostics of building health; it enables inspections of wall structure without undermining the structural integrity. However, existing sensing devices do not seem to offer a full capability in mapping the in-wall structure while identifying their status (e.g., seepage and corrosion). In this paper, we design and implement SiWa as a low-cost and portabl… ▽ More Being able to see into walls is crucial for diagnostics of building health; it enables inspections of wall structure without undermining the structural integrity. However, existing sensing devices do not seem to offer a full capability in mapping the in-wall structure while identifying their status (e.g., seepage and corrosion). In this paper, we design and implement SiWa as a low-cost and portable system for wall inspections. Built upon a customized IR-UWB radar, SiWa scans a wall as a user swipes its probe along the wall surface; it then analyzes the reflected signals to synthesize an image and also to identify the material status. Although conventional schemes exist to handle these problems individually, they require troublesome calibrations that largely prevent them from practical adoptions. To this end, we equip SiWa with a deep learning pipeline to parse the rich sensory data. With an ingenious construction and innovative training, the deep learning modules perform structural imaging and the subsequent analysis on material status, without the need for parameter tuning and calibrations. We build SiWa as a prototype and evaluate its performance via extensive experiments and field studies; results confirm that SiWa accurately maps in-wall structures, identifies their materials, and detects possible failures, suggesting a promising solution for diagnosing building health with lower effort and cost. △ Less

Submitted 27 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 14 pages

Journal ref: MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking October 2021

arXiv:2110.05087 [pdf]

A Multi-Resolution Front-End for End-to-End Speech Anti-Spoofing

Authors: Wei Liu, Meng Sun, Xiongwei Zhang, Hugo Van hamme, Thomas Fang Zheng

Abstract: The choice of an optimal time-frequency resolution is usually a difficult but important step in tasks involving speech signal classification, e.g., speech anti-spoofing. The variations of the performance with different choices of timefrequency resolutions can be as large as those with different model architectures, which makes it difficult to judge what the improvement actually comes from when a n… ▽ More The choice of an optimal time-frequency resolution is usually a difficult but important step in tasks involving speech signal classification, e.g., speech anti-spoofing. The variations of the performance with different choices of timefrequency resolutions can be as large as those with different model architectures, which makes it difficult to judge what the improvement actually comes from when a new network architecture is invented and introduced as the classifier. In this paper, we propose a multi-resolution front-end for feature extraction in an end-to-end classification framework. Optimal weighted combinations of multiple time-frequency resolutions will be learned automatically given the objective of a classification task. Features extracted with different time-frequency resolutions are weighted and concatenated as inputs to the successive networks, where the weights are predicted by a learnable neural network inspired by the weighting block in squeeze-and-excitation networks (SENet). Furthermore, the refinement of the chosen timefrequency resolutions is investigated by pruning the ones with relatively low importance, which reduces the complexity and size of the model. The proposed method is evaluated on the tasks of speech anti-spoofing in ASVSpoof 2019 and its superiority has been justified by comparing with similar baselines. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: submitted to ICASSP 2022

arXiv:2109.00605 [pdf, ps, other]

Backstepping Mean-Field Density Control for Large-Scale Heterogeneous Nonlinear Stochastic Systems

Authors: Tongjia Zheng, Qing Han, Hai Lin

Abstract: This work studies the problem of controlling the mean-field density of large-scale stochastic systems, which has applications in various fields such as swarm robotics. Recently, there is a growing amount of literature that employs mean-field partial differential equations (PDEs) to model the density evolution and uses density feedback to design control laws which, by acting on individual systems,… ▽ More This work studies the problem of controlling the mean-field density of large-scale stochastic systems, which has applications in various fields such as swarm robotics. Recently, there is a growing amount of literature that employs mean-field partial differential equations (PDEs) to model the density evolution and uses density feedback to design control laws which, by acting on individual systems, stabilize their density towards a target profile. In spite of its stability property and computational efficiency, the success of density feedback relies on assuming the systems to be homogeneous first-order integrators (plus white noise) and ignores higher-order dynamics, making it less applicable in practice. In this work, we present a backstepping design algorithm that extends density control to heterogeneous and higher-order stochastic systems in strict-feedback forms. We show that the strict-feedback form in the individual level corresponds to, in the collective level, a PDE (of densities) distributedly driven by a collection of heterogeneous stochastic systems. The presented backstepping design then starts with a density feedback design for the PDE, followed by a sequence of stabilizing design for the remaining stochastic systems. We present a candidate control law with stability proof and apply it to nonholonomic mobile robots. A simulation is included to verify the effectiveness of the algorithm. △ Less

Submitted 24 March, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

arXiv:2107.01322 [pdf, other]

Physical Layer Security for NOMA-Enabled Multi-Access Edge Computing Wireless Networks

Authors: Yating Wen, Tong-Xing Zheng, Yongxia Tong, Hao-Wen Liu, Xin Chen, Pengcheng Mu, Hui-Ming Wang

Abstract: Multi-access edge computing (MEC) has been regarded as a promising technique for enhancing computation capabilities for wireless networks. In this paper, we study physical layer security in an MEC system where multiple users offload partial of their computation tasks to a base station simultaneously based on non-orthogonal multiple access (NOMA), in the presence of a malicious eavesdropper. Secrec… ▽ More Multi-access edge computing (MEC) has been regarded as a promising technique for enhancing computation capabilities for wireless networks. In this paper, we study physical layer security in an MEC system where multiple users offload partial of their computation tasks to a base station simultaneously based on non-orthogonal multiple access (NOMA), in the presence of a malicious eavesdropper. Secrecy outage probability is adopted to measure the security performance of the computation offloading against eavesdropping attacks. We aim to minimize the sum energy consumption of all the users, subject to constraints in terms of the secrecy offloading rate, the secrecy outage probability, and the decoding order of NOMA. Although the original optimization problem is non-convex and challenging to solve, we put forward an efficient algorithm based on sequential convex approximation and penalty dual decomposition. Numerical results are eventually provided to validate the convergence of the proposed algorithm and its superior energy efficiency with secrecy requirements. △ Less

Submitted 2 July, 2021; originally announced July 2021.

Comments: 6 pages, 3 figures, and Accepted to present at IEEE/CIC ICCC 2021

arXiv:2106.05318 [pdf, other]

doi 10.1109/TAC.2021.3123239

Distributed Mean-Field Density Estimation for Large-Scale Systems

Authors: Tongjia Zheng, Qing Han, Hai Lin

Abstract: This work studies how to estimate the mean-field density of large-scale systems in a distributed manner. Such problems are motivated by the recent swarm control technique that uses mean-field approximations to represent the collective effect of the swarm, wherein the mean-field density (especially its gradient) is usually used in feedback control design. In the first part, we formulate the density… ▽ More This work studies how to estimate the mean-field density of large-scale systems in a distributed manner. Such problems are motivated by the recent swarm control technique that uses mean-field approximations to represent the collective effect of the swarm, wherein the mean-field density (especially its gradient) is usually used in feedback control design. In the first part, we formulate the density estimation problem as a filtering problem of the associated mean-field partial differential equation (PDE), for which we employ kernel density estimation (KDE) to construct noisy observations and use filtering theory of PDE systems to design an optimal (centralized) density filter. It turns out that the covariance operator of observation noise depends on the unknown density. Hence, we use approximations for the covariance operator to obtain a suboptimal density filter, and prove that both the density estimates and their gradient are convergent and remain close to the optimal one using the notion of input-to-state stability (ISS). In the second part, we continue to study how to decentralize the density filter such that each agent can estimate the mean-field density based on only its own position and local information exchange with neighbors. We prove that the local density filter is also convergent and remains close to the centralized one in the sense of ISS. Simulation results suggest that the centralized suboptimal density filter is able to generate convergent density estimates, and the local density filter is able to converge and remain close to the centralized filter. △ Less

Submitted 10 October, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: arXiv admin note: text overlap with arXiv:2009.05366

arXiv:2106.00899 [pdf, other]

Feedback Interconnected Mean-Field Density Estimation and Control

Authors: Tongjia Zheng, Qing Han, Hai Lin

Abstract: Swarm robotic systems have foreseeable applications in the near future. Recently, there has been an increasing amount of literature that employs mean-field partial differential equations (PDEs) to model the time-evolution of the probability density of swarm robotic systems and uses density feedback to design stabilizing control laws that act on individuals such that their density converges to a ta… ▽ More Swarm robotic systems have foreseeable applications in the near future. Recently, there has been an increasing amount of literature that employs mean-field partial differential equations (PDEs) to model the time-evolution of the probability density of swarm robotic systems and uses density feedback to design stabilizing control laws that act on individuals such that their density converges to a target profile. However, it remains largely unexplored considering problems of how to estimate the mean-field density, how the density estimation algorithms affect the control performance, and whether the estimation performance in turn depends on the control algorithms. In this work, we focus on studying the interplay of these algorithms. Specifically, we propose new density control laws which use the mean-field density and its gradient as feedback, and prove that they are globally input-to-state stable (ISS) with respect to estimation errors. Then, we design filtering algorithms to estimate the density and its gradient separately, and prove that these estimates are convergent assuming the control laws are known. Finally, we show that the feedback interconnection of these estimation and control algorithms is still globally ISS, which is attributed to the bilinearity of the PDE system. An agent-based simulation is included to verify the stability of these algorithms and their feedback interconnection. △ Less

Submitted 23 March, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

arXiv:2106.00895 [pdf, ps, other]

Field Estimation using Robotic Swarms through Bayesian Regression and Mean-Field Feedback

Authors: Tongjia Zheng, Hai Lin

Abstract: Recent years have seen an increased interest in using mean-field density based modelling and control strategy for deploying robotic swarms. In this paper, we study how to dynamically deploy the robots subject to their physical constraints to efficiently measure and reconstruct certain unknown spatial field (e.g. the air pollution index over a city). Specifically, the evolution of the robots' densi… ▽ More Recent years have seen an increased interest in using mean-field density based modelling and control strategy for deploying robotic swarms. In this paper, we study how to dynamically deploy the robots subject to their physical constraints to efficiently measure and reconstruct certain unknown spatial field (e.g. the air pollution index over a city). Specifically, the evolution of the robots' density is modelled by mean-field partial differential equations (PDEs) which are uniquely determined by the robots' individual dynamics. Bayesian regression models are used to obtain predictions and return a variance function that represents the confidence of the prediction. We formulate a PDE constrained optimization problem based on this variance function to dynamically generate a reference density signal which guides the robots to uncertain areas to collect new data, and design mean-field feedback-based control laws such that the robots' density converges to this reference signal. We also show that the proposed feedback law is robust to density estimation errors in the sense of input-to-state stability. Simulations are included to verify the effectiveness of the algorithms. △ Less

Submitted 1 June, 2021; originally announced June 2021.

Showing 1–50 of 78 results for author: Zheng, T