Search | arXiv e-print repository

End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Abstract: This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formu… ▽ More This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formulate the E2E design of feature encoding, MIMO precoding, and classification as a conditional mutual information maximization problem. However, it is notoriously difficult to design and train an E2E network that can be adaptive to both the task dataset and different channel realizations. Regarding network training, we propose a decoupled pretraining framework that separately trains the feature encoder and the MIMO precoder, with a maximum a posteriori (MAP) classifier employed at the server to generate the inference result. The feature encoder is pretrained exclusively using the task dataset, while the MIMO precoder is pretrained solely based on the channel and noise distributions. Nevertheless, we manage to align the pretraining objectives of each individual component with the E2E learning objective, so as to approach the performance bound of E2E learning. By leveraging the decoupled pretraining results for initialization, the E2E learning can be conducted with minimal training overhead. Regarding network architecture design, we develop two deep unfolded precoding networks that effectively incorporate the domain knowledge of the solution to the decoupled precoding problem. Simulation results on both the CIFAR-10 and ModelNet10 datasets verify that the proposed method achieves significantly higher classification accuracy compared to various baselines. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: major revision in IEEE JSAC

arXiv:2408.15481 [pdf, ps, other]

Joint Offloading and Beamforming Design in Integrating Sensing, Communication, and Computing Systems: A Distributed Approach

Authors: Peng Liu, Zesong Fei, Xinyi Wang, Jingxuan Huang, Jie Hu, J. Andrew Zhang

Abstract: When applying integrated sensing and communications (ISAC) in future mobile networks, many sensing tasks have low latency requirements, preferably being implemented at terminals. However, terminals often have limited computing capabilities and energy supply. In this paper, we investigate the effectiveness of leveraging the advanced computing capabilities of mobile edge computing (MEC) servers and… ▽ More When applying integrated sensing and communications (ISAC) in future mobile networks, many sensing tasks have low latency requirements, preferably being implemented at terminals. However, terminals often have limited computing capabilities and energy supply. In this paper, we investigate the effectiveness of leveraging the advanced computing capabilities of mobile edge computing (MEC) servers and the cloud server to address the sensing tasks of ISAC terminals. Specifically, we propose a novel three-tier integrated sensing, communication, and computing (ISCC) framework composed of one cloud server, multiple MEC servers, and multiple terminals, where the terminals can optionally offload sensing data to the MEC server or the cloud server. The offload message is sent via the ISAC waveform, whose echo is used for sensing. We jointly optimize the computation offloading and beamforming strategies to minimize the average execution latency while satisfying sensing requirements. In particular, we propose a low-complexity distributed algorithm to solve the problem. Firstly, we use the alternating direction method of multipliers (ADMM) and derive the closed-form solution for offloading decision variables. Subsequently, we convert the beamforming optimization sub-problem into a weighted minimum mean-square error (WMMSE) problem and propose a fractional programming based algorithm. Numerical results demonstrate that the proposed ISCC framework and distributed algorithm significantly reduce the execution latency and the energy consumption of sensing tasks at a lower computational complexity compared to existing schemes. △ Less

Submitted 27 August, 2024; originally announced August 2024.

Comments: 15 pages, 12 figures, submitted to IEEE journals for possible publication

arXiv:2408.13593 [pdf, ps, other]

Learning Multi-Rate Task-Oriented Communications Over Symmetric Discrete Memoryless Channels

Authors: Anbang Zhang, Shuaishuai Guo

Abstract: This letter introduces a multi-rate task-oriented communication (MR-ToC) framework. This framework dynamically adapts to variations in affordable data rate within the communication pipeline. It conceptualizes communication pipelines as symmetric, discrete, memoryless channels. We employ a progressive learning strategy to train the system, comprising a nested codebook for encoding and task inferenc… ▽ More This letter introduces a multi-rate task-oriented communication (MR-ToC) framework. This framework dynamically adapts to variations in affordable data rate within the communication pipeline. It conceptualizes communication pipelines as symmetric, discrete, memoryless channels. We employ a progressive learning strategy to train the system, comprising a nested codebook for encoding and task inference. This configuration allows for the adjustment of multiple rate levels in response to evolving channel conditions. The results from our experiments show that this system not only supports edge inference across various coding levels but also excels in adapting to variable communication environments. △ Less

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.11982 [pdf, other]

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu , et al. (11 additional authors not shown)

Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat… ▽ More Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 participating teams registered for the challenge, while we report the results of 6 teams, which submitted valid final solutions and code for reproducing the results. Moreover, we calculated and present the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html. △ Less

Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.04188 [pdf, ps, other]

Trustworthy Semantic-Enabled 6G Communication: A Task-oriented and Privacy-preserving Perspective

Authors: Shuaishuai Guo, Anbang Zhang, Yanhu Wang, Chenyuan Feng, Tony Q. S. Quek

Abstract: Trustworthy task-oriented semantic communication (ToSC) emerges as an innovative approach in the 6G landscape, characterized by the transmission of only vital information that is directly pertinent to a specific task. While ToSC offers an efficient mode of communication, it concurrently raises concerns regarding privacy, as sophisticated adversaries might possess the capability to reconstruct the… ▽ More Trustworthy task-oriented semantic communication (ToSC) emerges as an innovative approach in the 6G landscape, characterized by the transmission of only vital information that is directly pertinent to a specific task. While ToSC offers an efficient mode of communication, it concurrently raises concerns regarding privacy, as sophisticated adversaries might possess the capability to reconstruct the original data from the transmitted features. This article provides an in-depth analysis of privacy-preserving strategies specifically designed for ToSC relying on deep neural network-based joint source and channel coding (DeepJSCC). The study encompasses a detailed comparative assessment of trustworthy feature perturbation methods such as differential privacy and encryption, alongside intrinsic security incorporation approaches like adversarial learning to train the JSCC and learning-based vector quantization (LBVQ). This comparative analysis underscores the integration of advanced explainable learning algorithms into communication systems, positing a new benchmark for privacy standards in the forthcoming 6G era. △ Less

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.02859 [pdf, other]

Multistain Pretraining for Slide Representation Learning in Pathology

Authors: Guillaume Jaume, Anurag Vaidya, Andrew Zhang, Andrew H. Song, Richard J. Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long Phi Le, Faisal Mahmood

Abstract: Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learnin… ▽ More Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learning extend the principles of SSL from small images (e.g., 224 x 224 patches) to entire slides, usually by aligning two different augmentations (or views) of the slide. Yet the resulting representation remains constrained by the limited clinical and biological diversity of the views. Instead, we postulate that slides stained with multiple markers, such as immunohistochemistry, can be used as different views to form a rich task-agnostic training signal. To this end, we introduce Madeleine, a multimodal pretraining strategy for slide representation learning. Madeleine is trained with a dual global-local cross-stain alignment objective on large cohorts of breast cancer samples (N=4,211 WSIs across five stains) and kidney transplant samples (N=12,070 WSIs across four stains). We demonstrate the quality of slide representations learned by Madeleine on various downstream evaluations, ranging from morphological and molecular classification to prognostic prediction, comprising 21 tasks using 7,299 WSIs from multiple medical centers. Code is available at https://github.com/mahmoodlab/MADELEINE. △ Less

Submitted 5 August, 2024; originally announced August 2024.

Comments: ECCV'24

arXiv:2407.21514 [pdf]

Wireless Communications in Doubly Selective Channels with Domain Adaptivity

Authors: J. Andrew Zhang, Hongyang Zhang, Kai Wu, Xiaojing Huang, Jinhong Yuan, Y. Jay Guo

Abstract: Wireless communications are significantly impacted by the propagation environment, particularly in doubly selective channels with variations in both time and frequency domains. Orthogonal Time Frequency Space (OTFS) modulation has emerged as a promising solution; however, its high equalization complexity, if performed in the delay-Doppler domain, limits its universal application. This article expl… ▽ More Wireless communications are significantly impacted by the propagation environment, particularly in doubly selective channels with variations in both time and frequency domains. Orthogonal Time Frequency Space (OTFS) modulation has emerged as a promising solution; however, its high equalization complexity, if performed in the delay-Doppler domain, limits its universal application. This article explores domain-adaptive system design, dynamically selecting best-fit domains for modulation, pilot placement, and equalization based on channel conditions, to enhance performance across diverse environments. We examine domain classifications and connections, signal designs, and equalization techniques with domain adaptivity, and finally highlight future research opportunities. △ Less

Submitted 31 July, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: Magazine article, 7 pages, 4 figures, 2 tables

arXiv:2407.17057 [pdf, other]

Efffcient Sensing Parameter Estimation with Direct Clutter Mitigation in Perceptive Mobile Networks

Authors: Hang Li, Hongming Yang, Qinghua Guo, J. Andrew Zhang, Yang Xiang, Yashan Pang

Abstract: In this work, we investigate sensing parameter estimation in the presence of clutter in perceptive mobile networks (PMNs) that integrate radar sensing into mobile communications. Performing clutter suppression before sensing parameter estimation is generally desirable as the number of sensing parameters can be signiffcantly reduced. However, existing methods require high-complexity clutter mitigat… ▽ More In this work, we investigate sensing parameter estimation in the presence of clutter in perceptive mobile networks (PMNs) that integrate radar sensing into mobile communications. Performing clutter suppression before sensing parameter estimation is generally desirable as the number of sensing parameters can be signiffcantly reduced. However, existing methods require high-complexity clutter mitigation and sensing parameter estimation, where clutter is ffrstly identiffed and then removed. In this correspondence, we propose a much simpler but more effective method by incorporating a clutter cancellation mechanism in formulating a sparse signal model for sensing parameter estimation. In particular, clutter mitigation is performed directly on the received signals and the unitary approximate message passing (UAMP) is leveraged to exploit the common support for sensing parameter estimation in the formulated sparse signal recovery problem. Simulation results show that, compared to state-of-theart methods, the proposed method delivers signiffcantly better performance while with substantially reduced complexity. △ Less

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.13491 [pdf, other]

Performance Analysis and Low-Complexity Beamforming Design for Near-Field Physical Layer Security

Authors: Yunpu Zhang, Yuan Fang, Xianghao Yu, Changsheng You, Ying-Jun Angela Zhang

Abstract: Extremely large-scale arrays (XL-arrays) have emerged as a key enabler in achieving the unprecedented performance requirements of future wireless networks, leading to a significant increase in the range of the near-field region. This transition necessitates the spherical wavefront model for characterizing the wireless propagation rather than the far-field planar counterpart, thereby introducing ex… ▽ More Extremely large-scale arrays (XL-arrays) have emerged as a key enabler in achieving the unprecedented performance requirements of future wireless networks, leading to a significant increase in the range of the near-field region. This transition necessitates the spherical wavefront model for characterizing the wireless propagation rather than the far-field planar counterpart, thereby introducing extra degrees of freedom (DoFs) to wireless system design. In this paper, we explore the beam focusing-based physical layer security (PLS) in the near field, where multiple legitimate users and one eavesdropper are situated in the near-field region of the XL-array base station (BS). First, we consider a special case with one legitimate user and one eavesdropper to shed useful insights into near-field PLS. In particular, it is shown that 1) Artificial noise (AN) is crucial to near-field security provisioning, transforming an insecure system to a secure one; 2) AN can yield numerous security gains, which considerably enhances PLS in the near field as compared to the case without AN taken into account. Next, for the general case with multiple legitimate users, we propose an efficient low-complexity approach to design the beamforming with AN to guarantee near-field secure transmission. Specifically, the low-complexity approach is conceived starting by introducing the concept of interference domain to capture the inter-user interference level, followed by a three-step identification framework for designing the beamforming. Finally, numerical results reveal that 1) the PLS enhancement in the near field is pronounced thanks to the additional spatial DoFs; 2) the proposed approach can achieve close performance to that of the computationally-extensive conventional approach yet with a significantly lower computational complexity. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: 13 pages, 13 figures

arXiv:2406.12426 [pdf, other]

Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design

Authors: Yuan Fang, Xianghao Yu, Jie Xu, Ying-Jun Angela Zhang

Abstract: This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to… ▽ More This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cramér-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation. △ Less

Submitted 18 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2404.13536

arXiv:2406.09190 [pdf, other]

Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation

Authors: Zhiqiang Xiao, Xianda Liu, Yong Zeng, J. Andrew Zhang, Shi Jin, Rui Zhang

Abstract: Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design… ▽ More Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design. In this article, by leveraging the super spatial resolution of large antenna arrays and the multi-path spatial sparsity of highfrequency wireless channels, we introduce a new approach for waveform design based on the recently proposed delay-Doppler alignment modulation (DDAM). In particular, DDAM makes a paradigm shift of waveform design from the conventional manner of tolerating channel delay and Doppler spreads to actively manipulating them. First, we review the fundamental constraints and performance limitations of orthogonal frequency division multiplexing (OFDM) and introduce new opportunities for 6G waveform design. Next, the motivations and basic principles of DDAM are presented, followed by its various extensions to different wireless system setups. Finally, the main design considerations for DDAM are discussed and the new opportunities for future research are highlighted. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.05700 [pdf, other]

HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model

Authors: Hang Fu, Genyun Sun, Yinhe Li, Jinchang Ren, Aizhu Zhang, Cheng Jing, Pedram Ghamisi

Abstract: Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Ins… ▽ More Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Inspired by the ability of Mamba to model long-range dependencies with linear complexity, we explore its potential for HSI dehazing and propose the first HSI Dehazing Mamba (HDMba) network. Specifically, we design a novel window selective scan module (WSSM) that captures local dependencies within windows and global correlations between windows by partitioning them. This approach improves the ability of conventional Mamba in local feature extraction. By modeling the local and global spectral-spatial information flow, we achieve a comprehensive analysis of hazy regions. The DehazeMamba layer (DML), constructed by WSSM, and residual DehazeMamba (RDM) blocks, composed of DMLs, are the core components of the HDMba framework. These components effectively characterize the complex distribution of haze in HSIs, aiding in scene reconstruction and dehazing. Experimental results on the Gaofen-5 HSI dataset demonstrate that HDMba outperforms other state-of-the-art methods in dehazing performance. The code will be available at https://github.com/RsAI-lab/HDMba. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.16011 [pdf, ps, other]

Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models

Authors: Shuaishuai Guo, Yanhu Wang, Jia Ye, Anbang Zhang, Kun Xu

Abstract: Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visua… ▽ More Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visual data, expressed as natural language descriptions. These are further refined using a pre-trained large language model (LLM) for importance quantification and semantic error correction. The subsequent semantic importance-aware communications (SIAC) aim to minimize semantic loss while respecting transmission delay constraints, exemplified through adaptive modulation and coding strategies. At the receiving end, LLM-based semantic error correction is utilized. If visual data recreation is desired, a pre-trained generative artificial intelligence (AI) model can regenerate it using the corrected descriptions. We assess semantic similarities between transmitted and recovered content, demonstrating ULSC's superior ability to convey semantic understanding compared to feature-level semantic communications (FLSC). ULSC's conversion of visual data to natural language facilitates various cognitive tasks, leveraging human knowledge bases. Additionally, this method enhances privacy, as neither original data nor features are directly transmitted. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.10553 [pdf, other]

Revealing the Trade-off in ISAC Systems: The KL Divergence Perspective

Authors: Zesong Fei, Shuntian Tang, Xinyi Wang, Fanghao Xia, Fan Liu, J. Andrew Zhang

Abstract: Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication network. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We firstly present the relationship between KL divergence and explicit ISAC performance metric, i.e., demodulation error and probability of detecti… ▽ More Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication network. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We firstly present the relationship between KL divergence and explicit ISAC performance metric, i.e., demodulation error and probability of detection. Thereafter, we investigate the impact of constellation and beamforming design on the Pareto bound via deep learning and semi-definite relaxation (SDR) techniques. Simulation results show the trade-off between sensing and communication performance in terms of bit error rate (BER) and probability of detection under different parameter set-ups. △ Less

Submitted 17 May, 2024; originally announced May 2024.

Comments: 5 pages, 5 figures; submitted to IEEE journals for possible publication

arXiv:2404.09149 [pdf, other]

Heuristic Solution to Joint Deployment and Beamforming Design for STAR-RIS Aided Networks

Authors: Bai Yan, Qi Zhao, Jin Zhang, J. Andrew Zhang

Abstract: This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities… ▽ More This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities and fully boosting the system's performance. We consider a sum rate maximization problem with joint optimization and hybrid beamforming design. An offline heuristic solution is proposed for the problem, developed based on differential evolution and semi-definite programming methods. In particular, a point-point representation is proposed for characterizing and exploiting the user-grouping. A balanced grouping method is designed to achieve a desired user grouping with low complexity. Numerical results demonstrate the substantial performance gains achievable through optimal deployment design. △ Less

Submitted 14 April, 2024; originally announced April 2024.

Comments: 30 pages

arXiv:2404.05984 [pdf, ps, other]

Interference Management for Full-Duplex ISAC in B5G/6G Networks: Architectures, Challenges, and Solutions

Authors: Aimin Tang, Xudong Wang, J. Andrew Zhang

Abstract: Integrated sensing and communications (ISAC) has been visioned as a key technique for B5G/6G networks. To support monostatic sensing, a full-duplex radio is indispensable to extract echo signals from targets. Such a radio can also greatly improve network capacity via full-duplex communications. However, full-duplex radios in existing ISAC designs are mainly focused on wireless sensing, while the a… ▽ More Integrated sensing and communications (ISAC) has been visioned as a key technique for B5G/6G networks. To support monostatic sensing, a full-duplex radio is indispensable to extract echo signals from targets. Such a radio can also greatly improve network capacity via full-duplex communications. However, full-duplex radios in existing ISAC designs are mainly focused on wireless sensing, while the ability of full-duplex communications is usually ignored. In this article, we provide an overview of full-duplex ISAC (FD-ISAC), where a full-duplex radio is used for both wireless sensing and full-duplex communications in B5G/6G networks, with a focus on the fundamental interference management problem in such networks. First, different ISAC architectures are introduced, considering different full-duplex communication modes and wireless sensing modes. Next, the challenging issues of link-level interference and network-level interference are analyzed, illustrating a critical demand on interference management for FD-ISAC. Potential solutions to interference management are then reviewed from the perspective of radio architecture design, beamforming, mode selection, and resource allocation. The corresponding open problems are also highlighted. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Communications Magazine

arXiv:2403.12630 [pdf, other]

Reproducing the Acoustic Velocity Vectors in a Circular Listening Area

Authors: Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

Abstract: Acoustic velocity vectors are important for human's localization of sound at low frequencies. This paper proposes a sound field reproduction algorithm, which matches the acoustic velocity vectors in a circular listening area. In previous work, acoustic velocity vectors are matched either at sweet spots or on the boundary of the listening area. Sweet spots restrict listener's movement, whereas meas… ▽ More Acoustic velocity vectors are important for human's localization of sound at low frequencies. This paper proposes a sound field reproduction algorithm, which matches the acoustic velocity vectors in a circular listening area. In previous work, acoustic velocity vectors are matched either at sweet spots or on the boundary of the listening area. Sweet spots restrict listener's movement, whereas measuring the acoustic velocity vectors on the boundary requires complicated measurement setup. This paper proposes the cylindrical harmonic coefficients of the acoustic velocity vectors in a circular area (CHV coefficients), which are calculated from the cylindrical harmonic coefficients of the global pressure (global CHP coefficients) by using the sound field translation formula. The global CHP coefficients can be measured by a circular microphone array, which can be bought off-the-shelf. By matching the CHV coefficients, the acoustic velocity vectors are reproduced throughout the listening area. Hence, listener's movements are allowed. Simulations show that at low frequency, where the acoustic velocity vectors are the dominant factor for localization, the proposed reproduction method based on the CHV coefficients results in higher accuracy in reproduced acoustic velocity vectors when compared with traditional method based on the global CHP coefficients. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Comments: Submitted to EUSIPCO 2024

arXiv:2403.11940 [pdf, other]

Multistep Inverse Is Not All You Need

Authors: Alexander Levine, Peter Stone, Amy Zhang

Abstract: In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the… ▽ More In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) proposes the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encoding of the the first and last state in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations; as well as high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.05793 [pdf, ps, other]

Performance Bounds for Passive Sensing in Asynchronous ISAC Systems -- Appendices

Authors: Jingbo Zhao, Zhaoming Lu, J. Andrew Zhang, Weicai Li, Yifeng Xiong, Zijun Han, Xiangming Wen, Tao Gu

Abstract: This document contains the appendices for our paper titled ``Performance Bounds for Passive Sensing in Asynchronous ISAC Systems." The appendices include rigorous derivations of key formulas, detailed proofs of the theorems and propositions introduced in the paper, and details of the algorithm tested in the numerical simulation for validation. These appendices aim to support and elaborate on the f… ▽ More This document contains the appendices for our paper titled ``Performance Bounds for Passive Sensing in Asynchronous ISAC Systems." The appendices include rigorous derivations of key formulas, detailed proofs of the theorems and propositions introduced in the paper, and details of the algorithm tested in the numerical simulation for validation. These appendices aim to support and elaborate on the findings and methodologies presented in the main text. All external references to equations, theorems, and so forth, are directed towards the corresponding elements within the main paper. △ Less

Submitted 29 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 5 pages

arXiv:2402.17533 [pdf, other]

Black-box Adversarial Attacks Against Image Quality Assessment Models

Authors: Yu Ran, Ao-Xiang Zhang, Mingjie Li, Weixuan Tang, Yuan-Gen Wang

Abstract: The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put the NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore the black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the atta… ▽ More The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put the NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore the black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the attack problem as maximizing the deviation between the estimated quality scores of original and perturbed images, while restricting the perturbed image distortions for visual quality preservation. Under such formulation, we then design a Bi-directional loss function to mislead the estimated quality scores of adversarial examples towards an opposite direction with maximum deviation. On this basis, we finally develop an efficient and effective black-box attack method against NR-IQA models. Extensive experiments reveal that all the evaluated NR-IQA models are vulnerable to the proposed attack method. And the generated perturbations are not transferable, enabling them to serve the investigation of specialities of disparate IQA models. △ Less

Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.09048 [pdf, other]

Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiaojing Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks a… ▽ More Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks at far-separated transmitters and receivers. This causes the received signal to be affected by time-varying random phase offsets, severely degrading, or even failing, direct sensing. Hence, to effectively enable ISAC, considerable research has been directed toward addressing the clock asynchronism issue in bi-static sensing. This paper provides an overview of the issue and existing techniques developed in an ISAC background. Based on the review and comparison, we also draw insights into the future research directions and open problems, aiming to nurture the maturation of bi-static sensing in ISAC. △ Less

Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 20 pages, 6 figures, 1 table

arXiv:2401.15183 [pdf, other]

Moment-based metrics for molecules computable from cryo-EM images

Authors: Andy Zhang, Oscar Mickelin, Joe Kileel, Eric J. Verbeke, Nicholas F. Marshall, Marc Aurèle Gilles, Amit Singer

Abstract: Single particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3-D structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3-D reconstruction, known as Kam's method, relies on the moments of the 2-D images. Inspired by Kam's method, we introduce a rotationally invariant metric… ▽ More Single particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3-D structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3-D reconstruction, known as Kam's method, relies on the moments of the 2-D images. Inspired by Kam's method, we introduce a rotationally invariant metric between two molecular structures, which does not require 3-D alignment. Further, we introduce a metric between a stack of projection images and a molecular structure, which is invariant to rotations and reflections and does not require performing 3-D reconstruction. Additionally, the latter metric does not assume a uniform distribution of viewing angles. We demonstrate uses of the new metrics on synthetic and experimental datasets, highlighting their ability to measure structural similarity. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: 21 Pages, 9 Figures, 2 Algorithms, and 3 Tables

arXiv:2401.09119 [pdf, other]

Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks

Authors: Yanmo Hu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

Abstract: Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and ef… ▽ More Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and effective uplink sensing scheme with the assistance of static anchor points. Two major algorithms are proposed in the scheme. The first algorithm estimates the relative timing and carrier frequency offsets due to clock asynchronism, with respect to those at a randomly selected reference snapshot. Theoretical performance analysis is provided for the algorithm. The estimates from the first algorithm are then used to compensate for the offsets and generate the angle-Doppler maps. Using the maps, the second algorithm identifies the anchor points, and then locates the UE and dynamic targets. Feasibility of UE localization is also analyzed. Simulation results are provided and demonstrate the effectiveness of the proposed algorithms. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: 14 pages, 12 figures, journal paper

arXiv:2401.09064 [pdf, other]

Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

Abstract: Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the r… ▽ More Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the research gap. Focusing on a single dynamic path in high-SNR scenarios, we derive the closed-form CRB. Then, through analyzing the mutual interference between dynamic and static paths, we simplify the CRB results by deriving close approximations, further unveiling new insights of the impact of numerous physical parameters on Doppler sensing. Moreover, utilizing the new CRB and analyses, we propose novel waveform optimization strategies for noise- and interference-limited sensing scenarios, which are also empowered by closed-form and efficient solutions. Extensive simulation results are provided to validate the preciseness of the derived CRB results and analyses, with the aid of the maximum-likelihood estimator. The results also demonstrate the substantial enhanced Doppler sensing accuracy and the sensing capabilities for low-speed target achieved by the proposed waveform design. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: 14 pages, 15 figures, journal paper

arXiv:2401.03473 [pdf, ps, other]

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours… ▽ More To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, including automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR) are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, first-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an absolute improvement of 13.08% and 51.4% compared to our challenge baseline, respectively. △ Less

Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: Accepted at ICASSP 2024

arXiv:2312.09760 [pdf, other]

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabu… ▽ More Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first stage model to detect potential keyword candidates and the decoder branch as the second stage model to validate candidates. In order to enhance any customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information by audio and text cross-attention into both branches. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS can achieve a significant relative wake-up rate improvement of 41% compared to the traditional customized KWS systems when the false alarm rate is fixed to 0.5 times per hour. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ASRU2023

arXiv:2312.03255 [pdf, other]

doi 10.1109/JSAC.2024.3389122

Densifying MIMO: Channel Modeling, Physical Constraints, and Performance Evaluation for Holographic Communications

Authors: Y. Liu, M. Zhang, T. Wang, A. Zhang, M. Debbah

Abstract: As the backbone of the fifth-generation (5G) cellular network, massive multiple-input multiple-output (MIMO) encounters a significant challenge in practical applications: how to deploy a large number of antenna elements within limited spaces. Recently, holographic communication has emerged as a potential solution to this issue. It employs dense antenna arrays and provides a tractable model. Nevert… ▽ More As the backbone of the fifth-generation (5G) cellular network, massive multiple-input multiple-output (MIMO) encounters a significant challenge in practical applications: how to deploy a large number of antenna elements within limited spaces. Recently, holographic communication has emerged as a potential solution to this issue. It employs dense antenna arrays and provides a tractable model. Nevertheless, some challenges must be addressed to actualize this innovative concept. One is the mutual coupling among antenna elements within an array. When the element spacing is small, near-field coupling becomes the dominant factor that strongly restricts the array performance. Another is the polarization of electromagnetic waves. As an intrinsic property, it was not fully considered in the previous channel modeling of holographic communication. The third is the lack of real-world experiments to show the potential and possible defects of a holographic communication system. In this paper, we propose an electromagnetic channel model based on the characteristics of electromagnetic waves. This model encompasses the impact of mutual coupling in the transceiver sides and the depolarization in the propagation environment. Furthermore, by approximating an infinite array, the performance restrictions of large-scale dense antenna arrays are also studied theoretically to exploit the potential of the proposed channel. In addition, numerical simulations and a channel measurement experiment are conducted. The findings reveal that within limited spaces, the coupling effect, particularly for element spacing smaller than half of the wavelength, is the primary factor leading to the inflection point for the performance of holographic communications. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: 14 pages, 20 figures, accepted by JSAC-SI-ESIT

arXiv:2310.07141 [pdf, ps, other]

Time and Frequency Offset Estimation and Intercarrier Interference Cancellation for AFDM Systems

Authors: Yuankun Tang, Anjie Zhang, Miaowen Wen, Yu Huang, Fei Ji, Jinming Wen

Abstract: Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency o… ▽ More Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency offset by comparing the correlations of samples. Moreover, we propose the other so-called stepwise ML estimator to reduce the complexity. Both proposed estimators exploit the redundant information contained within the chirp-periodic prefix inherent in AFDM symbols, thus dispensing with any additional pilots. To further mitigate the intercarrier interference resulting from the residual frequency offset, we design a mirror-mapping-based scheme for AFDM systems. Numerical results verify the effectiveness of the proposed time and carrier frequency offset estimation criteria and the mirror-mapping-based modulation for AFDM systems. △ Less

Submitted 28 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: accepted by IEEE Wireless Communications and Networking Conference (WCNC) 2024

arXiv:2310.05444 [pdf, other]

Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach

Authors: Zhiqing Wei, Jinghui Piao, Xin Yuan, Huici Wu, J. Andrew Zhang, Zhiyong Feng, Lin Wang, Ping Zhang

Abstract: Integrated sensing and communication (ISAC) is regarded as the enabling technology in the future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication system. ISAC waveform design is critical in ISAC system. However, the difference of the performance metrics between sensing and communication brings challenges for the ISAC waveform design. This paper applies the unified perfor… ▽ More Integrated sensing and communication (ISAC) is regarded as the enabling technology in the future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication system. ISAC waveform design is critical in ISAC system. However, the difference of the performance metrics between sensing and communication brings challenges for the ISAC waveform design. This paper applies the unified performance metrics in information theory, namely mutual information (MI), to measure the communication and sensing performance in multicarrier ISAC system. In multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC system, we first derive the sensing and communication MI with subcarrier correlation and spatial correlation. Then, we propose optimal waveform designs for maximizing the sensing MI, communication MI and the weighted sum of sensing and communication MI, respectively. The optimization results are validated by Monte Carlo simulations. Our work provides effective closed-form expressions for waveform design, enabling the realization of MIMO-OFDM ISAC system with balanced performance in communication and sensing. △ Less

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.04657 [pdf, other]

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

Authors: Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

Abstract: The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control t… ▽ More The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressively 68.6% relative CER reduction on contextual phrases. △ Less

Submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted by ASRU2023

arXiv:2310.03265 [pdf, other]

Integrated Communication, Sensing, and Computation Framework for 6G Networks

Authors: Xu Chen, Zhiyong Feng, J. Andrew Zhang, Zhaohui Yang, Xin Yuan, Xinxin He, Ping Zhang

Abstract: In the sixth generation (6G) era, intelligent machine network (IMN) applications, such as intelligent transportation, require collaborative machines with communication, sensing, and computation (CSC) capabilities. This article proposes an integrated communication, sensing, and computation (ICSAC) framework for 6G to achieve the reciprocity among CSC functions to enhance the reliability and latency… ▽ More In the sixth generation (6G) era, intelligent machine network (IMN) applications, such as intelligent transportation, require collaborative machines with communication, sensing, and computation (CSC) capabilities. This article proposes an integrated communication, sensing, and computation (ICSAC) framework for 6G to achieve the reciprocity among CSC functions to enhance the reliability and latency of communication, accuracy and timeliness of sensing information acquisition, and privacy and security of computing to realize the IMN applications. Specifically, the sensing and communication functions can merge into unified platforms using the same transmit signals, and the acquired real-time sensing information can be exploited as prior information for intelligent algorithms to enhance the performance of communication networks. This is called the computing-empowered integrated sensing and communications (ISAC) reciprocity. Such reciprocity can further improve the performance of distributed computation with the assistance of networked sensing capability, which is named the sensing-empowered integrated communications and computation (ICAC) reciprocity. The above ISAC and ICAC reciprocities can enhance each other iteratively and finally lead to the ICSAC reciprocity. To achieve these reciprocities, we explore the potential enabling technologies for the ICSAC framework. Finally, we present the evaluation results of crucial enabling technologies to show the feasibility of the ICSAC framework. △ Less

Submitted 4 October, 2023; originally announced October 2023.

Comments: 8 pages, 5 figures, submitted to IEEE VTM

arXiv:2309.13609 [pdf, other]

Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks

Authors: Ao-Xiang Zhang, Yu Ran, Weixuan Tang, Yuan-Gen Wang

Abstract: No-Reference Video Quality Assessment (NR-VQA) plays an essential role in improving the viewing experience of end-users. Driven by deep learning, recent NR-VQA models based on Convolutional Neural Networks (CNNs) and Transformers have achieved outstanding performance. To build a reliable and practical assessment system, it is of great necessity to evaluate their robustness. However, such issue has… ▽ More No-Reference Video Quality Assessment (NR-VQA) plays an essential role in improving the viewing experience of end-users. Driven by deep learning, recent NR-VQA models based on Convolutional Neural Networks (CNNs) and Transformers have achieved outstanding performance. To build a reliable and practical assessment system, it is of great necessity to evaluate their robustness. However, such issue has received little attention in the academic community. In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attack. Specifically, considering both the attack effect on quality score and the visual quality of adversarial video, the attack problem is formulated as misleading the estimated quality score under the constraint of just-noticeable difference (JND). Built upon such formulation, a novel loss function called Score-Reversed Boundary Loss is designed to push the adversarial video's estimated quality score far away from its ground-truth score towards a specific boundary, and the JND constraint is modeled as a strict $L_2$ and $L_\infty$ norm restriction. By this means, both white-box and black-box attacks can be launched in an effective and imperceptible manner. The source code is available at https://github.com/GZHU-DVL/AttackVQA. △ Less

Submitted 20 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

arXiv:2309.10605 [pdf, other]

An Active Noise Control System Based on Soundfield Interpolation Using a Physics-informed Neural Network

Authors: Yile Angela Zhang, Fei Ma, Thushara Abhayapala, Prasanga Samarasinghe, Amy Bastine

Abstract: Conventional multiple-point active noise control (ANC) systems require placing error microphones within the region of interest (ROI), inconveniencing users. This paper designs a feasible monitoring microphone arrangement placed outside the ROI, providing a user with more freedom of movement. The soundfield within the ROI is interpolated from the microphone signals using a physics-informed neural… ▽ More Conventional multiple-point active noise control (ANC) systems require placing error microphones within the region of interest (ROI), inconveniencing users. This paper designs a feasible monitoring microphone arrangement placed outside the ROI, providing a user with more freedom of movement. The soundfield within the ROI is interpolated from the microphone signals using a physics-informed neural network (PINN). PINN exploits the acoustic wave equation to assist soundfield interpolation under a limited number of monitoring microphones, and demonstrates better interpolation performance than the spherical harmonic method in simulations. An ANC system is designed to take advantage of the interpolated signal to reduce noise signal within the ROI. The PINN-assisted ANC system reduces noise more than that of the multiple-point ANC system in simulations. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.02888 [pdf, other]

Multi-Device Task-Oriented Communication via Maximal Coding Rate Reduction

Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Abstract: In task-oriented communications, most existing work designed the physical-layer communication modules and learning based codecs with distinct objectives: learning is targeted at accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency betwee… ▽ More In task-oriented communications, most existing work designed the physical-layer communication modules and learning based codecs with distinct objectives: learning is targeted at accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency between the design objectives may hinder the exploitation of the full benefits of task-oriented communications. In this paper, we consider a task-oriented multi-device edge inference system over a multiple-input multiple-output (MIMO) multiple-access channel, where the learning (i.e., feature encoding and classification) and communication (i.e., precoding) modules are designed with the same goal of inference accuracy maximization. Instead of end-to-end learning which involves both the task dataset and wireless channel during training, we advocate a separate design of learning and communication to achieve the consistent goal. Specifically, we leverage the maximal coding rate reduction (MCR2) objective as a surrogate to represent the inference accuracy, which allows us to explicitly formulate the precoding optimization problem. We cast valuable insights into this formulation and develop a block coordinate ascent (BCA) algorithm for efficient problem-solving. Moreover, the MCR2 objective serves the loss function for feature encoding and guides the classification design. Simulation results on the synthetic features explain the mechanism of MCR2 precoding at different SNRs. We also validate on the CIFAR-10 and ModelNet10 datasets that the proposed design achieves a better latency-accuracy tradeoff compared to various baselines. As such, our work paves the way for further exploration into the synergistic alignment of learning and communication objectives in task-oriented communication systems. △ Less

Submitted 28 May, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: under minor revision in IEEE Transactions on Wireless Communications

arXiv:2308.02915 [pdf, other]

doi 10.1145/3581783.3612307

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

Authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan

Abstract: When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we… ▽ More When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Comments: Accepted at ACM MM 2023

arXiv:2307.14907 [pdf, other]

Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples

Authors: Andrew H. Song, Mane Williams, Drew F. K. Williamson, Guillaume Jaume, Andrew Zhang, Bowen Chen, Robert Serafin, Jonathan T. C. Liu, Alex Baras, Anil V. Parwani, Faisal Mahmood

Abstract: Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have yet had little translation… ▽ More Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have yet had little translation to clinical practice; manual and computational evaluations of such large 3D data have so far been impractical and/or unable to provide patient-level clinical insights. Here we present Modality-Agnostic Multiple instance learning for volumetric Block Analysis (MAMBA), a deep-learning-based platform for processing 3D tissue images from diverse imaging modalities and predicting patient outcomes. Archived prostate cancer specimens were imaged with open-top light-sheet microscopy or microcomputed tomography and the resulting 3D datasets were used to train risk-stratification networks based on 5-year biochemical recurrence outcomes via MAMBA. With the 3D block-based approach, MAMBA achieves an area under the receiver operating characteristic curve (AUC) of 0.86 and 0.74, superior to 2D traditional single-slice-based prognostication (AUC of 0.79 and 0.57), suggesting superior prognostication with 3D morphological features. Further analyses reveal that the incorporation of greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, suggesting the value of capturing larger extents of heterogeneous 3D morphology. With the rapid growth and adoption of 3D spatial biology and pathology techniques by researchers and clinicians, MAMBA provides a general and efficient framework for 3D weakly supervised learning for clinical decision support and can help to reveal novel 3D morphological biomarkers for prognosis and therapeutic response. △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.11345 [pdf, other]

Sensing Aided Covert Communications: Turning Interference into Allies

Authors: Xinyi Wang, Zesong Fei, Peng Liu, J. Andrew Zhang, Qingqing Wu, Nan Wu

Abstract: In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended… ▽ More In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended Kalman filtering technique is employed to predict its trajectory as well as the corresponding channels. Depending on the maneuvering altitude of adversary target, two channel state information (CSI) models are considered, with the aim of maximizing the covert transmission rate by jointly designing the radar waveform and communication transmit beamforming vector based on the constructed channels. For perfect CSI under the free-space propagation model, by decoupling the joint design, we propose an efficient algorithm to guarantee that the target cannot detect the transmission. For imperfect CSI due to the multi-path components, a robust joint transmission scheme is proposed based on the property of the Kullback-Leibler divergence. The convergence behaviour, tracking MSE, false alarm and missed detection probabilities, and covert transmission rate are evaluated. Simulation results show that the proposed algorithms achieve accurate tracking. For both channel models, the proposed sensing-assisted covert transmission design is able to guarantee the covertness, and significantly outperforms the conventional schemes. △ Less

Submitted 3 January, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: 13 pages, 12 figures, submitted to IEEE journals for potential publication

arXiv:2307.07200 [pdf, other]

Reproducing the Acoustic Velocity Vectors in a Spherical Listening Region

Authors: Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

Abstract: Acoustic velocity vectors (AVVs) are related to the human's perception of sound at low frequencies and are widely used in Ambisonics. This paper proposes a spatial sound field reproduction algorithm called velocity matching, which reproduces the AVVs in the spherical listening region by matching the AVVs' spherical harmonic coefficients. Using the sound field translation formula, the spherical har… ▽ More Acoustic velocity vectors (AVVs) are related to the human's perception of sound at low frequencies and are widely used in Ambisonics. This paper proposes a spatial sound field reproduction algorithm called velocity matching, which reproduces the AVVs in the spherical listening region by matching the AVVs' spherical harmonic coefficients. Using the sound field translation formula, the spherical harmonic coefficients of the AVVs are derived from the spherical harmonic coefficients of the pressure, which can be measured by a higher-order microphone array. Unlike algorithms that only control the AVVs at discrete sweet spots, the proposed velocity matching algorithm manipulates the AVVs in the whole spherical listening region and allows the listener to move beyond the sweet spots. Simulations show the proposed velocity matching algorithm accurately reproduces the AVVs in the spherical listening region and requires fewer number of loudspeakers than pressure matching algorithm. △ Less

Submitted 6 June, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: Submitted to IEEE Signal Processing Letters

arXiv:2306.10982 [pdf, other]

Differentially Private Over-the-Air Federated Learning Over MIMO Fading Channels

Authors: Hang Liu, Jia Yan, Ying-Jun Angela Zhang

Abstract: Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique to measure statistical data… ▽ More Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique to measure statistical data privacy in FL. Previous research has focused on over-the-air FL with a single-antenna server, leveraging communication noise to enhance user-level DP. This approach achieves the so-called "free DP" by controlling transmit power rather than introducing additional DP-preserving mechanisms at devices, such as adding artificial noise. In this paper, we study differentially private over-the-air FL over a multiple-input multiple-output (MIMO) fading channel. We show that FL model communication with a multiple-antenna server amplifies privacy leakage as the multiple-antenna server employs separate receive combining for model aggregation and information inference. Consequently, relying solely on communication noise, as done in the multiple-input single-output system, cannot meet high privacy requirements, and a device-side privacy-preserving mechanism is necessary for optimal DP design. We analyze the learning convergence and privacy loss of the studied FL system and propose a transceiver design algorithm based on alternating optimization. Numerical results demonstrate that the proposed method achieves a better privacy-learning trade-off compared to prior work. △ Less

Submitted 25 December, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: This work has been accepted by the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2306.09135 [pdf, other]

Time-Domain Wideband Image Source Method for Spherical Microphone Arrays

Authors: Jiarui Wang, Prasanga Samarasinghe, Thushara Abhayapala, Jihui Aimee Zhang

Abstract: This paper presents the time-domain wideband spherical microphone array impulse response generator (TDW-SMIR generator), which is a time-domain wideband image source method (ISM) for generating the room impulse responses captured by an open spherical microphone array. To incorporate loudspeaker directivity, the TDW-SMIR generator considers a source that emits a sequence of spherical wave fronts wh… ▽ More This paper presents the time-domain wideband spherical microphone array impulse response generator (TDW-SMIR generator), which is a time-domain wideband image source method (ISM) for generating the room impulse responses captured by an open spherical microphone array. To incorporate loudspeaker directivity, the TDW-SMIR generator considers a source that emits a sequence of spherical wave fronts whose amplitudes are related to the loudspeaker directional impulse responses measured in the far-field. The TDW-SMIR generator uses geometric models to derive the time-domain signals recorded by the spherical microphone array. Comparisons are made with frequency-domain single band ISMs. Simulation results prove the results of the TDW-SMIR generator are similar to those of frequency-domain single band ISMs. △ Less

Submitted 9 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted for publication in the IEEE 25th International Workshop on Multimedia Signal Processing (IEEE MMSP 2023)

arXiv:2306.04512 [pdf, other]

doi 10.1364/BOE.512337

Cross-attention learning enables real-time nonuniform rotational distortion correction in OCT

Authors: Haoran Zhang, Jianlong Yang, Jingqian Zhang, Shiqing Zhao, Aili Zhang

Abstract: Nonuniform rotational distortion (NURD) correction is vital for endoscopic optical coherence tomography (OCT) imaging and its functional extensions, such as angiography and elastography. Current NURD correction methods require time-consuming feature tracking or cross-correlation calculations and thus sacrifice temporal resolution. Here we propose a cross-attention learning method for the NURD corr… ▽ More Nonuniform rotational distortion (NURD) correction is vital for endoscopic optical coherence tomography (OCT) imaging and its functional extensions, such as angiography and elastography. Current NURD correction methods require time-consuming feature tracking or cross-correlation calculations and thus sacrifice temporal resolution. Here we propose a cross-attention learning method for the NURD correction in OCT. Our method is inspired by the recent success of the self-attention mechanism in natural language processing and computer vision. By leveraging its ability to model long-range dependencies, we can directly obtain the correlation between OCT A-lines at any distance, thus accelerating the NURD correction. We develop an end-to-end stacked cross-attention network and design three types of optimization constraints. We compare our method with two traditional feature-based methods and a CNN-based method, on two publicly-available endoscopic OCT datasets and a private dataset collected on our home-built endoscopic OCT system. Our method achieved a $\sim3\times$ speedup to real time ($26\pm 3$ fps), and superior correction performance. △ Less

Submitted 5 January, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Journal ref: Biomedical Optics Express 15.1 (2024): 319-335

arXiv:2306.00804 [pdf, other]

Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

Authors: Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie

Abstract: By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words. However, for real-world voice assistants, always biasing on such personalized words with high prediction scores can significantly degrade the performance of recognizing common words. To address this issue, we propose an adaptive contextual bias… ▽ More By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words. However, for real-world voice assistants, always biasing on such personalized words with high prediction scores can significantly degrade the performance of recognizing common words. To address this issue, we propose an adaptive contextual biasing method based on Context-Aware Transformer Transducer (CATT) that utilizes the biased encoder and predictor embeddings to perform streaming prediction of contextual phrase occurrences. Such prediction is then used to dynamically switch the bias list on and off, enabling the model to adapt to both personalized and common scenarios. Experiments on Librispeech and internal voice assistant datasets show that our approach can achieve up to 6.7% and 20.7% relative reduction in WER and CER compared to the baseline respectively, mitigating up to 96.7% and 84.9% of the relative WER and CER increase for common cases. Furthermore, our approach has a minimal performance impact in personalized scenarios while maintaining a streaming inference pipeline with negligible RTF increase. △ Less

Submitted 15 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.17938 [pdf, other]

Complex CNN CSI Enhancer for Integrated Sensing and Communications

Authors: Xu Chen, Zhiyong Feng, J. Andrew Zhang, Feifei Gao, Xin Yuan, Zhaohui Yang, Ping Zhang

Abstract: In this paper, we propose a novel complex convolutional neural network (CNN) CSI enhancer for integrated sensing and communications (ISAC), which exploits the correlation between the sensing parameters (such as angle-of-arrival and range) and the channel state information (CSI) to significantly improve the CSI estimation accuracy and further enhance the sensing accuracy. Within the CNN CSI enhance… ▽ More In this paper, we propose a novel complex convolutional neural network (CNN) CSI enhancer for integrated sensing and communications (ISAC), which exploits the correlation between the sensing parameters (such as angle-of-arrival and range) and the channel state information (CSI) to significantly improve the CSI estimation accuracy and further enhance the sensing accuracy. Within the CNN CSI enhancer, we use the complex-valued computation layers to form the CNN, which maintains the phase information of CSI. We also transform the CSI into the sparse angle-delay domain, leading to heatmap images with prominent peaks that can be efficiently processed by CNN. Based on the enhanced CSI outputs, we further propose a novel biased fast Fourier transform (FFT)-based sensing scheme for improving the range sensing accuracy, by artificially introducing phase biasing terms. Extensive simulation results show that the ISAC complex CNN CSI enhancer can converge within 30 training epochs. The normalized mean square error (NMSE) of its CSI estimates is about 17 dB lower than that of the linear minimum mean square error (LMMSE) estimator, and the bit error rate (BER) of demodulation using the enhanced CSI estimation approaches that with perfect CSI. Finally, the range estimation MSE of the proposed biased FFT-based sensing method approaches that of the subspace-based sensing method, at a much lower complexity. △ Less

Submitted 19 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 13 pages, 15 figures, submitted to IEEE Journal of Selected Topics in Signal Processing

arXiv:2305.12493 [pdf, other]

Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

Authors: Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

Abstract: Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context… ▽ More Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates bias loss to assist in the training of the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER of the context phrases decreases relatively by 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list. △ Less

Submitted 12 July, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: Accepted by interspeech2023

arXiv:2305.11548 [pdf, ps, other]

Sensing Aided Uplink Transmission in OTFS ISAC with Joint Parameter Association, Channel Estimation and Signal Detection

Authors: Xi Yang, Hang Li, Qinghua Guo, J. Andrew Zhang, Xiaojing Huang, Zhiqun Cheng

Abstract: In this work, we study sensing-aided uplink transmission in an integrated sensing and communication (ISAC) vehicular network with the use of orthogonal time frequency space (OTFS) modulation. To exploit sensing parameters for improving uplink communications, the parameters must be first associated with the transmitters, which is a challenging task. We propose a scheme that jointly conducts paramet… ▽ More In this work, we study sensing-aided uplink transmission in an integrated sensing and communication (ISAC) vehicular network with the use of orthogonal time frequency space (OTFS) modulation. To exploit sensing parameters for improving uplink communications, the parameters must be first associated with the transmitters, which is a challenging task. We propose a scheme that jointly conducts parameter association, channel estimation and signal detection by formulating it as a constrained bilinear recovery problem. Then we develop a message passing algorithm to solve the problem, leveraging the bilinear unitary approximate message passing (Bi-UAMP) algorithm. Numerical results validate the proposed scheme, which show that relevant performance bounds can be closely approached. △ Less

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.11057 [pdf, other]

Vital Sign Monitoring in Dynamic Environment via mmWave Radar and Camera Fusion

Authors: Yingqi Wang, Zhongqin Wang, J. Andrew Zhang, Haimin Zhang, Min Xu

Abstract: Contact-free vital sign monitoring, which uses wireless signals for recognizing human vital signs (i.e, breath and heartbeat), is an attractive solution to health and security. However, the subject's body movement and the change in actual environments can result in inaccurate frequency estimation of heartbeat and respiratory. In this paper, we propose a robust mmWave radar and camera fusion system… ▽ More Contact-free vital sign monitoring, which uses wireless signals for recognizing human vital signs (i.e, breath and heartbeat), is an attractive solution to health and security. However, the subject's body movement and the change in actual environments can result in inaccurate frequency estimation of heartbeat and respiratory. In this paper, we propose a robust mmWave radar and camera fusion system for monitoring vital signs, which can perform consistently well in dynamic scenarios, e.g., when some people move around the subject to be tracked, or a subject waves his/her arms and marches on the spot. Three major processing modules are developed in the system, to enable robust sensing. Firstly, we utilize a camera to assist a mmWave radar to accurately localize the subjects of interest. Secondly, we exploit the calculated subject position to form transmitting and receiving beamformers, which can improve the reflected power from the targets and weaken the impact of dynamic interference. Thirdly, we propose a weighted multi-channel Variational Mode Decomposition (WMC-VMD) algorithm to separate the weak vital sign signals from the dynamic ones due to subject's body movement. Experimental results show that, the 90${^{th}}$ percentile errors in respiration rate (RR) and heartbeat rate (HR) are less than 0.5 RPM (respirations per minute) and 6 BPM (beats per minute), respectively. △ Less

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2303.06341 [pdf, other]

The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge

Authors: Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen

Abstract: This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. Specifically, the weighted prediction error (WPE) and guided source separation (GSS) techniques are used to reduce reverberation and generate clean signals for each single speaker first. Then, we explore the effectivenes… ▽ More This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. Specifically, the weighted prediction error (WPE) and guided source separation (GSS) techniques are used to reduce reverberation and generate clean signals for each single speaker first. Then, we explore the effectiveness of Branchformer and E-Branchformer based ASR systems. To better make use of the visual modality, a cross-attention based multi-modal fusion module is proposed, which explicitly learns the contextual relationship between different modalities. Experiments show that our system achieves a concatenated minimum-permutation character error rate (cpCER) of 28.13\% and 31.21\% on the Dev and Eval set, and obtains second place in the challenge. △ Less

Submitted 11 March, 2023; originally announced March 2023.

Comments: 2 pages, accepted by ICASSP 2023

arXiv:2303.04696 [pdf, other]

VOLTA: an Environment-Aware Contrastive Cell Representation Learning for Histopathology

Authors: Ramin Nakhli, Allen Zhang, Hossein Farahani, Amirali Darbandsari, Elahe Shenasa, Sidney Thiessen, Katy Milne, Jessica McAlpine, Brad Nelson, C Blake Gilks, Ali Bashashati

Abstract: In clinical practice, many diagnosis tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming due to the large number of cells. In this paper, we propose a self-supervised framework (VOLTA) for cell representation learning in histopathology images using a novel technique that… ▽ More In clinical practice, many diagnosis tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming due to the large number of cells. In this paper, we propose a self-supervised framework (VOLTA) for cell representation learning in histopathology images using a novel technique that accounts for the cell's mutual relationship with its environment for improved cell representations. We subjected our model to extensive experiments on the data collected from multiple institutions around the world comprising of over 700,000 cells, four cancer types, and cell types ranging from three to six categories for each dataset. The results show that our model outperforms the state-of-the-art models in cell representation learning. To showcase the potential power of our proposed framework, we applied VOLTA to ovarian and endometrial cancers with very small sample sizes (10-20 samples) and demonstrated that our cell representations can be utilized to identify the known histotypes of ovarian cancer and provide novel insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised deep learning models that require large sample sizes for training, we provide a framework that can empower new discoveries without any annotation data in situations where sample sizes are limited. △ Less

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.01771 [pdf, other]

doi 10.1109/TCOMM.2023.3344143

Joint Beamforming for RIS-Assisted Integrated Sensing and Communication Systems

Authors: Yongqing Xu, Yong Li, J. Andrew Zhang, Marco Di Renzo, Tony Q. S. Quek

Abstract: Integrated sensing and communications (ISAC) is an emerging critical technique for the next generation of communication systems. However, due to multiple performance metrics used for communication and sensing, the limited degrees-of-freedom (DoF) in optimizing ISAC systems poses a challenge. Reconfigurable intelligent surfaces (RIS) can introduce new DoF for beamforming in ISAC systems, thereby en… ▽ More Integrated sensing and communications (ISAC) is an emerging critical technique for the next generation of communication systems. However, due to multiple performance metrics used for communication and sensing, the limited degrees-of-freedom (DoF) in optimizing ISAC systems poses a challenge. Reconfigurable intelligent surfaces (RIS) can introduce new DoF for beamforming in ISAC systems, thereby enhancing the performance of communication and sensing simultaneously. In this paper, we propose two optimization techniques for beamforming in RIS-assisted ISAC systems. The first technique is an alternating optimization (AO) algorithm based on the semidefinite relaxation (SDR) method and a one-dimension iterative (ODI) algorithm, which can maximize the radar mutual information (MI) while imposing constraints on the communication rates. The second technique is an AO algorithm based on the Riemannian gradient (RG) method, which can maximize the weighted ISAC performance metrics. Simulation results verify the effectiveness of the proposed schemes. The AO-SDR-ODI method is shown to achieve better communication and sensing performance, than the AO-RG method, at a higher complexity. It is also shown that the mean-squared-error (MSE) of the estimates of the sensing parameters decreases as the radar MI increases. △ Less

Submitted 24 January, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 30 pages, 8 figures. This paper has been accepted by IEEE Transactions on Communications

arXiv:2302.13523 [pdf, other]

VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting

Authors: Ao Zhang, He Wang, Pengcheng Guo, Yihui Fu, Lei Xie, Yingying Gao, Shilei Zhang, Junlan Feng

Abstract: The performance of the keyword spotting (KWS) system based on audio modality, commonly measured in false alarms and false rejects, degrades significantly under the far field and noisy conditions. Therefore, audio-visual keyword spotting, which leverages complementary relationships over multiple modalities, has recently gained much attention. However, current studies mainly focus on combining the e… ▽ More The performance of the keyword spotting (KWS) system based on audio modality, commonly measured in false alarms and false rejects, degrades significantly under the far field and noisy conditions. Therefore, audio-visual keyword spotting, which leverages complementary relationships over multiple modalities, has recently gained much attention. However, current studies mainly focus on combining the exclusively learned representations of different modalities, instead of exploring the modal relationships during each respective modeling. In this paper, we propose a novel visual modality enhanced end-to-end KWS framework (VE-KWS), which fuses audio and visual modalities from two aspects. The first one is utilizing the speaker location information obtained from the lip region in videos to assist the training of multi-channel audio beamformer. By involving the beamformer as an audio enhancement module, the acoustic distortions, caused by the far field or noisy environments, could be significantly suppressed. The other one is conducting cross-attention between different modalities to capture the inter-modal relationships and help the representation learning of each modality. Experiments on the MSIP challenge corpus show that our proposed model achieves 2.79% false rejection rate and 2.95% false alarm rate on the Eval set, resulting in a new SOTA performance compared with the top-ranking systems in the ICASSP2022 MISP challenge. △ Less

Submitted 14 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 5 pages. Accepted at ICASSP2023

Showing 1–50 of 121 results for author: Zhang, A