-
End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework
Authors:
Chang Cai,
Xiaojun Yuan,
Ying-Jun Angela Zhang
Abstract:
This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formulate the E2E design of feature encoding, MIMO precoding, and classification as a conditional mutual information maximization problem. However, it is notoriously difficult to design and train an E2E network that can be adaptive to both the task dataset and different channel realizations. Regarding network training, we propose a decoupled pretraining framework that separately trains the feature encoder and the MIMO precoder, with a maximum a posteriori (MAP) classifier employed at the server to generate the inference result. The feature encoder is pretrained exclusively using the task dataset, while the MIMO precoder is pretrained solely based on the channel and noise distributions. Nevertheless, we manage to align the pretraining objectives of each individual component with the E2E learning objective, so as to approach the performance bound of E2E learning. By leveraging the decoupled pretraining results for initialization, the E2E learning can be conducted with minimal training overhead. Regarding network architecture design, we develop two deep unfolded precoding networks that effectively incorporate the domain knowledge of the solution to the decoupled precoding problem. Simulation results on both the CIFAR-10 and ModelNet10 datasets verify that the proposed method achieves significantly higher classification accuracy compared to various baselines.
Submitted 30 August, 2024;
originally announced August 2024.
-
Joint Offloading and Beamforming Design in Integrating Sensing, Communication, and Computing Systems: A Distributed Approach
Authors:
Peng Liu,
Zesong Fei,
Xinyi Wang,
Jingxuan Huang,
Jie Hu,
J. Andrew Zhang
Abstract:
When applying integrated sensing and communications (ISAC) in future mobile networks, many sensing tasks have low latency requirements, preferably being implemented at terminals. However, terminals often have limited computing capabilities and energy supply. In this paper, we investigate the effectiveness of leveraging the advanced computing capabilities of mobile edge computing (MEC) servers and the cloud server to address the sensing tasks of ISAC terminals. Specifically, we propose a novel three-tier integrated sensing, communication, and computing (ISCC) framework composed of one cloud server, multiple MEC servers, and multiple terminals, where the terminals can optionally offload sensing data to the MEC server or the cloud server. The offload message is sent via the ISAC waveform, whose echo is used for sensing. We jointly optimize the computation offloading and beamforming strategies to minimize the average execution latency while satisfying sensing requirements. In particular, we propose a low-complexity distributed algorithm to solve the problem. Firstly, we use the alternating direction method of multipliers (ADMM) and derive the closed-form solution for offloading decision variables. Subsequently, we convert the beamforming optimization sub-problem into a weighted minimum mean-square error (WMMSE) problem and propose a fractional programming based algorithm. Numerical results demonstrate that the proposed ISCC framework and distributed algorithm significantly reduce the execution latency and the energy consumption of sensing tasks at a lower computational complexity compared to existing schemes.
Submitted 27 August, 2024;
originally announced August 2024.
-
Learning Multi-Rate Task-Oriented Communications Over Symmetric Discrete Memoryless Channels
Authors:
Anbang Zhang,
Shuaishuai Guo
Abstract:
This letter introduces a multi-rate task-oriented communication (MR-ToC) framework. This framework dynamically adapts to variations in affordable data rate within the communication pipeline. It conceptualizes communication pipelines as symmetric, discrete, memoryless channels. We employ a progressive learning strategy to train the system, comprising a nested codebook for encoding and task inference. This configuration allows for the adjustment of multiple rate levels in response to evolving channel conditions. The results from our experiments show that this system not only supports edge inference across various coding levels but also excels in adapting to variable communication environments.
Submitted 24 August, 2024;
originally announced August 2024.
-
AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results
Authors:
Maksim Smirnov,
Aleksandr Gushchin,
Anastasia Antsiferova,
Dmitry Vatolin,
Radu Timofte,
Ziheng Jia,
Zicheng Zhang,
Wei Sun,
Jiaying Qian,
Yuqin Cao,
Yinan Sun,
Yuxin Zhu,
Xiongkuo Min,
Guangtao Zhai,
Kanjar De,
Qing Luo,
Ao-Xiang Zhang,
Peng Zhang,
Haibo Lei,
Linyan Jiang,
Yaqing Li,
Wenhui Meng,
Xiaoheng Tan,
Haiqiang Wang,
Xiaozhong Xu
, et al. (11 additional authors not shown)
Abstract:
Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods' performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 participating teams registered for the challenge, while we report the results of 6 teams, which submitted valid final solutions and code for reproducing the results. Moreover, we calculated and present the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html.
Submitted 28 August, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Trustworthy Semantic-Enabled 6G Communication: A Task-oriented and Privacy-preserving Perspective
Authors:
Shuaishuai Guo,
Anbang Zhang,
Yanhu Wang,
Chenyuan Feng,
Tony Q. S. Quek
Abstract:
Trustworthy task-oriented semantic communication (ToSC) emerges as an innovative approach in the 6G landscape, characterized by the transmission of only vital information that is directly pertinent to a specific task. While ToSC offers an efficient mode of communication, it concurrently raises concerns regarding privacy, as sophisticated adversaries might possess the capability to reconstruct the original data from the transmitted features. This article provides an in-depth analysis of privacy-preserving strategies specifically designed for ToSC relying on deep neural network-based joint source and channel coding (DeepJSCC). The study encompasses a detailed comparative assessment of trustworthy feature perturbation methods such as differential privacy and encryption, alongside intrinsic security incorporation approaches like adversarial learning to train the JSCC and learning-based vector quantization (LBVQ). This comparative analysis underscores the integration of advanced explainable learning algorithms into communication systems, positing a new benchmark for privacy standards in the forthcoming 6G era.
Submitted 7 August, 2024;
originally announced August 2024.
-
Multistain Pretraining for Slide Representation Learning in Pathology
Authors:
Guillaume Jaume,
Anurag Vaidya,
Andrew Zhang,
Andrew H. Song,
Richard J. Chen,
Sharifa Sahai,
Dandan Mo,
Emilio Madrigal,
Long Phi Le,
Faisal Mahmood
Abstract:
Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learning extend the principles of SSL from small images (e.g., 224 x 224 patches) to entire slides, usually by aligning two different augmentations (or views) of the slide. Yet the resulting representation remains constrained by the limited clinical and biological diversity of the views. Instead, we postulate that slides stained with multiple markers, such as immunohistochemistry, can be used as different views to form a rich task-agnostic training signal. To this end, we introduce Madeleine, a multimodal pretraining strategy for slide representation learning. Madeleine is trained with a dual global-local cross-stain alignment objective on large cohorts of breast cancer samples (N=4,211 WSIs across five stains) and kidney transplant samples (N=12,070 WSIs across four stains). We demonstrate the quality of slide representations learned by Madeleine on various downstream evaluations, ranging from morphological and molecular classification to prognostic prediction, comprising 21 tasks using 7,299 WSIs from multiple medical centers. Code is available at https://github.com/mahmoodlab/MADELEINE.
Submitted 5 August, 2024;
originally announced August 2024.
-
Wireless Communications in Doubly Selective Channels with Domain Adaptivity
Authors:
J. Andrew Zhang,
Hongyang Zhang,
Kai Wu,
Xiaojing Huang,
Jinhong Yuan,
Y. Jay Guo
Abstract:
Wireless communications are significantly impacted by the propagation environment, particularly in doubly selective channels with variations in both time and frequency domains. Orthogonal Time Frequency Space (OTFS) modulation has emerged as a promising solution; however, its high equalization complexity, if performed in the delay-Doppler domain, limits its universal application. This article explores domain-adaptive system design, dynamically selecting best-fit domains for modulation, pilot placement, and equalization based on channel conditions, to enhance performance across diverse environments. We examine domain classifications and connections, signal designs, and equalization techniques with domain adaptivity, and finally highlight future research opportunities.
Submitted 31 July, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Efficient Sensing Parameter Estimation with Direct Clutter Mitigation in Perceptive Mobile Networks
Authors:
Hang Li,
Hongming Yang,
Qinghua Guo,
J. Andrew Zhang,
Yang Xiang,
Yashan Pang
Abstract:
In this work, we investigate sensing parameter estimation in the presence of clutter in perceptive mobile networks (PMNs) that integrate radar sensing into mobile communications. Performing clutter suppression before sensing parameter estimation is generally desirable as the number of sensing parameters can be significantly reduced. However, existing methods require high-complexity clutter mitigation and sensing parameter estimation, where clutter is first identified and then removed. In this correspondence, we propose a much simpler but more effective method by incorporating a clutter cancellation mechanism in formulating a sparse signal model for sensing parameter estimation. In particular, clutter mitigation is performed directly on the received signals and the unitary approximate message passing (UAMP) is leveraged to exploit the common support for sensing parameter estimation in the formulated sparse signal recovery problem. Simulation results show that, compared to state-of-the-art methods, the proposed method delivers significantly better performance with substantially reduced complexity.
Submitted 24 July, 2024;
originally announced July 2024.
-
Performance Analysis and Low-Complexity Beamforming Design for Near-Field Physical Layer Security
Authors:
Yunpu Zhang,
Yuan Fang,
Xianghao Yu,
Changsheng You,
Ying-Jun Angela Zhang
Abstract:
Extremely large-scale arrays (XL-arrays) have emerged as a key enabler in achieving the unprecedented performance requirements of future wireless networks, leading to a significant increase in the range of the near-field region. This transition necessitates the spherical wavefront model for characterizing the wireless propagation rather than the far-field planar counterpart, thereby introducing extra degrees of freedom (DoFs) to wireless system design. In this paper, we explore the beam focusing-based physical layer security (PLS) in the near field, where multiple legitimate users and one eavesdropper are situated in the near-field region of the XL-array base station (BS). First, we consider a special case with one legitimate user and one eavesdropper to shed useful insights into near-field PLS. In particular, it is shown that 1) Artificial noise (AN) is crucial to near-field security provisioning, transforming an insecure system to a secure one; 2) AN can yield numerous security gains, which considerably enhances PLS in the near field as compared to the case without AN taken into account. Next, for the general case with multiple legitimate users, we propose an efficient low-complexity approach to design the beamforming with AN to guarantee near-field secure transmission. Specifically, the low-complexity approach is conceived starting by introducing the concept of interference domain to capture the inter-user interference level, followed by a three-step identification framework for designing the beamforming. Finally, numerical results reveal that 1) the PLS enhancement in the near field is pronounced thanks to the additional spatial DoFs; 2) the proposed approach can achieve close performance to that of the computationally-extensive conventional approach yet with a significantly lower computational complexity.
Submitted 18 July, 2024;
originally announced July 2024.
-
Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design
Authors:
Yuan Fang,
Xianghao Yu,
Jie Xu,
Ying-Jun Angela Zhang
Abstract:
This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cramér-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation.
Submitted 18 July, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation
Authors:
Zhiqiang Xiao,
Xianda Liu,
Yong Zeng,
J. Andrew Zhang,
Shi Jin,
Rui Zhang
Abstract:
Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design. In this article, by leveraging the super spatial resolution of large antenna arrays and the multi-path spatial sparsity of high-frequency wireless channels, we introduce a new approach for waveform design based on the recently proposed delay-Doppler alignment modulation (DDAM). In particular, DDAM makes a paradigm shift of waveform design from the conventional manner of tolerating channel delay and Doppler spreads to actively manipulating them. First, we review the fundamental constraints and performance limitations of orthogonal frequency division multiplexing (OFDM) and introduce new opportunities for 6G waveform design. Next, the motivations and basic principles of DDAM are presented, followed by its various extensions to different wireless system setups. Finally, the main design considerations for DDAM are discussed and the new opportunities for future research are highlighted.
Submitted 13 June, 2024;
originally announced June 2024.
-
HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model
Authors:
Hang Fu,
Genyun Sun,
Yinhe Li,
Jinchang Ren,
Aizhu Zhang,
Cheng Jing,
Pedram Ghamisi
Abstract:
Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Inspired by the ability of Mamba to model long-range dependencies with linear complexity, we explore its potential for HSI dehazing and propose the first HSI Dehazing Mamba (HDMba) network. Specifically, we design a novel window selective scan module (WSSM) that captures local dependencies within windows and global correlations between windows by partitioning them. This approach improves the ability of conventional Mamba in local feature extraction. By modeling the local and global spectral-spatial information flow, we achieve a comprehensive analysis of hazy regions. The DehazeMamba layer (DML), constructed by WSSM, and residual DehazeMamba (RDM) blocks, composed of DMLs, are the core components of the HDMba framework. These components effectively characterize the complex distribution of haze in HSIs, aiding in scene reconstruction and dehazing. Experimental results on the Gaofen-5 HSI dataset demonstrate that HDMba outperforms other state-of-the-art methods in dehazing performance. The code will be available at https://github.com/RsAI-lab/HDMba.
Submitted 9 June, 2024;
originally announced June 2024.
-
Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models
Authors:
Shuaishuai Guo,
Yanhu Wang,
Jia Ye,
Anbang Zhang,
Kun Xu
Abstract:
Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visual data, expressed as natural language descriptions. These are further refined using a pre-trained large language model (LLM) for importance quantification and semantic error correction. The subsequent semantic importance-aware communications (SIAC) aim to minimize semantic loss while respecting transmission delay constraints, exemplified through adaptive modulation and coding strategies. At the receiving end, LLM-based semantic error correction is utilized. If visual data recreation is desired, a pre-trained generative artificial intelligence (AI) model can regenerate it using the corrected descriptions. We assess semantic similarities between transmitted and recovered content, demonstrating ULSC's superior ability to convey semantic understanding compared to feature-level semantic communications (FLSC). ULSC's conversion of visual data to natural language facilitates various cognitive tasks, leveraging human knowledge bases. Additionally, this method enhances privacy, as neither original data nor features are directly transmitted.
Submitted 24 May, 2024;
originally announced May 2024.
-
Revealing the Trade-off in ISAC Systems: The KL Divergence Perspective
Authors:
Zesong Fei,
Shuntian Tang,
Xinyi Wang,
Fanghao Xia,
Fan Liu,
J. Andrew Zhang
Abstract:
Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication networks. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We first present the relationship between KL divergence and explicit ISAC performance metrics, i.e., demodulation error and probability of detection. Thereafter, we investigate the impact of constellation and beamforming design on the Pareto bound via deep learning and semi-definite relaxation (SDR) techniques. Simulation results show the trade-off between sensing and communication performance in terms of bit error rate (BER) and probability of detection under different parameter set-ups.
Submitted 17 May, 2024;
originally announced May 2024.
-
Heuristic Solution to Joint Deployment and Beamforming Design for STAR-RIS Aided Networks
Authors:
Bai Yan,
Qi Zhao,
Jin Zhang,
J. Andrew Zhang
Abstract:
This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities and fully boosting the system's performance. We consider a sum rate maximization problem with joint optimization and hybrid beamforming design. An offline heuristic solution is proposed for the problem, developed based on differential evolution and semi-definite programming methods. In particular, a point-point representation is proposed for characterizing and exploiting the user-grouping. A balanced grouping method is designed to achieve a desired user grouping with low complexity. Numerical results demonstrate the substantial performance gains achievable through optimal deployment design.
Submitted 14 April, 2024;
originally announced April 2024.
-
Interference Management for Full-Duplex ISAC in B5G/6G Networks: Architectures, Challenges, and Solutions
Authors:
Aimin Tang,
Xudong Wang,
J. Andrew Zhang
Abstract:
Integrated sensing and communications (ISAC) has been envisioned as a key technique for B5G/6G networks. To support monostatic sensing, a full-duplex radio is indispensable to extract echo signals from targets. Such a radio can also greatly improve network capacity via full-duplex communications. However, full-duplex radios in existing ISAC designs are mainly focused on wireless sensing, while the ability of full-duplex communications is usually ignored. In this article, we provide an overview of full-duplex ISAC (FD-ISAC), where a full-duplex radio is used for both wireless sensing and full-duplex communications in B5G/6G networks, with a focus on the fundamental interference management problem in such networks. First, different ISAC architectures are introduced, considering different full-duplex communication modes and wireless sensing modes. Next, the challenging issues of link-level interference and network-level interference are analyzed, illustrating a critical demand on interference management for FD-ISAC. Potential solutions to interference management are then reviewed from the perspective of radio architecture design, beamforming, mode selection, and resource allocation. The corresponding open problems are also highlighted.
Submitted 8 April, 2024;
originally announced April 2024.
-
Reproducing the Acoustic Velocity Vectors in a Circular Listening Area
Authors:
Jiarui Wang,
Thushara Abhayapala,
Jihui Aimee Zhang,
Prasanga Samarasinghe
Abstract:
Acoustic velocity vectors are important for human localization of sound at low frequencies. This paper proposes a sound field reproduction algorithm, which matches the acoustic velocity vectors in a circular listening area. In previous work, acoustic velocity vectors are matched either at sweet spots or on the boundary of the listening area. Sweet spots restrict the listener's movement, whereas measuring the acoustic velocity vectors on the boundary requires a complicated measurement setup. This paper proposes the cylindrical harmonic coefficients of the acoustic velocity vectors in a circular area (CHV coefficients), which are calculated from the cylindrical harmonic coefficients of the global pressure (global CHP coefficients) by using the sound field translation formula. The global CHP coefficients can be measured by a circular microphone array, which can be bought off-the-shelf. By matching the CHV coefficients, the acoustic velocity vectors are reproduced throughout the listening area. Hence, the listener is free to move. Simulations show that at low frequencies, where the acoustic velocity vectors are the dominant factor for localization, the proposed reproduction method based on the CHV coefficients results in higher accuracy in reproduced acoustic velocity vectors when compared with the traditional method based on the global CHP coefficients.
Submitted 19 March, 2024;
originally announced March 2024.
-
Multistep Inverse Is Not All You Need
Authors:
Alexander Levine,
Peter Stone,
Amy Zhang
Abstract:
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encoding of the first and last state in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
Submitted 18 March, 2024;
originally announced March 2024.
-
Performance Bounds for Passive Sensing in Asynchronous ISAC Systems -- Appendices
Authors:
Jingbo Zhao,
Zhaoming Lu,
J. Andrew Zhang,
Weicai Li,
Yifeng Xiong,
Zijun Han,
Xiangming Wen,
Tao Gu
Abstract:
This document contains the appendices for our paper titled "Performance Bounds for Passive Sensing in Asynchronous ISAC Systems." The appendices include rigorous derivations of key formulas, detailed proofs of the theorems and propositions introduced in the paper, and details of the algorithm tested in the numerical simulation for validation. These appendices aim to support and elaborate on the findings and methodologies presented in the main text. All external references to equations, theorems, and so forth, are directed towards the corresponding elements within the main paper.
Submitted 29 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Black-box Adversarial Attacks Against Image Quality Assessment Models
Authors:
Yu Ran,
Ao-Xiang Zhang,
Mingjie Li,
Weixuan Tang,
Yuan-Gen Wang
Abstract:
The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the attack problem as maximizing the deviation between the estimated quality scores of original and perturbed images, while restricting the perturbed image distortions for visual quality preservation. Under this formulation, we then design a Bi-directional loss function to mislead the estimated quality scores of adversarial examples towards an opposite direction with maximum deviation. On this basis, we finally develop an efficient and effective black-box attack method against NR-IQA models. Extensive experiments reveal that all the evaluated NR-IQA models are vulnerable to the proposed attack method. Moreover, the generated perturbations are not transferable, which makes them useful for investigating the specialities of disparate IQA models.
Submitted 28 February, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective
Authors:
Kai Wu,
Jacopo Pegoraro,
Francesca Meneghello,
J. Andrew Zhang,
Jesus O. Lacruz,
Joerg Widmer,
Francesco Restuccia,
Michele Rossi,
Xiaojing Huang,
Daqing Zhang,
Giuseppe Caire,
Y. Jay Guo
Abstract:
Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks at far-separated transmitters and receivers. This causes the received signal to be affected by time-varying random phase offsets, severely degrading, or even failing, direct sensing. Hence, to effectively enable ISAC, considerable research has been directed toward addressing the clock asynchronism issue in bi-static sensing. This paper provides an overview of the issue and existing techniques developed in an ISAC background. Based on the review and comparison, we also draw insights into the future research directions and open problems, aiming to nurture the maturation of bi-static sensing in ISAC.
Submitted 24 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Moment-based metrics for molecules computable from cryo-EM images
Authors:
Andy Zhang,
Oscar Mickelin,
Joe Kileel,
Eric J. Verbeke,
Nicholas F. Marshall,
Marc Aurèle Gilles,
Amit Singer
Abstract:
Single particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3-D structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3-D reconstruction, known as Kam's method, relies on the moments of the 2-D images. Inspired by Kam's method, we introduce a rotationally invariant metric between two molecular structures, which does not require 3-D alignment. Further, we introduce a metric between a stack of projection images and a molecular structure, which is invariant to rotations and reflections and does not require performing 3-D reconstruction. Additionally, the latter metric does not assume a uniform distribution of viewing angles. We demonstrate uses of the new metrics on synthetic and experimental datasets, highlighting their ability to measure structural similarity.
Submitted 26 January, 2024;
originally announced January 2024.
-
Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks
Authors:
Yanmo Hu,
J. Andrew Zhang,
Weibo Deng,
Y. Jay Guo
Abstract:
Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and effective uplink sensing scheme with the assistance of static anchor points. Two major algorithms are proposed in the scheme. The first algorithm estimates the relative timing and carrier frequency offsets due to clock asynchronism, with respect to those at a randomly selected reference snapshot. Theoretical performance analysis is provided for the algorithm. The estimates from the first algorithm are then used to compensate for the offsets and generate the angle-Doppler maps. Using the maps, the second algorithm identifies the anchor points, and then locates the UE and dynamic targets. Feasibility of UE localization is also analyzed. Simulation results are provided and demonstrate the effectiveness of the proposed algorithms.
Submitted 17 January, 2024;
originally announced January 2024.
-
Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems
Authors:
Yanmo Hu,
Kai Wu,
J. Andrew Zhang,
Weibo Deng,
Y. Jay Guo
Abstract:
Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particularly for Doppler sensing, have not been fully understood yet. This work endeavors to fill the research gap. Focusing on a single dynamic path in high-SNR scenarios, we derive the closed-form CRB. Then, through analyzing the mutual interference between dynamic and static paths, we simplify the CRB results by deriving close approximations, further unveiling new insights into the impact of numerous physical parameters on Doppler sensing. Moreover, utilizing the new CRB and analyses, we propose novel waveform optimization strategies for noise- and interference-limited sensing scenarios, which are also empowered by closed-form and efficient solutions. Extensive simulation results are provided to validate the preciseness of the derived CRB results and analyses, with the aid of the maximum-likelihood estimator. The results also demonstrate the substantially enhanced Doppler sensing accuracy and the sensing capabilities for low-speed targets achieved by the proposed waveform design.
Submitted 17 January, 2024;
originally announced January 2024.
-
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
Authors:
He Wang,
Pengcheng Guo,
Yue Li,
Ao Zhang,
Jiayao Sun,
Lei Xie,
Wei Chen,
Pan Zhou,
Hui Bu,
Xin Xu,
Binbin Zhang,
Zhuo Chen,
Jian Wu,
Longbiao Wang,
Eng Siong Chng,
Sun Li
Abstract:
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, the first-place team, USTCiflytek, achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an absolute improvement of 13.08% and 51.4% compared to our challenge baseline, respectively.
Submitted 20 February, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias
Authors:
Ao Zhang,
Pan Zhou,
Kaixun Huang,
Yong Zou,
Ming Liu,
Lei Xie
Abstract:
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate candidates. In order to enhance any customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information by audio and text cross-attention into both branches. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS can achieve a significant relative wake-up rate improvement of 41% compared to the traditional customized KWS systems when the false alarm rate is fixed at 0.5 times per hour.
Submitted 15 December, 2023;
originally announced December 2023.
-
Densifying MIMO: Channel Modeling, Physical Constraints, and Performance Evaluation for Holographic Communications
Authors:
Y. Liu,
M. Zhang,
T. Wang,
A. Zhang,
M. Debbah
Abstract:
As the backbone of the fifth-generation (5G) cellular network, massive multiple-input multiple-output (MIMO) encounters a significant challenge in practical applications: how to deploy a large number of antenna elements within limited spaces. Recently, holographic communication has emerged as a potential solution to this issue. It employs dense antenna arrays and provides a tractable model. Nevertheless, some challenges must be addressed to actualize this innovative concept. One is the mutual coupling among antenna elements within an array. When the element spacing is small, near-field coupling becomes the dominant factor that strongly restricts the array performance. Another is the polarization of electromagnetic waves. As an intrinsic property, it was not fully considered in the previous channel modeling of holographic communication. The third is the lack of real-world experiments to show the potential and possible defects of a holographic communication system. In this paper, we propose an electromagnetic channel model based on the characteristics of electromagnetic waves. This model encompasses the impact of mutual coupling in the transceiver sides and the depolarization in the propagation environment. Furthermore, by approximating an infinite array, the performance restrictions of large-scale dense antenna arrays are also studied theoretically to exploit the potential of the proposed channel. In addition, numerical simulations and a channel measurement experiment are conducted. The findings reveal that within limited spaces, the coupling effect, particularly for element spacing smaller than half of the wavelength, is the primary factor leading to the inflection point for the performance of holographic communications.
Submitted 5 December, 2023;
originally announced December 2023.
-
Time and Frequency Offset Estimation and Intercarrier Interference Cancellation for AFDM Systems
Authors:
Yuankun Tang,
Anjie Zhang,
Miaowen Wen,
Yu Huang,
Fei Ji,
Jinming Wen
Abstract:
Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. The first is the joint ML estimator, which evaluates the arrival time and carrier frequency offset by comparing the correlations of samples. The second is the stepwise ML estimator, which we propose to reduce the complexity. Both proposed estimators exploit the redundant information contained within the chirp-periodic prefix inherent in AFDM symbols, thus dispensing with any additional pilots. To further mitigate the intercarrier interference resulting from the residual frequency offset, we design a mirror-mapping-based scheme for AFDM systems. Numerical results verify the effectiveness of the proposed time and carrier frequency offset estimation criteria and the mirror-mapping-based modulation for AFDM systems.
Submitted 28 December, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach
Authors:
Zhiqing Wei,
Jinghui Piao,
Xin Yuan,
Huici Wu,
J. Andrew Zhang,
Zhiyong Feng,
Lin Wang,
Ping Zhang
Abstract:
Integrated sensing and communication (ISAC) is regarded as the enabling technology in future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication systems. Waveform design is critical in ISAC systems. However, the difference in performance metrics between sensing and communication brings challenges for ISAC waveform design. This paper applies a unified performance metric from information theory, namely mutual information (MI), to measure the communication and sensing performance in multicarrier ISAC systems. For a multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC system, we first derive the sensing and communication MI with subcarrier correlation and spatial correlation. Then, we propose optimal waveform designs for maximizing the sensing MI, the communication MI, and the weighted sum of sensing and communication MI, respectively. The optimization results are validated by Monte Carlo simulations. Our work provides effective closed-form expressions for waveform design, enabling the realization of MIMO-OFDM ISAC systems with balanced performance in communication and sensing.
Submitted 9 October, 2023;
originally announced October 2023.
-
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition
Authors:
Kaixun Huang,
Ao Zhang,
Binbin Zhang,
Tianyi Xu,
Xingchen Song,
Lei Xie
Abstract:
The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.
Submitted 6 October, 2023;
originally announced October 2023.
-
Integrated Communication, Sensing, and Computation Framework for 6G Networks
Authors:
Xu Chen,
Zhiyong Feng,
J. Andrew Zhang,
Zhaohui Yang,
Xin Yuan,
Xinxin He,
Ping Zhang
Abstract:
In the sixth generation (6G) era, intelligent machine network (IMN) applications, such as intelligent transportation, require collaborative machines with communication, sensing, and computation (CSC) capabilities. This article proposes an integrated communication, sensing, and computation (ICSAC) framework for 6G to achieve the reciprocity among CSC functions to enhance the reliability and latency of communication, accuracy and timeliness of sensing information acquisition, and privacy and security of computing to realize the IMN applications. Specifically, the sensing and communication functions can merge into unified platforms using the same transmit signals, and the acquired real-time sensing information can be exploited as prior information for intelligent algorithms to enhance the performance of communication networks. This is called the computing-empowered integrated sensing and communications (ISAC) reciprocity. Such reciprocity can further improve the performance of distributed computation with the assistance of networked sensing capability, which is named the sensing-empowered integrated communications and computation (ICAC) reciprocity. The above ISAC and ICAC reciprocities can enhance each other iteratively and finally lead to the ICSAC reciprocity. To achieve these reciprocities, we explore the potential enabling technologies for the ICSAC framework. Finally, we present the evaluation results of crucial enabling technologies to show the feasibility of the ICSAC framework.
Submitted 4 October, 2023;
originally announced October 2023.
-
Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks
Authors:
Ao-Xiang Zhang,
Yu Ran,
Weixuan Tang,
Yuan-Gen Wang
Abstract:
No-Reference Video Quality Assessment (NR-VQA) plays an essential role in improving the viewing experience of end-users. Driven by deep learning, recent NR-VQA models based on Convolutional Neural Networks (CNNs) and Transformers have achieved outstanding performance. To build a reliable and practical assessment system, it is necessary to evaluate their robustness. However, this issue has received little attention in the academic community. In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attack. Specifically, considering both the attack effect on quality score and the visual quality of adversarial video, the attack problem is formulated as misleading the estimated quality score under the constraint of just-noticeable difference (JND). Built upon this formulation, a novel loss function called Score-Reversed Boundary Loss is designed to push the adversarial video's estimated quality score far away from its ground-truth score towards a specific boundary, and the JND constraint is modeled as a strict $L_2$ and $L_\infty$ norm restriction. By this means, both white-box and black-box attacks can be launched in an effective and imperceptible manner. The source code is available at https://github.com/GZHU-DVL/AttackVQA.
Submitted 20 October, 2023; v1 submitted 24 September, 2023;
originally announced September 2023.
-
An Active Noise Control System Based on Soundfield Interpolation Using a Physics-informed Neural Network
Authors:
Yile Angela Zhang,
Fei Ma,
Thushara Abhayapala,
Prasanga Samarasinghe,
Amy Bastine
Abstract:
Conventional multiple-point active noise control (ANC) systems require placing error microphones within the region of interest (ROI), inconveniencing users. This paper designs a feasible monitoring microphone arrangement placed outside the ROI, providing a user with more freedom of movement. The soundfield within the ROI is interpolated from the microphone signals using a physics-informed neural network (PINN). The PINN exploits the acoustic wave equation to assist soundfield interpolation under a limited number of monitoring microphones, and demonstrates better interpolation performance than the spherical harmonic method in simulations. An ANC system is designed to take advantage of the interpolated signal to reduce the noise signal within the ROI. The PINN-assisted ANC system reduces noise more than the multiple-point ANC system in simulations.
Submitted 19 September, 2023;
originally announced September 2023.
-
Multi-Device Task-Oriented Communication via Maximal Coding Rate Reduction
Authors:
Chang Cai,
Xiaojun Yuan,
Ying-Jun Angela Zhang
Abstract:
In task-oriented communications, most existing work designs the physical-layer communication modules and learning-based codecs with distinct objectives: learning is targeted at accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency between the design objectives may hinder the exploitation of the full benefits of task-oriented communications. In this paper, we consider a task-oriented multi-device edge inference system over a multiple-input multiple-output (MIMO) multiple-access channel, where the learning (i.e., feature encoding and classification) and communication (i.e., precoding) modules are designed with the same goal of inference accuracy maximization. Instead of end-to-end learning, which involves both the task dataset and wireless channel during training, we advocate a separate design of learning and communication to achieve the consistent goal. Specifically, we leverage the maximal coding rate reduction (MCR2) objective as a surrogate to represent the inference accuracy, which allows us to explicitly formulate the precoding optimization problem. We draw valuable insights from this formulation and develop a block coordinate ascent (BCA) algorithm for efficient problem-solving. Moreover, the MCR2 objective serves as the loss function for feature encoding and guides the classification design. Simulation results on the synthetic features explain the mechanism of MCR2 precoding at different SNRs. We also validate on the CIFAR-10 and ModelNet10 datasets that the proposed design achieves a better latency-accuracy tradeoff compared to various baselines. As such, our work paves the way for further exploration into the synergistic alignment of learning and communication objectives in task-oriented communication systems.
Submitted 28 May, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation
Authors:
Qiaosong Qi,
Le Zhuo,
Aixi Zhang,
Yue Liao,
Fei Fang,
Si Liu,
Shuicheng Yan
Abstract:
When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.
Submitted 5 August, 2023;
originally announced August 2023.
-
Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples
Authors:
Andrew H. Song,
Mane Williams,
Drew F. K. Williamson,
Guillaume Jaume,
Andrew Zhang,
Bowen Chen,
Robert Serafin,
Jonathan T. C. Liu,
Alex Baras,
Anil V. Parwani,
Faisal Mahmood
Abstract:
Human tissue and its constituent cells form a microenvironment that is fundamentally three-dimensional (3D). However, the standard-of-care in pathologic diagnosis involves selecting a few two-dimensional (2D) sections for microscopic evaluation, risking sampling bias and misdiagnosis. Diverse methods for capturing 3D tissue morphologies have been developed, but they have yet had little translation to clinical practice; manual and computational evaluations of such large 3D data have so far been impractical and/or unable to provide patient-level clinical insights. Here we present Modality-Agnostic Multiple instance learning for volumetric Block Analysis (MAMBA), a deep-learning-based platform for processing 3D tissue images from diverse imaging modalities and predicting patient outcomes. Archived prostate cancer specimens were imaged with open-top light-sheet microscopy or microcomputed tomography and the resulting 3D datasets were used to train risk-stratification networks based on 5-year biochemical recurrence outcomes via MAMBA. With the 3D block-based approach, MAMBA achieves an area under the receiver operating characteristic curve (AUC) of 0.86 and 0.74, superior to 2D traditional single-slice-based prognostication (AUC of 0.79 and 0.57), suggesting superior prognostication with 3D morphological features. Further analyses reveal that the incorporation of greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, suggesting the value of capturing larger extents of heterogeneous 3D morphology. With the rapid growth and adoption of 3D spatial biology and pathology techniques by researchers and clinicians, MAMBA provides a general and efficient framework for 3D weakly supervised learning for clinical decision support and can help to reveal novel 3D morphological biomarkers for prognosis and therapeutic response.
Submitted 27 July, 2023;
originally announced July 2023.
-
Sensing Aided Covert Communications: Turning Interference into Allies
Authors:
Xinyi Wang,
Zesong Fei,
Peng Liu,
J. Andrew Zhang,
Qingqing Wu,
Nan Wu
Abstract:
In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended Kalman filtering technique is employed to predict its trajectory as well as the corresponding channels. Depending on the maneuvering altitude of the adversary target, two channel state information (CSI) models are considered, with the aim of maximizing the covert transmission rate by jointly designing the radar waveform and communication transmit beamforming vector based on the constructed channels. For perfect CSI under the free-space propagation model, by decoupling the joint design, we propose an efficient algorithm to guarantee that the target cannot detect the transmission. For imperfect CSI due to the multi-path components, a robust joint transmission scheme is proposed based on the property of the Kullback-Leibler divergence. The convergence behaviour, tracking MSE, false alarm and missed detection probabilities, and covert transmission rate are evaluated. Simulation results show that the proposed algorithms achieve accurate tracking. For both channel models, the proposed sensing-assisted covert transmission design guarantees covertness and significantly outperforms the conventional schemes.
Submitted 3 January, 2024; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Reproducing the Acoustic Velocity Vectors in a Spherical Listening Region
Authors:
Jiarui Wang,
Thushara Abhayapala,
Jihui Aimee Zhang,
Prasanga Samarasinghe
Abstract:
Acoustic velocity vectors (AVVs) are related to human perception of sound at low frequencies and are widely used in Ambisonics. This paper proposes a spatial sound field reproduction algorithm called velocity matching, which reproduces the AVVs in the spherical listening region by matching the AVVs' spherical harmonic coefficients. Using the sound field translation formula, the spherical harmonic coefficients of the AVVs are derived from the spherical harmonic coefficients of the pressure, which can be measured by a higher-order microphone array. Unlike algorithms that only control the AVVs at discrete sweet spots, the proposed velocity matching algorithm manipulates the AVVs in the whole spherical listening region and allows the listener to move beyond the sweet spots. Simulations show that the proposed velocity matching algorithm accurately reproduces the AVVs in the spherical listening region and requires fewer loudspeakers than the pressure matching algorithm.
Submitted 6 June, 2024; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Differentially Private Over-the-Air Federated Learning Over MIMO Fading Channels
Authors:
Hang Liu,
Jia Yan,
Ying-Jun Angela Zhang
Abstract:
Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique to measure statistical data privacy in FL. Previous research has focused on over-the-air FL with a single-antenna server, leveraging communication noise to enhance user-level DP. This approach achieves the so-called "free DP" by controlling transmit power rather than introducing additional DP-preserving mechanisms at devices, such as adding artificial noise. In this paper, we study differentially private over-the-air FL over a multiple-input multiple-output (MIMO) fading channel. We show that FL model communication with a multiple-antenna server amplifies privacy leakage as the multiple-antenna server employs separate receive combining for model aggregation and information inference. Consequently, relying solely on communication noise, as done in the multiple-input single-output system, cannot meet high privacy requirements, and a device-side privacy-preserving mechanism is necessary for optimal DP design. We analyze the learning convergence and privacy loss of the studied FL system and propose a transceiver design algorithm based on alternating optimization. Numerical results demonstrate that the proposed method achieves a better privacy-learning trade-off compared to prior work.
Submitted 25 December, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Time-Domain Wideband Image Source Method for Spherical Microphone Arrays
Authors:
Jiarui Wang,
Prasanga Samarasinghe,
Thushara Abhayapala,
Jihui Aimee Zhang
Abstract:
This paper presents the time-domain wideband spherical microphone array impulse response generator (TDW-SMIR generator), which is a time-domain wideband image source method (ISM) for generating the room impulse responses captured by an open spherical microphone array. To incorporate loudspeaker directivity, the TDW-SMIR generator considers a source that emits a sequence of spherical wave fronts whose amplitudes are related to the loudspeaker directional impulse responses measured in the far-field. The TDW-SMIR generator uses geometric models to derive the time-domain signals recorded by the spherical microphone array. Comparisons are made with frequency-domain single-band ISMs. Simulation results show that the outputs of the TDW-SMIR generator are similar to those of frequency-domain single-band ISMs.
Submitted 9 August, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Cross-attention learning enables real-time nonuniform rotational distortion correction in OCT
Authors:
Haoran Zhang,
Jianlong Yang,
Jingqian Zhang,
Shiqing Zhao,
Aili Zhang
Abstract:
Nonuniform rotational distortion (NURD) correction is vital for endoscopic optical coherence tomography (OCT) imaging and its functional extensions, such as angiography and elastography. Current NURD correction methods require time-consuming feature tracking or cross-correlation calculations and thus sacrifice temporal resolution. Here we propose a cross-attention learning method for the NURD correction in OCT. Our method is inspired by the recent success of the self-attention mechanism in natural language processing and computer vision. By leveraging its ability to model long-range dependencies, we can directly obtain the correlation between OCT A-lines at any distance, thus accelerating the NURD correction. We develop an end-to-end stacked cross-attention network and design three types of optimization constraints. We compare our method with two traditional feature-based methods and a CNN-based method, on two publicly-available endoscopic OCT datasets and a private dataset collected on our home-built endoscopic OCT system. Our method achieved a $\sim3\times$ speedup to real time ($26\pm 3$ fps), and superior correction performance.
Submitted 5 January, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Authors:
Tianyi Xu,
Zhanheng Yang,
Kaixun Huang,
Pengcheng Guo,
Ao Zhang,
Biao Li,
Changru Chen,
Chao Li,
Lei Xie
Abstract:
By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words. However, for real-world voice assistants, always biasing on such personalized words with high prediction scores can significantly degrade the performance of recognizing common words. To address this issue, we propose an adaptive contextual biasing method based on Context-Aware Transformer Transducer (CATT) that utilizes the biased encoder and predictor embeddings to perform streaming prediction of contextual phrase occurrences. Such prediction is then used to dynamically switch the bias list on and off, enabling the model to adapt to both personalized and common scenarios. Experiments on Librispeech and internal voice assistant datasets show that our approach can achieve up to 6.7% and 20.7% relative reduction in WER and CER compared to the baseline respectively, mitigating up to 96.7% and 84.9% of the relative WER and CER increase for common cases. Furthermore, our approach has a minimal performance impact in personalized scenarios while maintaining a streaming inference pipeline with negligible RTF increase.
Submitted 15 August, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Complex CNN CSI Enhancer for Integrated Sensing and Communications
Authors:
Xu Chen,
Zhiyong Feng,
J. Andrew Zhang,
Feifei Gao,
Xin Yuan,
Zhaohui Yang,
Ping Zhang
Abstract:
In this paper, we propose a novel complex convolutional neural network (CNN) CSI enhancer for integrated sensing and communications (ISAC), which exploits the correlation between the sensing parameters (such as angle-of-arrival and range) and the channel state information (CSI) to significantly improve the CSI estimation accuracy and further enhance the sensing accuracy. Within the CNN CSI enhancer, we use the complex-valued computation layers to form the CNN, which maintains the phase information of CSI. We also transform the CSI into the sparse angle-delay domain, leading to heatmap images with prominent peaks that can be efficiently processed by CNN. Based on the enhanced CSI outputs, we further propose a novel biased fast Fourier transform (FFT)-based sensing scheme for improving the range sensing accuracy, by artificially introducing phase biasing terms. Extensive simulation results show that the ISAC complex CNN CSI enhancer can converge within 30 training epochs. The normalized mean square error (NMSE) of its CSI estimates is about 17 dB lower than that of the linear minimum mean square error (LMMSE) estimator, and the bit error rate (BER) of demodulation using the enhanced CSI estimation approaches that with perfect CSI. Finally, the range estimation MSE of the proposed biased FFT-based sensing method approaches that of the subspace-based sensing method, at a much lower complexity.
Submitted 19 June, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Authors:
Kaixun Huang,
Ao Zhang,
Zhanheng Yang,
Pengcheng Guo,
Bingshen Mu,
Tianyi Xu,
Lei Xie
Abstract:
Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates bias loss to assist in the training of the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER of the context phrases decreases relatively by 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.
Submitted 12 July, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Sensing Aided Uplink Transmission in OTFS ISAC with Joint Parameter Association, Channel Estimation and Signal Detection
Authors:
Xi Yang,
Hang Li,
Qinghua Guo,
J. Andrew Zhang,
Xiaojing Huang,
Zhiqun Cheng
Abstract:
In this work, we study sensing-aided uplink transmission in an integrated sensing and communication (ISAC) vehicular network with the use of orthogonal time frequency space (OTFS) modulation. To exploit sensing parameters for improving uplink communications, the parameters must first be associated with the transmitters, which is a challenging task. We propose a scheme that jointly conducts parameter association, channel estimation and signal detection by formulating it as a constrained bilinear recovery problem. Then we develop a message passing algorithm to solve the problem, leveraging the bilinear unitary approximate message passing (Bi-UAMP) algorithm. Numerical results validate the proposed scheme and show that relevant performance bounds can be closely approached.
Submitted 19 May, 2023;
originally announced May 2023.
-
Vital Sign Monitoring in Dynamic Environment via mmWave Radar and Camera Fusion
Authors:
Yingqi Wang,
Zhongqin Wang,
J. Andrew Zhang,
Haimin Zhang,
Min Xu
Abstract:
Contact-free vital sign monitoring, which uses wireless signals for recognizing human vital signs (i.e., breathing and heartbeat), is an attractive solution to health and security. However, the subject's body movement and changes in the environment can result in inaccurate frequency estimation of heartbeat and respiration. In this paper, we propose a robust mmWave radar and camera fusion system for monitoring vital signs, which can perform consistently well in dynamic scenarios, e.g., when some people move around the subject to be tracked, or a subject waves his/her arms and marches on the spot. Three major processing modules are developed in the system to enable robust sensing. Firstly, we utilize a camera to assist a mmWave radar to accurately localize the subjects of interest. Secondly, we exploit the calculated subject position to form transmitting and receiving beamformers, which can improve the reflected power from the targets and weaken the impact of dynamic interference. Thirdly, we propose a weighted multi-channel Variational Mode Decomposition (WMC-VMD) algorithm to separate the weak vital sign signals from the dynamic ones due to the subject's body movement. Experimental results show that the 90th-percentile errors in respiration rate (RR) and heartbeat rate (HR) are less than 0.5 RPM (respirations per minute) and 6 BPM (beats per minute), respectively.
Submitted 20 April, 2023;
originally announced April 2023.
-
The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge
Authors:
Pengcheng Guo,
He Wang,
Bingshen Mu,
Ao Zhang,
Peikun Chen
Abstract:
This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge. Specifically, the weighted prediction error (WPE) and guided source separation (GSS) techniques are first used to reduce reverberation and generate clean signals for each speaker. Then, we explore the effectiveness of Branchformer and E-Branchformer based ASR systems. To make better use of the visual modality, a cross-attention based multi-modal fusion module is proposed, which explicitly learns the contextual relationship between different modalities. Experiments show that our system achieves a concatenated minimum-permutation character error rate (cpCER) of 28.13% and 31.21% on the Dev and Eval sets, respectively, and obtains second place in the challenge.
Submitted 11 March, 2023;
originally announced March 2023.
-
VOLTA: an Environment-Aware Contrastive Cell Representation Learning for Histopathology
Authors:
Ramin Nakhli,
Allen Zhang,
Hossein Farahani,
Amirali Darbandsari,
Elahe Shenasa,
Sidney Thiessen,
Katy Milne,
Jessica McAlpine,
Brad Nelson,
C Blake Gilks,
Ali Bashashati
Abstract:
In clinical practice, many diagnostic tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming due to the large number of cells. In this paper, we propose a self-supervised framework (VOLTA) for cell representation learning in histopathology images using a novel technique that accounts for the cell's mutual relationship with its environment for improved cell representations. We subjected our model to extensive experiments on data collected from multiple institutions around the world, comprising over 700,000 cells, four cancer types, and cell types ranging from three to six categories for each dataset. The results show that our model outperforms the state-of-the-art models in cell representation learning. To showcase the potential power of our proposed framework, we applied VOLTA to ovarian and endometrial cancers with very small sample sizes (10-20 samples) and demonstrated that our cell representations can be utilized to identify the known histotypes of ovarian cancer and provide novel insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised deep learning models that require large sample sizes for training, we provide a framework that can empower new discoveries without any annotation data in situations where sample sizes are limited.
Submitted 8 March, 2023;
originally announced March 2023.
-
Joint Beamforming for RIS-Assisted Integrated Sensing and Communication Systems
Authors:
Yongqing Xu,
Yong Li,
J. Andrew Zhang,
Marco Di Renzo,
Tony Q. S. Quek
Abstract:
Integrated sensing and communications (ISAC) is an emerging critical technique for the next generation of communication systems. However, due to multiple performance metrics used for communication and sensing, the limited degrees-of-freedom (DoF) in optimizing ISAC systems poses a challenge. Reconfigurable intelligent surfaces (RIS) can introduce new DoF for beamforming in ISAC systems, thereby enhancing the performance of communication and sensing simultaneously. In this paper, we propose two optimization techniques for beamforming in RIS-assisted ISAC systems. The first technique is an alternating optimization (AO) algorithm based on the semidefinite relaxation (SDR) method and a one-dimension iterative (ODI) algorithm, which can maximize the radar mutual information (MI) while imposing constraints on the communication rates. The second technique is an AO algorithm based on the Riemannian gradient (RG) method, which can maximize the weighted ISAC performance metrics. Simulation results verify the effectiveness of the proposed schemes. The AO-SDR-ODI method is shown to achieve better communication and sensing performance than the AO-RG method, at the cost of higher complexity. It is also shown that the mean-squared-error (MSE) of the estimates of the sensing parameters decreases as the radar MI increases.
Submitted 24 January, 2024; v1 submitted 3 March, 2023;
originally announced March 2023.
-
VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting
Authors:
Ao Zhang,
He Wang,
Pengcheng Guo,
Yihui Fu,
Lei Xie,
Yingying Gao,
Shilei Zhang,
Junlan Feng
Abstract:
The performance of the keyword spotting (KWS) system based on audio modality, commonly measured in false alarms and false rejects, degrades significantly under far-field and noisy conditions. Therefore, audio-visual keyword spotting, which leverages complementary relationships over multiple modalities, has recently gained much attention. However, current studies mainly focus on combining the exclusively learned representations of different modalities, instead of exploring the modal relationships during each respective modeling. In this paper, we propose a novel visual modality enhanced end-to-end KWS framework (VE-KWS), which fuses audio and visual modalities from two aspects. The first one is utilizing the speaker location information obtained from the lip region in videos to assist the training of a multi-channel audio beamformer. By involving the beamformer as an audio enhancement module, the acoustic distortions caused by far-field or noisy environments could be significantly suppressed. The other one is conducting cross-attention between different modalities to capture the inter-modal relationships and help the representation learning of each modality. Experiments on the MISP challenge corpus show that our proposed model achieves a 2.79% false rejection rate and a 2.95% false alarm rate on the Eval set, resulting in a new SOTA performance compared with the top-ranking systems in the ICASSP2022 MISP challenge.
Submitted 14 March, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.