Search | arXiv e-print repository

Causally-Aware Spatio-Temporal Multi-Graph Convolution Network for Accurate and Reliable Traffic Prediction

Authors: Pingping Dong, Xiao-Lin Wang, Indranil Bose, Kam K. H. Ng, Xiaoning Zhang, Xiaoge Zhang

Abstract: Accurate and reliable prediction has profound implications to a wide range of applications. In this study, we focus on an instance of spatio-temporal learning problem--traffic prediction--to demonstrate an advanced deep learning model developed for making accurate and reliable forecast. Despite the significant progress in traffic prediction, limited studies have incorporated both explicit and implicit traffic patterns simultaneously to improve prediction performance. Meanwhile, the variability nature of traffic states necessitates quantifying the uncertainty of model predictions in a statistically principled way; however, extant studies offer no provable guarantee on the statistical validity of confidence intervals in reflecting its actual likelihood of containing the ground truth. In this paper, we propose an end-to-end traffic prediction framework that leverages three primary components to generate accurate and reliable traffic predictions: dynamic causal structure learning for discovering implicit traffic patterns from massive traffic data, causally-aware spatio-temporal multi-graph convolution network (CASTMGCN) for learning spatio-temporal dependencies, and conformal prediction for uncertainty quantification. CASTMGCN fuses several graphs that characterize different important aspects of traffic networks and an auxiliary graph that captures the effect of exogenous factors on the road network. On this basis, a conformal prediction approach tailored to spatio-temporal data is further developed for quantifying the uncertainty in node-wise traffic predictions over varying prediction horizons. Experimental results on two real-world traffic datasets demonstrate that the proposed method outperforms several state-of-the-art models in prediction accuracy; moreover, it generates more efficient prediction regions than other methods while strictly satisfying the statistical validity in coverage.

Submitted 23 August, 2024; originally announced August 2024.
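
For readers unfamiliar with the conformal prediction step named in the abstract above, the following is a minimal sketch of plain split conformal prediction. It is not the tailored spatio-temporal procedure of the paper: the function names, the toy calibration data, and the use of absolute residuals as the nonconformity score are all assumptions made for illustration.

```python
import math

def conformal_quantile(residuals, alpha):
    """Split-conformal quantile of absolute calibration residuals.

    Uses the finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this miscoverage level.
        return math.inf
    return sorted(residuals)[rank - 1]

def prediction_interval(point_forecast, q):
    """Symmetric interval around a point forecast with half-width q."""
    return (point_forecast - q, point_forecast + q)

# Hypothetical held-out calibration set for one sensor (e.g. speeds).
calib_true = [52.0, 48.5, 61.0, 55.2, 47.9, 58.3, 50.1, 53.7, 49.4]
calib_pred = [50.0, 49.0, 59.5, 56.0, 49.0, 57.0, 51.0, 52.5, 50.0]
residuals = [abs(y - p) for y, p in zip(calib_true, calib_pred)]

q = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(54.0, q)
print(f"80% interval for a forecast of 54.0: [{low}, {high}]")
```

The coverage guarantee of this basic recipe relies on exchangeability between calibration and test points, which time-ordered traffic data generally does not satisfy; the sketch only conveys the calibration-quantile idea.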

arXiv:2408.11871 [pdf, other]

MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

Authors: Lionel Z. Wang, Yiming Ma, Renfei Gao, Beichen Guo, Zhuoran Li, Han Zhu, Wenqi Fan, Zexin Lu, Ka Chung Ng

Abstract: The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, thereby eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate our MegaFake dataset. We believe that our dataset and insights will provide valuable contributions to future research focused on the detection and governance of fake news in the era of LLMs.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.09935 [pdf, other]

Privacy Technologies for Financial Intelligence

Authors: Yang Li, Thilina Ranbaduge, Kee Siong Ng

Abstract: Financial crimes like terrorism financing and money laundering can have real impacts on society, including the abuse and mismanagement of public funds, increase in societal problems such as drug trafficking and illicit gambling with attendant economic costs, and loss of innocent lives in the case of terrorism activities. Complex financial crimes can be hard to detect primarily because data related to different pieces of the overall puzzle is usually distributed across a network of financial institutions, regulators, and law-enforcement agencies and they cannot be easily shared due to privacy constraints. Recent advances in Privacy-Preserving Data Matching and Machine Learning provide an opportunity for regulators and the financial industry to come together to solve the risk-discovery problem with technology. This paper provides a survey of the financial intelligence landscape and where opportunities lie for privacy technologies to improve the state-of-the-art in financial-crime detection.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.06275 [pdf, other]

Robust Instance Optimal Phase-Only Compressed Sensing

Authors: Junren Chen, Zhaoqiang Liu, Michael K. Ng, Jonathan Scarlett

Abstract: Phase-only compressed sensing (PO-CS) is concerned with the recovery of structured signals from the phases of complex measurements. Recent results show that structured signals in the standard sphere $\mathbb{S}^{n-1}$ can be exactly recovered from complex Gaussian phases, by recasting PO-CS as linear compressed sensing and then applying existing solvers such as basis pursuit. Known guarantees are either non-uniform or do not tolerate model error. We show that this linearization approach is more powerful than the prior results indicate. First, it achieves uniform instance optimality: Under complex Gaussian matrix with a near-optimal number of rows, this approach uniformly recovers all signals in $\mathbb{S}^{n-1}$ with errors proportional to the model errors of the signals. Specifically, for sparse recovery there exists an efficient estimator $\mathbf{x}^\sharp$ and some universal constant $C$ such that $\|\mathbf{x}^\sharp-\mathbf{x}\|_2\le \frac{C\sigma_s(\mathbf{x})_1}{\sqrt{s}}~(\forall\mathbf{x}\in\mathbb{S}^{n-1})$, where $\sigma_s(\mathbf{x})_1=\min_{\mathbf{u}\in\Sigma^n_s}\|\mathbf{u}-\mathbf{x}\|_1$ is the model error under $\ell_1$-norm. Second, the instance optimality is robust to small dense disturbances and sparse corruptions that arise before or after capturing the phases. As an extension, we also propose to recast sparsely corrupted PO-CS as a linear corrupted sensing problem and show that this achieves perfect reconstruction of the signals. Our results resemble the instance optimal guarantees in linear compressed sensing and, to our knowledge, are the first results of this kind for a non-linear sensing scenario.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.05582 [pdf, other]

Non-Negative Reduced Biquaternion Matrix Factorization with Applications in Color Face Recognition

Authors: Jifei Miao, Junjun Pan, Michael K. Ng

Abstract: Reduced biquaternion (RB), as a four-dimensional algebra highly suitable for representing color pixels, has recently garnered significant attention from numerous scholars. In this paper, for color image processing problems, we introduce a concept of the non-negative RB matrix and then use the multiplication properties of RB to propose a non-negative RB matrix factorization (NRBMF) model. The NRBMF model is introduced to address the challenge of reasonably establishing a non-negative quaternion matrix factorization model, which is primarily hindered by the multiplication properties of traditional quaternions. Furthermore, this paper transforms the problem of solving the NRBMF model into an RB alternating non-negative least squares (RB-ANNLS) problem. Then, by introducing a method to compute the gradient of the real function with RB matrix variables, we solve the RB-ANNLS optimization problem using the RB projected gradient algorithm and conduct a convergence analysis of the algorithm. Finally, we validate the effectiveness and superiority of the proposed NRBMF model in color face recognition.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2408.04837 [pdf, ps, other]

Multi-User MISO with Stacked Intelligent Metasurfaces: A DRL-Based Sum-Rate Optimization Approach

Authors: Hao Liu, Jiancheng An, George C. Alexandropoulos, Derrick Wing Kwan Ng, Chau Yuen, Lu Gan

Abstract: Stacked intelligent metasurfaces (SIMs) represent a novel signal processing paradigm that enables over-the-air processing of electromagnetic waves at the speed of light. Their multi-layer architecture exhibits customizable computational capabilities compared to conventional single-layer reconfigurable intelligent surfaces and metasurface lenses. In this paper, we deploy SIM to improve the performance of multi-user multiple-input single-output (MISO) wireless systems through a low complexity manner with reduced numbers of transmit radio frequency chains. In particular, an optimization formulation for the joint design of the SIM phase shifts and the transmit power allocation is presented, which is efficiently tackled via a customized deep reinforcement learning (DRL) approach that systematically explores pre-designed states of the SIM-parametrized smart wireless environment. The presented performance evaluation results demonstrate the proposed method's capability to effectively learn from the wireless environment, while consistently outperforming conventional precoding schemes under low transmit power conditions. Furthermore, the implementation of hyperparameter tuning and whitening process significantly enhance the robustness of the proposed DRL framework.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 34 pages, 11 figures, 2 tables. arXiv admin note: text overlap with arXiv:2402.09006

arXiv:2407.14815 [pdf, ps, other]

Unified Far-Field and Near-Field in Holographic MIMO: A Wavenumber-Domain Perspective

Authors: Yuanbin Chen, Xufeng Guo, Gui Zhou, Shi Jin, Derrick Wing Kwan Ng, Zhaocheng Wang

Abstract: This article conceives a unified representation for near-field and far-field holographic multiple-input multiple-output (HMIMO) channels, addressing a practical design dilemma: "Why does the angular-domain representation no longer function effectively?" To answer this question, we pivot from the angular domain to the wavenumber domain and present a succinct overview of its underlying philosophy. In re-examining the Fourier plane-wave series expansion that recasts spherical propagation waves into a series of plane waves represented by Fourier harmonics, we characterize the HMIMO channel employing these Fourier harmonics having different wavenumbers. This approach, referred to as the wavenumber-domain representation, facilitates a unified view across the far-field and the near-field. Furthermore, the limitations of the DFT basis are demonstrated when identifying the sparsity inherent to the HMIMO channel, motivating the development of a wavenumber-domain basis as an alternative. We then present some preliminary applications of the proposed wavenumber-domain basis in signal processing across both the far-field and near-field, along with several prospects for future HMIMO system designs based on the wavenumber domain.

Submitted 20 July, 2024; originally announced July 2024.

Comments: This article has been accepted for publication in IEEE Commag (7 pages, 5 figures)

arXiv:2407.06894 [pdf, other]

RIS-Assisted Received Adaptive Spatial Modulation for Wireless Communication

Authors: Chaorong Zhang, Hui Xu, Benjamin K. Ng, Chan-Tong Lam

Abstract: A novel wireless transmission scheme, as named the reconfigurable intelligent surface (RIS)-assisted received adaptive spatial modulation (RASM) scheme, is proposed in this paper. In this scheme, the adaptive spatial modulation (ASM)-based antennas selection works at the receiver by employing the characteristics of the RIS in each time slot, where the signal-to-noise ratio at specific selected antennas can be further enhanced with near few powers. Besides for the bits from constellation symbols, the extra bits can be mapped into the indices of receive antenna combinations and conveyed to the receiver through the ASM-based antenna-combination selection, thus providing higher spectral efficiency. To explicitly present the RASM scheme, the analytical performance of bit error rate of it is discussed in this paper. As a trade-off selection, the proposed scheme shows higher spectral efficiency and remains the satisfactory error performance. Simulation and analytical results demonstrate the better performance and exhibit more potential to apply in practical wireless communication.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05805 [pdf, other]

Computational Complexity-Constrained Spectral Efficiency Analysis for 6G Waveforms

Authors: Saulo Queiroz, João P. Vilela, Benjamin Koon Kei Ng, Chan-Tong Lam, Edmundo Monteiro

Abstract: In this work, we present a tutorial on how to account for the computational time complexity overhead of signal processing in the spectral efficiency (SE) analysis of wireless waveforms. Our methodology is particularly relevant in scenarios where achieving higher SE entails a penalty in complexity, a common trade-off present in 6G candidate waveforms. We consider that SE derives from the data rate, which is impacted by time-dependent overheads. Thus, neglecting the computational complexity overhead in the SE analysis grants an unfair advantage to more computationally complex waveforms, as they require larger computational resources to meet a signal processing runtime below the symbol period. We demonstrate our points with two case studies. In the first, we refer to IEEE 802.11a-compliant baseband processors from the literature to show that their runtime significantly impacts the SE perceived by upper layers. In the second case study, we show that waveforms considered less efficient in terms of SE can outperform their more computationally expensive counterparts if provided with equivalent high-performance computational resources. Based on these cases, we believe our tutorial can address the comparative SE analysis of waveforms that operate under different computational resource constraints.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Submitted to ITU-JFET Journal

arXiv:2407.04604 [pdf, other]

PartCraft: Crafting Creative Objects by Parts

Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

Abstract: This paper propels creative control in generative visual AI by allowing users to "select". Departing from traditional text or sketch-based methods, we for the first time allow users to choose visual concepts by parts for their creative endeavors. The outcome is fine-grained generation that precisely captures selected visual concepts, ensuring a holistically faithful and plausible result. To achieve this, we first parse objects into parts through unsupervised feature clustering. Then, we encode parts into text tokens and introduce an entropy-based normalized attention loss that operates on them. This loss design enables our model to learn generic prior topology knowledge about object's part composition, and further generalize to novel part compositions to ensure the generation looks holistically faithful. Lastly, we employ a bottleneck encoder to project the part tokens. This not only enhances fidelity but also accelerates learning, by leveraging shared knowledge and facilitating information exchange among instances. Visual results in the paper and supplementary material showcase the compelling power of PartCraft in crafting highly customized, innovative creations, exemplified by the "charming" and creative birds. Code is released at https://github.com/kamwoh/partcraft.

Submitted 8 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: ECCV 2024. arXiv admin note: substantial text overlap with arXiv:2311.15477

arXiv:2407.02877 [pdf, other]

Resource Allocation Design for Next-Generation Multiple Access: A Tutorial Overview

Authors: Zhiqiang Wei, Dongfang Xu, Shuangyang Li, Shenghui Song, Derrick Wing Kwan Ng, Giuseppe Caire

Abstract: Multiple access is the cornerstone technology for each generation of wireless cellular networks and resource allocation design plays a crucial role in multiple access. In this paper, we present a comprehensive tutorial overview for junior researchers in this field, aiming to offer a foundational guide for resource allocation design in the context of next-generation multiple access (NGMA). Initially, we identify three types of channels in future wireless cellular networks over which NGMA will be implemented, namely: natural channels, reconfigurable channels, and functional channels. Natural channels are traditional uplink and downlink communication channels; reconfigurable channels are defined as channels that can be proactively reshaped via emerging platforms or techniques, such as intelligent reflecting surface (IRS), unmanned aerial vehicle (UAV), and movable/fluid antenna (M/FA); and functional channels support not only communication but also other functionalities simultaneously, with typical examples including integrated sensing and communication (ISAC) and joint computing and communication (JCAC) channels. Then, we introduce NGMA models applicable to these three types of channels that cover most of the practical communication scenarios of future wireless communications. Subsequently, we articulate the key optimization technical challenges inherent in the resource allocation design for NGMA, categorizing them into rate-oriented, power-oriented, and reliability-oriented resource allocation designs. The corresponding optimization approaches for solving the formulated resource allocation design problems are then presented. Finally, simulation results are presented and discussed to elucidate the practical implications and insights derived from resource allocation designs in NGMA.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 69 pages, 10 figures, 5 tables

arXiv:2406.17649 [pdf, other]

Privacy Preserving Reinforcement Learning for Population Processes

Authors: Samuel Yang-Zhao, Kee Siong Ng

Abstract: We consider the problem of privacy protection in Reinforcement Learning (RL) algorithms that operate over population processes, a practical but understudied setting that includes, for example, the control of epidemics in large populations of dynamically interacting individuals. In this setting, the RL algorithm interacts with the population over $T$ time steps by receiving population-level statistics as state and performing actions which can affect the entire population at each time step. An individual's data can be collected across multiple interactions and their privacy must be protected at all times. We clarify the Bayesian semantics of Differential Privacy (DP) in the presence of correlated data in population processes through a Pufferfish Privacy analysis. We then give a meta algorithm that can take any RL algorithm as input and make it differentially private. This is achieved by taking an approach that uses DP mechanisms to privatize the state and reward signal at each time step before the RL algorithm receives them as input. Our main theoretical result shows that the value-function approximation error when applying standard RL algorithms directly to the privatized states shrinks quickly as the population size and privacy budget increase. This highlights that reasonable privacy-utility trade-offs are possible for differentially private RL algorithms in population processes. Our theoretical findings are validated by experiments performed on a simulated epidemic control problem over large population sizes.

Submitted 25 June, 2024; originally announced June 2024.
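
To make the idea of privatizing a population-level statistic at each time step more concrete, here is a minimal sketch using the standard Laplace mechanism. It is an illustration only, not the meta algorithm of the paper: the function names, the sensitivity of 1, the even split of the privacy budget over $T$ steps (basic sequential composition), and the toy counts are all assumptions, and the sketch does not address the correlated-data analysis the abstract mentions.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a difference of two exponentials."""
    # expovariate takes a rate, so rate = 1 / scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize_statistic(value, epsilon, sensitivity, rng):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)

def privatize_trajectory(values, total_epsilon, sensitivity, rng):
    """Privatize T statistics, spending total_epsilon / T on each step.

    Assumes basic sequential composition, so the per-step budgets add up.
    """
    per_step_epsilon = total_epsilon / len(values)
    return [privatize_statistic(v, per_step_epsilon, sensitivity, rng)
            for v in values]

rng = random.Random(0)
# Hypothetical infected counts observed over T = 5 time steps.
infected_counts = [120, 135, 160, 158, 149]
noisy_counts = privatize_trajectory(
    infected_counts, total_epsilon=5.0, sensitivity=1.0, rng=rng)
print(noisy_counts)
```

An RL agent would then be fed `noisy_counts` in place of the true statistics. The noise scale is independent of the population size, so its relative effect on a count shrinks as the population grows, which is consistent in spirit with the trade-off the abstract describes.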

arXiv:2406.14939 [pdf, other]

RIS-aided MIMO Beamforming: Piece-Wise Near-field Channel Model

Authors: Weijian Chen, Zai Yang, Zhiqiang Wei, Derrick Wing Kwan Ng, Michail Matthaiou

Abstract: This paper proposes a joint active and passive beamforming design for reconfigurable intelligent surface (RIS)-aided wireless communication systems, adopting a piece-wise near-field channel model. While a traditional near-field channel model, applied without any approximations, offers higher modeling accuracy than a far-field model, it renders the system design more sensitive to channel estimation errors (CEEs). As a remedy, we propose to adopt a piece-wise near-field channel model that leverages the advantages of the near-field approach while enhancing its robustness against CEEs. Our study analyzes the impact of different channel models, including the traditional near-field, the proposed piece-wise near-field and far-field channel models, on the interference distribution caused by CEEs and model mismatches. Subsequently, by treating the interference as noise, we formulate a joint active and passive beamforming design problem to maximize the spectral efficiency (SE). The formulated problem is then recast as a mean squared error (MSE) minimization problem and a suboptimal algorithm is developed to iteratively update the active and passive beamforming strategies. Simulation results demonstrate that adopting the piece-wise near-field channel model leads to an improved SE compared to both the near-field and far-field models in the presence of CEEs. Furthermore, the proposed piece-wise near-field model achieves a good trade-off between modeling accuracy and system's degrees of freedom (DoF).

Submitted 21 June, 2024; originally announced June 2024.

Comments: 28 pages

arXiv:2406.08457 [pdf, other]

ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

Abstract: Existing fine-grained hashing methods typically lack code interpretability as they compute hash code bits holistically using both global and local features. To address this limitation, we propose ConceptHash, a novel method that achieves sub-code level interpretability. In ConceptHash, each sub-code corresponds to a human-understandable concept, such as an object part, and these concepts are automatically discovered without human annotations. Specifically, we leverage a Vision Transformer architecture and introduce concept tokens as visual prompts, along with image patch tokens as model inputs. Each concept is then mapped to a specific sub-code at the model output, providing natural sub-code interpretability. To capture subtle visual differences among highly similar sub-categories (e.g., bird species), we incorporate language guidance to ensure that the learned hash codes are distinguishable within fine-grained object classes while maintaining semantic alignment. This approach allows us to develop hash codes that exhibit similarity within families of species while remaining distinct from species in other families. Extensive experiments on four fine-grained image retrieval benchmarks demonstrate that ConceptHash outperforms previous methods by a significant margin, offering unique sub-code interpretability as an additional benefit. Code at: https://github.com/kamwoh/concepthash.

Submitted 12 June, 2024; originally announced June 2024.

Comments: CVPRW 2024 - FGVC11 best paper award

arXiv:2406.05481 [pdf, ps, other]

Joint Cooperative Clustering and Power Control for Energy-Efficient Cell-Free XL-MIMO with Multi-Agent Reinforcement Learning

Authors: Ziheng Liu, Jiayi Zhang, Zhilong Liu, Derrick Wing Kwan Ng, Bo Ai

Abstract: In this paper, we investigate the amalgamation of cell-free (CF) and extremely large-scale multiple-input multiple-output (XL-MIMO) technologies, referred to as a CF XL-MIMO, as a promising advancement for enabling future mobile networks. To address the computational complexity and communication power consumption associated with conventional centralized optimization, we focus on user-centric dynamic networks in which each user is served by an adaptive subset of access points (AP) rather than all of them. We begin our research by analyzing a joint resource allocation problem for energy-efficient CF XL-MIMO systems, encompassing cooperative clustering and power control design, where all clusters are adaptively adjustable. Then, we propose an innovative double-layer multi-agent reinforcement learning (MARL)-based scheme, which offers an effective strategy to tackle the challenges of high-dimensional signal processing. In the section of numerical results, we compare various algorithms with different network architectures. These comparisons reveal that the proposed MARL-based cooperative architecture can effectively strike a balance between system performance and communication overhead, thereby improving energy efficiency performance. It is important to note that increasing the number of user equipments participating in information sharing can effectively enhance SE performance, which also leads to an increase in power consumption, resulting in a non-trivial trade-off between the number of participants and EE performance.

Submitted 17 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.19373 [pdf, other]

Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

Authors: Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

Abstract: Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a Pre-trained model based Multimodal Mood Reader for cross-subject emotion recognition that utilizes masked brain signal modeling and interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on large scale dataset, and employs Interlinked spatial-temporal attention mechanism to process Differential Entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from attention perspective, providing qualitative analysis of emotion-related brain areas, offering valuable insights for affective research in neural signal processing.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by International Conference on Neural Computing for Advanced Applications, 2024

arXiv:2405.17464 [pdf, other]

Data Valuation by Leveraging Global and Local Statistical Information

Authors: Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

Abstract: Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications, particularly in machine learning tasks. There are diverse technical avenues to quantify the value of data within a corpus. While Shapley value-based methods are among the most widely used techniques in the literature due to their solid theoretical foundation, the accurate calculation of Shapley values is often intractable, leading to the proposal of numerous approximated calculation methods. Despite significant progress, nearly all existing methods overlook the utilization of distribution information of values within a data corpus. In this paper, we demonstrate that both global and local statistical information of value distributions hold significant potential for data valuation within the context of machine learning. Firstly, we explore the characteristics of both global and local value distributions across several simulated and real data corpora. Useful observations and clues are obtained. Secondly, we propose a new data valuation method that estimates Shapley values by incorporating the explored distribution characteristics into an existing method, AME. Thirdly, we present a new path to address the dynamic data valuation problem by formulating an optimization problem that integrates information of both global and local value distributions. Extensive experiments are conducted on Shapley value estimation, value-based data removal/adding, mislabeled data detection, and incremental/decremental data valuation. The results showcase the effectiveness and efficiency of our proposed methodologies, affirming the significant potential of global and local value distributions in data valuation.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:2306.10577 by other authors

ACM Class: I.2

arXiv:2405.12114 [pdf, other]

A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator

Authors: Zhigang Jia, Yuelian Xiang, Meixiang Zhao, Tingting Wu, Michael K. Ng

Abstract: The cross-channel deblurring problem in color image processing is difficult to solve due to the complex coupling and structural blurring of color pixels. Until now, there have been few efficient algorithms that can reduce color infection in the deblurring process. To solve this challenging problem, we present a novel cross-space total variation (CSTV) regularization model for color image deblurring by introducing a quaternion blur operator and a cross-color space regularization functional. The existence and uniqueness of the solution is proved and a new L-curve method is proposed to find a sweet balance of regularization functionals on different color spaces. The Euler-Lagrange equation is derived to show that CSTV has taken into account the coupling of all color channels and the local smoothing within each color channel. A quaternion operator splitting method is firstly proposed to enhance the ability of color infection reduction of the CSTV regularization model. This strategy also applies to the well-known color deblurring models. Numerical experiments on color image databases illustrate the efficiency and manoeuvrability of the new model and algorithms. The color images restored by them successfully maintain the color and spatial information and are of higher quality in terms of PSNR, SSIM, MSE and CIEde2000 than the restorations of the state-of-the-art methods.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 10 figures

arXiv:2405.11883 [pdf, other]

Asynchronous MIMO-OFDM Massive Unsourced Random Access with Codeword Collisions

Authors: Tianya Li, Yongpeng Wu, Junyuan Gao, Wenjun Zhang, Xiang-Gen Xia, Derrick Wing Kwan Ng, Chengshan Xiao

Abstract: This paper investigates asynchronous MIMO massive unsourced random access in an orthogonal frequency division multiplexing (OFDM) system over frequency-selective fading channels, with the presence of both timing and carrier frequency offsets (TO and CFO) and non-negligible codeword collisions. The proposed coding framework segregates the data into two components, namely, preamble and coding parts, with the former being tree-coded and the latter LDPC-coded. By leveraging the dual sparsity of the equivalent channel across both codeword and delay domains (CD and DD), we develop a message passing-based sparse Bayesian learning algorithm, combined with belief propagation and mean field, to iteratively estimate DD channel responses, TO, and delay profiles. Furthermore, we establish a novel graph-based algorithm to iteratively separate the superimposed channels and compensate for the phase rotations. Additionally, the proposed algorithm is applied to the flat fading scenario to estimate both TO and CFO, where the channel and offset estimation is enhanced by leveraging the geometric characteristics of the signal constellation. Simulations reveal that the proposed algorithm achieves superior performance and substantial complexity reduction in both channel and offset estimation compared to the codebook enlarging-based counterparts, and enhanced data recovery performances compared to state-of-the-art URA schemes.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 13 pages, 12 figures, submitted to the IEEE for possible publication

arXiv:2404.16994 [pdf, other]

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Authors: Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

Abstract: Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the progress of video-language models. This paper investigates a straightforward, highly efficient, and resource-light approach to adapting an existing image-language pre-trained model for dense video understanding. Our preliminary experiments reveal that directly fine-tuning pre-trained image-language models with multiple frames as inputs on video datasets leads to performance saturation or even a drop. Our further investigation reveals that it is largely attributed to the bias of learned high-norm visual features. Motivated by this finding, we propose a simple but effective pooling strategy to smooth the feature distribution along the temporal dimension and thus reduce the dominant impacts from the extreme features. The new model is termed Pooling LLaVA, or PLLaVA for short. PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks. Notably, on the recent popular VideoChatGPT benchmark, PLLaVA achieves a score of 3.48 out of 5 on average of five evaluated dimensions, exceeding the previous SOTA results from GPT4V (IG-VLM) by 9%. On the latest multi-choice benchmark MVBench, PLLaVA achieves 58.1% accuracy on average across 20 sub-tasks, 14.5% higher than GPT4V (IG-VLM). Code is available at https://pllava.github.io/

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.19185 [pdf, other]

Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning

Authors: Suhang Fan, Wei Xu, Renjie Xie, Shi Jin, Derrick Wing Kwan Ng, Naofal Al-Dhahir

Abstract: Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the vertical and horizontal polarization directions tend to exhibit high polarization correlation. To fully exploit the inherent propagation similarity within dual-polarized channels, we propose a disentangled representation neural network (NN) for CSI feedback, referred to as DiReNet. The proposed DiReNet disentangles dual-polarized CSI into three components: polarization-shared information, vertical polarization-specific information, and horizontal polarization-specific information. This disentanglement of dual-polarized CSI enables the minimization of information redundancy caused by the polarization correlation and improves the performance of CSI compression and recovery. Additionally, flexible quantization and network extension schemes are designed. Consequently, our method provides a pragmatic solution for CSI feedback to harness the physical MIMO polarization as a priori information. Our experimental results show that the performance of our proposed DiReNet surpasses that of existing DL-based networks, while also effectively reducing the number of network parameters by nearly one third.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.12770 [pdf, other]

Multispectral Image Restoration by Generalized Opponent Transformation Total Variation

Authors: Zhantao Ma, Michael K. Ng

Abstract: Multispectral images (MSI) contain light information in different wavelengths of objects, which convey spectral-spatial information and help improve the performance of various image processing tasks. Numerous techniques have been created to extend the application of total variation regularization in restoring multispectral images, for example, based on channel coupling and adaptive total variation regularization. The primary contribution of this paper is to propose and develop a new multispectral total variation regularization in a generalized opponent transformation domain instead of the original multispectral image domain. Here opponent transformations for multispectral images are generalized from a well-known opponent transformation for color images. We will explore the properties of generalized opponent transformation total variation (GOTTV) regularization and the corresponding optimization formula for multispectral image restoration. To evaluate the effectiveness of the new GOTTV method, we provide numerical examples that showcase its superior performance compared to existing multispectral image total variation methods, using criteria such as MPSNR and MSSIM.

Submitted 19 March, 2024; originally announced March 2024.

MSC Class: 65F22; 68U10; 35A15; 65K10; 52A41

arXiv:2403.11809 [pdf, other]

Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems

Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Jie Xu, Derrick Wing Kwan Ng, Shuguang Cui

Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channel estimation (CE). In this paper, we propose a new sensing-enhanced uplink CE scheme for near-field XL-MIMO, which notably reduces the required quantity of baseband samples and the dictionary size. In particular, we first propose a sensing method that can be accomplished in a single time slot. It employs power sensors embedded within the antenna elements to measure the received power pattern rather than baseband samples. A time inversion algorithm is then proposed to precisely estimate the locations of users and scatterers, which offers a substantially lower computational complexity. Based on the estimated locations from sensing, a novel dictionary is then proposed by considering the eigen-problem based on the near-field transmission model, which facilitates efficient near-field CE with less baseband sampling and a more lightweight dictionary. Moreover, we derive the general form of the eigenvectors associated with the near-field channel matrix, revealing their noteworthy connection to the discrete prolate spheroidal sequence (DPSS). Simulation results unveil that the proposed time inversion algorithm achieves accurate localization with power measurements only, and remarkably outperforms various widely-adopted algorithms in terms of computational complexity. Furthermore, the proposed eigen-dictionary considerably improves the accuracy in CE with a compact dictionary size and a drastic reduction in baseband samples by up to 77%.

Submitted 27 August, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 14 pages, 10 figures

arXiv:2403.08989 [pdf, ps, other]

Maximum Channel Coding Rate of Finite Block Length MIMO Faster-Than-Nyquist Signaling

Authors: Zichao Zhang, Melda Yuksel, Halim Yanikomeroglu, Benjamin K. Ng, Chan-Tong Lam

Abstract: The pursuit of higher data rates and efficient spectrum utilization in modern communication technologies necessitates novel solutions. In order to provide insights into improving spectral efficiency and reducing latency, this study investigates the maximum channel coding rate (MCCR) of finite block length (FBL) multiple-input multiple-output (MIMO) faster-than-Nyquist (FTN) channels. By optimizing power allocation, we derive the system's MCCR expression. Simulation results are compared with the existing literature to reveal the benefits of FTN in FBL transmission.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.18074 [pdf, other]

A One-step Image Retargeing Algorithm Based on Conformal Energy

Authors: Chengyang Liu, Michael K. Ng

Abstract: The image retargeting problem is to find a proper mapping to resize an image to one with a prescribed aspect ratio, which is quite popular these days. In this paper, we propose an efficient and orientation-preserving one-step image retargeting algorithm based on minimizing the harmonic energy, which can well preserve the regions of interest (ROIs) and line structures in the image. We also give some mathematical proofs in the paper to ensure the well-posedness and accuracy of our algorithm.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 24 pages, 10 figures

arXiv:2402.13572 [pdf, other]

On the Expressive Power of a Variant of the Looped Transformer

Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu

Abstract: Besides natural language processing, transformers exhibit extraordinary performance in solving broader applications, including scientific computing and computer vision. Previous works try to explain this from the expressive power and capability perspectives that standard transformers are capable of performing some algorithms. To empower transformers with algorithmic capabilities and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer block, dubbed Algorithm Transformer (abbreviated as AlgoFormer). Compared with the standard transformer and vanilla looped transformer, the proposed AlgoFormer can achieve significantly higher expressiveness in algorithm representation when using the same number of parameters. In particular, inspired by the structure of human-designed learning algorithms, our transformer block consists of a pre-transformer that is responsible for task pre-processing, a looped transformer for iterative optimization algorithms, and a post-transformer for producing the desired results after post-processing. We provide theoretical evidence of the expressive power of the AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to be smarter than human-designed algorithms. Experimental results demonstrate the empirical superiority of the proposed transformer in that it outperforms the standard transformer and vanilla looped transformer in some challenging tasks.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.11412 [pdf, other]

Predicting Maximum Permitted Process Forces for Object Grasping and Manipulation Using a Deep Learning Regression Model

Authors: S. Wucherer, R. McMurray, K. Y. Ng, F. Kerber

Abstract: During the execution of handling processes in manufacturing, it is difficult to measure the process forces with state-of-the-art gripper systems since they usually lack integrated sensors. Thus, the exact state of the gripped object and the actuating process forces during manipulation and handling are unknown. This paper proposes a deep learning regression model to construct a continuous stability metric to predict the maximum process forces on the gripped objects using high-resolution optical tactile sensors. A pull experiment was developed to obtain a valid dataset for training. Continuously force-based labeled pairs of tactile images for varying grip positions of industrial gearbox parts were acquired to train a novel neural network inspired by encoder-decoder architectures. A ResNet-18 model was used for comparison. Both models can predict the maximum process force for each object with a precision of less than 1 N. During validation, the generalization potential of the proposed methodology with respect to previously unknown objects was demonstrated with an accuracy of 0.4-2.1 N and precision of 1.7-3.4 N, respectively.

Submitted 17 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures, 3 tables, to be submitted as a conference paper to IEEE CCTA2024

arXiv:2401.15280 [pdf, ps, other]

Analytical Framework for Effective Degrees of Freedom in Near-Field XL-MIMO

Authors: Zhe Wang, Jiayi Zhang, Wenhui Yi, Hongyang Du, Dusit Niyato, Bo Ai, Derrick Wing Kwan Ng

Abstract: In this paper, we develop an effective degrees of freedom (EDoF) performance analysis framework specifically tailored for near-field XL-MIMO systems. We explore five representative XL-MIMO hardware designs, including uniform planar array (UPA)-based with point antennas, two-dimensional (2D) continuous aperture (CAP) plane-based, UPA-based with patch antennas, uniform linear array (ULA)-based, and one-dimensional (1D) CAP line segment-based XL-MIMO systems. Our analysis encompasses two near-field channel models: the scalar and dyadic Green's function-based channel models. More importantly, when applying the scalar Green's function-based channel, we derive EDoF expressions in closed form, characterizing the impacts of the physical size of the transceiver, the transmitting distance, and the carrier frequency. In our numerical results, we evaluate and compare the EDoF performance across all examined XL-MIMO designs, confirming the accuracy of our proposed closed-form expressions. Furthermore, we observe that with an increasing number of antennas, the EDoF performance for both UPA-based and ULA-based systems approaches that of 2D CAP plane and 1D CAP line segment-based systems, respectively. Moreover, we unveil that the EDoF performance for near-field XL-MIMO systems is predominantly determined by the array aperture size rather than the sheer number of antennas.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 32 pages, 11 figures. This paper has been submitted to IEEE journal for possible publication

arXiv:2401.14008 [pdf, other]

Massive Unsourced Random Access for Near-Field Communications

Authors: Xinyu Xie, Yongpeng Wu, Jianping An, Derrick Wing Kwan Ng, Chengwen Xing, Wenjun Zhang

Abstract: This paper investigates the unsourced random access (URA) problem with a massive multiple-input multiple-output receiver that serves wireless devices in the near-field of radiation. We employ an uncoupled transmission protocol without appending redundancies to the slot-wise encoded messages. To exploit the channel sparsity for block length reduction while facing the collapsed sparse structure in the angular domain of near-field channels, we propose a sparse channel sampling method that divides the angle-distance (polar) domain based on the maximum permissible coherence. Decoding starts with retrieving active codewords and channels from each slot. We address the issue by leveraging the structured channel sparsity in the spatial and polar domains and propose a novel turbo-based recovery algorithm. Furthermore, we investigate an off-grid compressed sensing method to refine discretely estimated channel parameters over the continuum that improves the detection performance. Afterward, without the assistance of redundancies, we recouple the separated messages according to the similarity of the users' channel information and propose a modified K-medoids method to handle the constraints and collisions involved in channel clustering. Simulations reveal that via exploiting the channel sparsity, the proposed URA scheme achieves high spectral efficiency and surpasses existing multi-slot-based schemes. Moreover, with more measurements provided by the overcomplete channel sampling, the near-field-suited scheme outperforms its counterpart of the far-field.

Submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted by IEEE Transactions on Communications

arXiv:2401.12472 [pdf, other]

Contrastive Learning in Distilled Models

Authors: Valerie Lim, Kai Wen Ng, Kenneth Lim

Abstract: Natural Language Processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models have yet to perform well on Semantic Textual Similarity, and may be too large to be deployed as lightweight edge applications. We seek to apply a suitable contrastive learning method based on the SimCSE paper to a model architecture adapted from a knowledge distillation based model, DistilBERT, to address these two issues. Our final lightweight model DistilFace achieves an average of 72.1 in Spearman's correlation on STS tasks, a 34.2 percent improvement over BERT base.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.09495

IPR-NeRF: Ownership Verification meets Neural Radiance Field

Authors: Win Kent Ong, Kam Woh Ng, Chee Seng Chan, Yi Zhe Song, Tao Xiang

Abstract: Neural Radiance Field (NeRF) models have gained significant attention in the computer vision community in the recent past with state-of-the-art visual quality and produced impressive demonstrations. Since then, technopreneurs have sought to leverage NeRF models into a profitable business. Therefore, NeRF models make it worth the risk of plagiarizers illegally copying, re-distributing, or misusing those models. This paper proposes a comprehensive intellectual property (IP) protection framework for the NeRF model in both black-box and white-box settings, namely IPR-NeRF. In the black-box setting, a diffusion-based solution is introduced to embed and extract the watermark via a two-stage optimization process. In the white-box setting, a designated digital signature is embedded into the weights of the NeRF model by adopting the sign loss objective. Our extensive experiments demonstrate that not only does our approach maintain the fidelity (i.e., the rendering quality) of IPR-NeRF models, but it is also robust against both ambiguity and removal attacks compared to prior arts.

Submitted 22 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Error in the tabulation of results for the state-of-the-art method, which might mislead readers

arXiv:2401.08402 [pdf, other]

Uniform Recovery Guarantees for Quantized Corrupted Sensing Using Structured or Generative Priors

Authors: Junren Chen, Zhaoqiang Liu, Meng Ding, Michael K. Ng

Abstract: This paper studies quantized corrupted sensing where the measurements are contaminated by unknown corruption and then quantized by a dithered uniform quantizer. We establish uniform guarantees for Lasso that ensure the accurate recovery of all signals and corruptions using a single draw of the sub-Gaussian sensing matrix and uniform dither. For signal and corruption with structured priors (e.g., sparsity, low-rankness), our uniform error rate for constrained Lasso typically coincides with the non-uniform one [Sun, Cui and Liu, 2022] up to logarithmic factors. By contrast, our uniform error rate for unconstrained Lasso exhibits worse dependence on the structured parameters due to regularization parameters larger than the ones for non-uniform recovery. For signal and corruption living in the ranges of some Lipschitz continuous generative models (referred to as generative priors), we achieve uniform recovery via constrained Lasso with a measurement number proportional to the latent dimensions of the generative models. Our treatments to the two kinds of priors are (nearly) unified and share the common key ingredients of (global) quantized product embedding (QPE) property, which states that the dithered uniform quantization (universally) preserves inner product. As a by-product, our QPE result refines the one in [Xu and Jacques, 2020] under sub-Gaussian random matrix, and in this specific instance we are able to sharpen the uniform error decaying rate (for the projected-back projection estimator with signals in some convex symmetric set) presented therein from $O(m^{-1/16})$ to $O(m^{-1/8})$.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 69 pages, 11 figures (In Review)

arXiv:2401.01738 [pdf, other]

doi 10.1109/TWC.2024.3351856

Integrated Sensing and Communication with Massive MIMO: A Unified Tensor Approach for Channel and Target Parameter Estimation

Authors: Ruoyu Zhang, Lei Cheng, Shuai Wang, Yi Lou, Yulong Gao, Wen Wu, Derrick Wing Kwan Ng

Abstract: Benefitting from the vast spatial degrees of freedom, the amalgamation of integrated sensing and communication (ISAC) and massive multiple-input multiple-output (MIMO) is expected to simultaneously improve spectral and energy efficiencies as well as the sensing capability. However, a large number of antennas deployed in massive MIMO-ISAC raises critical challenges in acquiring both accurate channel state information and target parameter information. To overcome these two challenges with a unified framework, we first analyze their underlying system models and then propose a novel tensor-based approach that addresses both the channel estimation and target sensing problems. Specifically, by parameterizing the high-dimensional communication channel exploiting a small number of physical parameters, we associate the channel state information with the sensing parameters of targets in terms of angular, delay, and Doppler dimensions. Then, we propose a shared training pattern adopting the same time-frequency resources such that both the channel estimation and target parameter estimation can be formulated as a canonical polyadic decomposition problem with a similar mathematical expression. On this basis, we first investigate the uniqueness condition of the tensor factorization and the maximum number of resolvable targets by utilizing the specific Vandermonde…

Submitted 3 January, 2024; originally announced January 2024.

Journal ref: IEEE Transactions on Wireless Communications, 2024

arXiv:2312.16184 [pdf, other]

Dynamic Knowledge Injection for AIXI Agents

Authors: Samuel Yang-Zhao, Kee Siong Ng, Marcus Hutter

Abstract: Prior approximations of AIXI, a Bayesian optimality notion for general reinforcement learning, can only approximate AIXI's Bayesian environment model using an a-priori defined set of models. This is a fundamental source of epistemic uncertainty for the agent in settings where the existence of systematic bias in the predefined model class cannot be resolved by simply collecting more data from the environment. We address this issue in the context of Human-AI teaming by considering a setup where additional knowledge for the agent in the form of new candidate models arrives from a human operator in an online fashion. We introduce a new agent called DynamicHedgeAIXI that maintains an exact Bayesian mixture over dynamically changing sets of models via a time-adaptive prior constructed from a variant of the Hedge algorithm. The DynamicHedgeAIXI agent is the richest direct approximation of AIXI known to date and comes with good performance guarantees. Experimental results on epidemic control on contact networks validate the agent's practical utility.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 16 pages, 2 figures, extended length version of paper to be published in AAAI2024

arXiv:2312.15231 [pdf, ps, other]

Sensing-Enhanced Secure Communication: Joint Time Allocation and Beamforming Design

Authors: Dongfang Xu, Yiming Xu, Zhiqiang Wei, Shenghui Song, Derrick Wing Kwan Ng

Abstract: The integration of sensing and communication enables wireless communication systems to serve environment-aware applications. In this paper, we propose to leverage sensing to enhance physical layer security (PLS) in multiuser communication systems in the presence of a suspicious target. To this end, we develop a two-phase framework to first estimate the location of the potential eavesdropper by sensing and then utilize the estimated information to enhance PLS for communication. In particular, in the first phase, a dual-functional radar and communication (DFRC) base station (BS) exploits a sensing signal to mitigate the sensing information uncertainty of the potential eavesdropper. Then, in the second phase, to facilitate joint sensing and secure communication, the DFRC BS employs beamforming and artificial noise to enhance secure communication. The design objective is to maximize the system sum rate while alleviating the information leakage by jointly optimizing the time allocation and beamforming policy. Capitalizing on monotonic optimization theory, we develop a two-layer globally optimal algorithm to reveal the performance upper bound of the considered system. Simulation results show that the proposed scheme achieves a significant sum rate gain over two baseline schemes that adopt existing techniques. Moreover, our results unveil that ISAC is a promising paradigm for enhancing secure communication in wireless networks.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 9 pages, 3 figures

arXiv:2312.09022 [pdf, other]

BDHT: Generative AI Enables Causality Analysis for Mild Cognitive Impairment

Authors: Qiankun Zuo, Ling Chen, Yanyan Shen, Michael Kwok-Po Ng, Baiying Lei, Shuqiang Wang

Abstract: Effective connectivity estimation plays a crucial role in understanding the interactions and information flow between different brain regions. However, the functional time series used for estimating effective connectivity is derived from certain software, which may lead to large computing errors because of different parameter settings and degrade the ability to model complex causal relationships between brain regions. In this paper, a brain diffuser with hierarchical transformer (BDHT) is proposed to estimate effective connectivity for mild cognitive impairment (MCI) analysis. To the best of our knowledge, the proposed brain diffuser is the first generative model to apply diffusion models to generating and analyzing multimodal brain networks. Specifically, the BDHT leverages structural connectivity to guide the reverse processes in an efficient way. It makes the denoising process more reliable and guarantees effective connectivity estimation accuracy. To improve denoising quality, the hierarchical denoising transformer is designed to learn multi-scale features in topological space. By stacking the multi-head attention and graph convolutional network, the graph convolutional transformer (GraphConformer) module is devised to enhance structure-function complementarity and improve the ability in noise estimation. Experimental evaluations of the denoising diffusion model demonstrate its effectiveness in estimating effective connectivity. The proposed model achieves superior performance in terms of accuracy and robustness compared to existing approaches. Moreover, the proposed model can identify altered directional connections and provide a comprehensive understanding of pathogenesis for MCI treatment.

Submitted 28 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 13 pages, 14 figures

arXiv:2312.04355 [pdf, other]

Secure Cell-Free Integrated Sensing and Communication in the Presence of Information and Sensing Eavesdroppers

Authors: Zixiang Ren, Jie Xu, Ling Qiu, Derrick Wing Kwan Ng

Abstract: This paper studies a secure cell-free integrated sensing and communication (ISAC) system, in which multiple ISAC transmitters collaboratively send confidential information to multiple communication users (CUs) and concurrently conduct target detection. Different from prior works investigating communication security against potential information eavesdropping, we consider the security of both communication and sensing in the presence of both information and sensing eavesdroppers that aim to intercept confidential communication information and extract target information, respectively. Towards this end, we optimize the joint information and sensing transmit beamforming at these ISAC transmitters for secure cell-free ISAC. Our objective is to maximize the detection probability over a designated sensing area while ensuring the minimum signal-to-interference-plus-noise-ratio (SINR) requirements at CUs. Our formulation also takes into account the maximum tolerable signal-to-noise ratio (SNR) at information eavesdroppers for ensuring the confidentiality of information transmission, and the maximum detection probability constraints at sensing eavesdroppers for preserving sensing privacy. The formulated secure joint transmit beamforming problem is highly non-convex due to the intricate interplay between the detection probabilities, beamforming vectors, and SINR constraints. Fortunately, through strategic manipulation and via applying the semidefinite relaxation (SDR) technique, we successfully obtain the globally optimal solution to the design problem by rigorously verifying the tightness of SDR. Furthermore, we present two alternative joint beamforming designs based on the sensing SNR maximization over the specific sensing area and the coordinated beamforming, respectively. Numerical results reveal the benefits of our proposed design over these alternative benchmarks.

Submitted 7 December, 2023; originally announced December 2023.

Comments: 13 pages

arXiv:2312.00857 [pdf, other]

Latent Space Explorer: Visual Analytics for Multimodal Latent Space Exploration

Authors: Bum Chul Kwon, Samuel Friedman, Kai Xu, Steven A Lubitz, Anthony Philippakis, Puneet Batra, Patrick T Ellinor, Kenney Ng

Abstract: Machine learning models built on training data with multiple modalities can reveal new insights that are not accessible through unimodal datasets. For example, cardiac magnetic resonance images (MRIs) and electrocardiograms (ECGs) are both known to capture useful information about subjects' cardiovascular health status. A multimodal machine learning model trained from large datasets can potentially predict the onset of heart-related diseases and provide novel medical insights about the cardiovascular system. Despite the potential benefits, it is difficult for medical experts to explore multimodal representation models without visual aids and to test the predictive performance of the models on various subpopulations. To address the challenges, we developed a visual analytics system called Latent Space Explorer. Latent Space Explorer provides interactive visualizations that enable users to explore the multimodal representation of subjects, define subgroups of interest, interactively decode data with different modalities with the selected subjects, and inspect the accuracy of the embedding in downstream prediction tasks. A user study was conducted with medical experts, and their feedback provided useful insights into how Latent Space Explorer can help their analysis and into possible new directions for further development in the medical domain.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 7 pages, 5 figures

arXiv:2311.15477 [pdf, other]

DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination

Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

Abstract: Recent text-to-image (T2I) generative models allow for high-quality synthesis following either text instructions or visual examples. Despite their capabilities, these models face limitations in creating new, detailed creatures within specific categories (e.g., virtual dog or bird species), which are valuable in digital asset creation and biodiversity analysis. To bridge this gap, we introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts (e.g., 200 bird species), we aim to train a T2I model capable of creating new, hybrid concepts within diverse backgrounds and contexts. We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts (e.g., body parts of a specific species) in an unsupervised manner. The T2I thus adapts to generate novel concepts (e.g., new bird species) with faithful structures and photorealistic appearance by seamlessly and flexibly composing learned sub-concepts. To enhance sub-concept fidelity and disentanglement, we extend the textual inversion technique by incorporating an additional projector and tailored attention loss regularization. Extensive experiments on two fine-grained image benchmarks demonstrate the superiority of DreamCreature over prior methods in both qualitative and quantitative evaluation. Ultimately, the learned sub-concepts facilitate diverse creative applications, including innovative consumer product designs and nuanced property modifications.

Submitted 26 November, 2023; originally announced November 2023.

Comments: Website: https://github.com/kamwoh/dreamcreature

arXiv:2311.13139 [pdf, ps, other]

Joint Distributed Precoding and Beamforming for RIS-aided Cell-Free Massive MIMO Systems

Authors: Peng Zhang, Jiayi Zhang, Huahua Xiao, Xiaodan Zhang, Derrick Wing Kwan Ng, Bo Ai

Abstract: The amalgamation of cell-free networks and reconfigurable intelligent surface (RIS) has become a prospective technique for future sixth-generation wireless communication systems. In this paper, we focus on the precoding and beamforming design for a downlink RIS-aided cell-free network. The design is formulated as a non-convex optimization problem by jointly optimizing the combining vector, active precoding, and passive RIS beamforming for minimizing the weighted sum of users' mean square error. A novel joint distributed precoding and beamforming framework is proposed to decentralize the alternating optimization method for acquiring a suboptimal solution to the design problem. Finally, numerical results validate the effectiveness of the proposed distributed precoding and beamforming framework, showing its low complexity and improved scalability compared with the centralized method.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.09814 [pdf, ps, other]

Stacked Intelligent Metasurface-Aided MIMO Transceiver Design

Authors: Jiancheng An, Chau Yuen, Chao Xu, Hongbin Li, Derrick Wing Kwan Ng, Marco Di Renzo, Mérouane Debbah, Lajos Hanzo

Abstract: Next-generation wireless networks are expected to utilize the limited radio frequency (RF) resources more efficiently with the aid of intelligent transceivers. To this end, we propose a promising transceiver architecture relying on stacked intelligent metasurfaces (SIM). An SIM is constructed by stacking an array of programmable metasurface layers, where each layer consists of a massive number of low-cost passive meta-atoms that individually manipulate the electromagnetic (EM) waves. By appropriately configuring the passive meta-atoms, an SIM is capable of accomplishing advanced computation and signal processing tasks, such as multiple-input multiple-output (MIMO) precoding/combining, multi-user interference mitigation, and radar sensing, as the EM wave propagates through the multiple layers of the metasurface, which effectively reduces both the RF-related energy consumption and processing delay. Inspired by this, we provide an overview of the SIM-aided MIMO transceiver design, which encompasses its hardware architecture and its potential benefits over state-of-the-art solutions. Furthermore, we discuss promising application scenarios and identify the open research challenges associated with the design of advanced SIM architectures for next-generation wireless networks. Finally, numerical results are provided for quantifying the benefits of wave-based signal processing in wireless systems.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 9 pages, 5 figures, 1 table

arXiv:2311.08562 [pdf, other]

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Authors: Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng

Abstract: Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing, demonstrating exceptional capabilities in reasoning, tool usage, and memory. As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework that captures their abilities in reasoning, planning, collaboration, and more. This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings, providing quantitative metrics to evaluate their judgment, reasoning, deception, self-awareness, cooperation, coordination, and rationality. We utilize games such as Chameleon and Undercover, alongside game theory scenarios like Cost Sharing, Multi-player Prisoner's Dilemma, and Public Good, to create diverse testing environments. Our framework is fortified with the Probabilistic Graphical Modeling (PGM) method, enhancing the LLMs' capabilities in navigating complex social and cognitive dimensions. The benchmark evaluates seven multi-agent systems powered by different LLMs, quantitatively highlighting a capability gap of more than threefold between the strongest, GPT-4, and the weakest, Llama-2-70B. It also confirms that our PGM enhancement boosts the inherent abilities of all selected models by 50% on average. Our code is released at https://github.com/cathyxl/MAgIC.

Submitted 16 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: work in progress

arXiv:2311.07091 [pdf, ps, other]

Code-Aided Channel Estimation in LDPC-Coded MIMO Systems

Authors: Binghui Shi, Yongpeng Wu, Peihong Yuan, Derrick Wing Kwan Ng, Xiang-Gen Xia, Wenjun Zhang

Abstract: For a multiple-input multiple-output (MIMO) system with unknown channel state information (CSI), a novel low-density parity check (LDPC)-coded transmission (LCT) scheme with joint pilot and data channel estimation is proposed. To fine-tune the CSI, a method based on the constraints introduced by the coded data from an LDPC code is designed such that the MIMO detector exploits the fine-tuned CSI. To reduce the computational burden, a coordinate ascent algorithm is employed along with several approximation methods, effectively reducing the number of MIMO detections and the computational complexity required to achieve satisfactory performance. Simulation results utilizing WiMAX standard LDPC codes and quadrature phase-shift keying (QPSK) modulation demonstrate gains of up to 1.3 dB at a frame error rate (FER) of $10^{-4}$ compared to pilot-assisted transmission (PAT) over Rayleigh block-fading channels.

Submitted 13 November, 2023; originally announced November 2023.

Comments: This paper has been accepted by IEEE Wireless Communications Letters

arXiv:2311.06770 [pdf, other]

Compressive Sensing-Based Grant-Free Massive Access for 6G Massive Communication

Authors: Zhen Gao, Malong Ke, Yikun Mei, Li Qiao, Sheng Chen, Derrick Wing Kwan Ng, H. Vincent Poor

Abstract: The advent of the sixth-generation (6G) of wireless communications has given rise to the necessity to connect vast quantities of heterogeneous wireless devices, which requires advanced system capabilities far beyond existing network architectures. In particular, such massive communication has been recognized as a prime driver that can empower the 6G vision of future ubiquitous connectivity, supporting Internet of Human-Machine-Things for which massive access is critical. This paper surveys the most recent advances toward massive access in both academic and industry communities, focusing primarily on the promising compressive sensing-based grant-free massive access paradigm. We first specify the limitations of existing random access schemes and reveal that the practical implementation of massive communication relies on a dramatically different random access paradigm from the current ones mainly designed for human-centric communications. Then, a compressive sensing-based grant-free massive access roadmap is presented, where the evolutions from single-antenna to large-scale antenna array-based base stations, from single-station to cooperative massive multiple-input multiple-output systems, and from unsourced to sourced random access scenarios are detailed. Finally, we discuss the key challenges and open issues to shed light on the potential future research directions of grant-free massive access.

Submitted 12 November, 2023; originally announced November 2023.

Comments: Accepted by IEEE IoT Journal

arXiv:2311.06002 [pdf, other]

Fully-Passive versus Semi-Passive IRS-Enabled Sensing: SNR and CRB Comparison

Authors: Xianxin Song, Xinmin Li, Xiaoqi Qin, Jie Xu, Tony Xiao Han, Derrick Wing Kwan Ng

Abstract: This paper investigates the sensing performance of two intelligent reflecting surface (IRS)-enabled non-line-of-sight (NLoS) sensing systems with fully-passive and semi-passive IRSs, respectively. In particular, we consider a fundamental setup with one base station (BS), one uniform linear array (ULA) IRS, and one point target in the NLoS region of the BS. Accordingly, we analyze the sensing signal-to-noise ratio (SNR) performance for a target detection scenario and the estimation Cramér-Rao bound (CRB) performance for a target's direction-of-arrival (DoA) estimation scenario, in cases where the transmit beamforming at the BS and the reflective beamforming at the IRS are jointly optimized. First, for the target detection scenario, we characterize the maximum sensing SNR when the BS-IRS channels are line-of-sight (LoS) and Rayleigh fading, respectively. It is revealed that when the number of reflecting elements $N$ equipped at the IRS becomes sufficiently large, the maximum sensing SNR increases proportionally to $N^2$ for the semi-passive-IRS sensing system, but proportionally to $N^4$ for the fully-passive-IRS counterpart. Then, for the target's DoA estimation scenario, we analyze the minimum CRB performance when the BS-IRS channel follows Rayleigh fading. Specifically, when $N$ grows, the minimum CRB decreases inversely proportionally to $N^4$ and $N^6$ for the semi-passive and fully-passive-IRS sensing systems, respectively. Finally, numerical results are presented to corroborate our analysis across various transmit and reflective beamforming design schemes under general channel setups. It is shown that the fully-passive-IRS sensing system outperforms the semi-passive counterpart when $N$ exceeds a certain threshold. This advantage is attributed to the additional reflective beamforming gain in the IRS-BS path, which efficiently compensates for the path loss for a large $N$.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 13 pages, 7 figures

arXiv:2311.05907 [pdf, other]

Sensing-Assisted Sparse Channel Recovery for Massive Antenna Systems

Authors: Zixiang Ren, Ling Qiu, Jie Xu, Derrick Wing Kwan Ng

Abstract: This correspondence presents a novel sensing-assisted sparse channel recovery approach for massive antenna wireless communication systems. We focus on a fundamental configuration with one massive-antenna base station (BS) and one single-antenna communication user (CU). The wireless channel exhibits sparsity and consists of multiple paths associated with scatterers detectable via radar sensing. Under this setup, the BS first sends downlink pilots to the CU and concurrently receives the echo pilot signals for sensing the surrounding scatterers. Subsequently, the CU sends feedback information on its received pilot signal to the BS. Accordingly, the BS determines the sparse basis based on the sensed scatterers and proceeds to recover the wireless channel, exploiting the feedback information based on advanced compressive sensing (CS) algorithms. Numerical results show that the proposed sensing-assisted approach significantly increases the overall achievable rate compared with the conventional design relying on a discrete Fourier transform (DFT)-based sparse basis without sensing, thanks to the reduced training overhead and enhanced recovery accuracy with limited feedback.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 5 pages, 4 figs

arXiv:2311.02900 [pdf, other]

doi 10.1109/ICRA48506.2021.9561575

Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation

Authors: Xueyan Oh, Leonard Loh, Shaohui Foong, Zhong Bao Andy Koh, Kow Leong Ng, Poh Kang Tan, Pei Lin Pearlin Toh, U-Xuan Tan

Abstract: General Visual Inspection is a manual inspection process regularly used to detect and localise obvious damage on the exterior of commercial aircraft. There has been increasing demand to perform this process at the boarding gate to minimize the downtime of the aircraft, and automating this process is desired to reduce the reliance on human labour. This automation typically requires the first step of estimating a camera's pose with respect to the aircraft for initialisation. However, localisation methods often require infrastructure, which can be very challenging when performed in uncontrolled outdoor environments and within the limited turnover time (approximately 2 hours) on an airport tarmac. In addition, access to commercial aircraft can be very restricted, causing development and testing of solutions to be a challenge. Hence, this paper proposes an on-site infrastructure-less initialisation method, by using the same pan-tilt-zoom camera used for the inspection task to estimate its own pose. This is achieved using a Deep Convolutional Neural Network trained with only synthetic images to regress the camera's pose. We apply domain randomisation when generating our dataset for training our network and improve prediction accuracy by introducing a new component to an existing loss function that leverages known aircraft geometry to relate position and orientation. Experiments are conducted and we have successfully regressed camera poses with a median error of 0.22 m and 0.73 degrees.

Submitted 6 November, 2023; originally announced November 2023.

Comments: This paper has been accepted by 2021 IEEE International Conference on Robotics and Automation (ICRA) with DOI: 10.1109/ICRA48506.2021.9561575

arXiv:2310.19852 [pdf, other]

AI Alignment: A Comprehensive Survey

Authors: Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

Abstract: AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey, we delve into the core concepts, methodology, and practice of alignment. First, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). Guided by these four principles, we outline the landscape of current alignment research and decompose them into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss techniques for learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices. We also release and continually update the website (www.alignmentsurvey.com) which features tutorials, collections of papers, blog posts, and other resources.

Submitted 1 May, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Continually updated, including weak-to-strong generalization and socio-technical thinking. 58 pages (excluding bibliography), 801 references

arXiv:2310.18180 [pdf, other]

DPSS-based Codebook Design for Near-Field XL-MIMO Channel Estimation

Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Derrick Wing Kwan Ng

Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. While accurate channel estimation is essential for beamforming and data detection, the unique characteristics of near-field channels pose additional challenges to the effective acquisition of channel state information. In this paper, we propose a novel codebook design, which allows efficient near-field channel estimation with significantly reduced codebook size. Specifically, we consider the eigen-problem based on the near-field electromagnetic wave transmission model. Moreover, we derive the general form of the eigenvectors associated with the near-field channel matrix, revealing their noteworthy connection to the discrete prolate spheroidal sequence (DPSS). Based on the proposed near-field codebook design, we further introduce a two-step channel estimation scheme. Simulation results demonstrate that the proposed codebook design not only achieves superior sparsification performance of near-field channels with a lower leakage effect, but also significantly improves the accuracy in compressive sensing channel estimation.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 6 pages, 5 figures

arXiv:2310.16684 [pdf, other]

Local Statistics for Generative Image Detection

Authors: Yung Jer Wong, Teck Khim Ng

Abstract: Diffusion models (DMs) are generative models that learn to synthesize images from Gaussian noise. DMs can be trained to do a variety of tasks such as image generation and image super-resolution. Researchers have made significant improvement in the capability of synthesizing photorealistic images in the past few years. These successes also hasten the need to address the potential misuse of synthesized images. In this paper, we highlight the effectiveness of computing local statistics, as opposed to global statistics, in distinguishing digital camera images from DM-generated images. We hypothesized that local statistics should be used to address the spatial non-stationarity problem in images. We show that our approach produced promising results and it is also robust to various perturbations such as image resizing and JPEG compression.

Submitted 25 October, 2023; originally announced October 2023.

Showing 1–50 of 464 results for author: Ng, K