Search | arXiv e-print repository

Joint Source-Channel Optimization for UAV Video Coding and Transmission

Authors: Kesong Wu, Xianbin Cao, Peng Yang, Haijun Zhang, Tony Q. S. Quek, Dapeng Oliver Wu

Abstract: This paper is concerned with unmanned aerial vehicle (UAV) video coding and transmission in scenarios such as emergency rescue and environmental monitoring. Unlike existing methods of modeling UAV video source coding and channel transmission separately, we investigate the joint source-channel optimization issue for video coding and transmission. Particularly, we design eight-dimensional delay-powe… ▽ More This paper is concerned with unmanned aerial vehicle (UAV) video coding and transmission in scenarios such as emergency rescue and environmental monitoring. Unlike existing methods of modeling UAV video source coding and channel transmission separately, we investigate the joint source-channel optimization issue for video coding and transmission. Particularly, we design eight-dimensional delay-power-rate-distortion models in terms of source coding and channel transmission and characterize the correlation between video coding and transmission, with which a joint source-channel optimization problem is formulated. Its objective is to minimize end-to-end distortion and UAV power consumption by optimizing fine-grained parameters related to UAV video coding and transmission. This problem is confirmed to be a challenging sequential-decision and non-convex optimization problem. We therefore decompose it into a family of repeated optimization problems by Lyapunov optimization and design an approximate convex optimization scheme with provable performance guarantees to tackle these problems. Based on the theoretical transformation, we propose a Lyapunov repeated iteration (LyaRI) algorithm. Both objective and subjective experiments are conducted to comprehensively evaluate the performance of LyaRI. The results indicate that, compared to its counterparts, LyaRI achieves better video quality and stability performance, with a 47.74% reduction in the variance of the obtained encoding bitrate. △ Less

Submitted 19 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.04192 [pdf, ps, other]

Pilot-Aided Joint Time Synchronization and Channel Estimation for OTFS

Authors: Jiazheng Sun, Peng Yang, Xianbin Cao, Zehui Xiong, Haijun Zhang, Tony Q. S. Quek

Abstract: This letter proposes a pilot-aided joint time synchronization and channel estimation (JTSCE) algorithm for orthogonal time frequency space (OTFS) systems. Unlike existing algorithms, JTSCE employs a maximum length sequence (MLS) rather than an isolated signal as the pilot. Distinctively, JTSCE explores MLS's autocorrelation properties to estimate timing offset and channel delay taps. After obtaini… ▽ More This letter proposes a pilot-aided joint time synchronization and channel estimation (JTSCE) algorithm for orthogonal time frequency space (OTFS) systems. Unlike existing algorithms, JTSCE employs a maximum length sequence (MLS) rather than an isolated signal as the pilot. Distinctively, JTSCE explores MLS's autocorrelation properties to estimate timing offset and channel delay taps. After obtaining delay taps, closed-form expressions of Doppler and channel gain for each propagation path are derived. Simulation results indicate that, compared to its counterpart, JTSCE achieves better bit error rate performance, close to that with perfect time synchronization and channel state information. △ Less

Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

arXiv:2407.21395 [pdf, other]

HINER: Neural Representation for Hyperspectral Image

Authors: Junqi Shi, Mingyi Jiang, Ming Lu, Tong Chen, Xun Cao, Zhan Ma

Abstract: This paper introduces {HINER}, a novel neural representation for compressing HSI and ensuring high-quality downstream tasks on compressed HSI. HINER fully exploits inter-spectral correlations by explicitly encoding of spectral wavelengths and achieves a compact representation of the input HSI sample through joint optimization with a learnable decoder. By additionally incorporating the Content Angl… ▽ More This paper introduces {HINER}, a novel neural representation for compressing HSI and ensuring high-quality downstream tasks on compressed HSI. HINER fully exploits inter-spectral correlations by explicitly encoding of spectral wavelengths and achieves a compact representation of the input HSI sample through joint optimization with a learnable decoder. By additionally incorporating the Content Angle Mapper with the L1 loss, we can supervise the global and local information within each spectral band, thereby enhancing the overall reconstruction quality. For downstream classification on compressed HSI, we theoretically demonstrate the task accuracy is not only related to the classification loss but also to the reconstruction fidelity through a first-order expansion of the accuracy degradation, and accordingly adapt the reconstruction by introducing Adaptive Spectral Weighting. Owing to the monotonic mapping of HINER between wavelengths and spectral bands, we propose Implicit Spectral Interpolation for data augmentation by adding random variables to input wavelengths during classification model training. Experimental results on various HSI datasets demonstrate the superior compression performance of our HINER compared to the existing learned methods and also the traditional codecs. Our model is lightweight and computationally efficient, which maintains high accuracy for downstream classification task even on decoded HSIs at high compression ratios. Our materials will be released at https://github.com/Eric-qi/HINER. △ Less

Submitted 31 July, 2024; originally announced July 2024.

Comments: ACM MM24

arXiv:2407.16554 [pdf, other]

doi 10.1145/3664647.3680585

Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

Authors: Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao

Abstract: Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce… ▽ More Recently, a novel form of audio partial forgery has posed challenges to its forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce a novel coarse-to-fine proposal refinement framework (CFPRF) that incorporates a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization. Specifically, the FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions. The PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN. To learn robust discriminative features, we devise a difference-aware feature learning (DAFL) module guided by contrastive representation learning to enlarge the sensitive differences between different frames induced by minor manipulations. We further design a boundary-aware feature enhancement (BAFE) module to capture the contextual information of multiple transition boundaries and guide the interaction between boundary information and temporal features via a cross-attention mechanism. Extensive experiments show that our CFPRF achieves state-of-the-art performance on various datasets, including LAV-DF, ASVS2019PS, and HAD. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 9pages, 3figures. This paper has been accepted for ACM MM 2024

MSC Class: 68T07; 68T10 ACM Class: I.2; I.5

arXiv:2407.08509 [pdf, other]

Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration

Authors: Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao

Abstract: Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derive… ▽ More Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derived from the 2-D frontal slice-wise Haar discrete wavelet transform, effectively modeling the low-rank prior for separated coarse-grained structure and fine-grained textures in the image. Experimental evaluations conducted on hyperspectral image inpainting, multi-temporal image cloud removal, and hyperspectral image denoising have revealed the HNN's potential. Typically, HNN achieves a performance improvement of 1-4 dB and a speedup of 10-28x compared to some state-of-the-art methods (e.g., tensor correlated total variation, and fully-connected tensor network) for inpainting tasks. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.06633 [pdf, other]

Variational Zero-shot Multispectral Pansharpening

Authors: Xiangyu Rui, Xiangyong Cao, Yining Li, Deyu Meng

Abstract: Pansharpening aims to generate a high spatial resolution multispectral image (HRMS) by fusing a low spatial resolution multispectral image (LRMS) and a panchromatic image (PAN). The most challenging issue for this task is that only the to-be-fused LRMS and PAN are available, and the existing deep learning-based methods are unsuitable since they rely on many training pairs. Traditional variational… ▽ More Pansharpening aims to generate a high spatial resolution multispectral image (HRMS) by fusing a low spatial resolution multispectral image (LRMS) and a panchromatic image (PAN). The most challenging issue for this task is that only the to-be-fused LRMS and PAN are available, and the existing deep learning-based methods are unsuitable since they rely on many training pairs. Traditional variational optimization (VO) based methods are well-suited for addressing such a problem. They focus on carefully designing explicit fusion rules as well as regularizations for an optimization problem, which are based on the researcher's discovery of the image relationships and image structures. Unlike previous VO-based methods, in this work, we explore such complex relationships by a parameterized term rather than a manually designed one. Specifically, we propose a zero-shot pansharpening method by introducing a neural network into the optimization objective. This network estimates a representation component of HRMS, which mainly describes the relationship between HRMS and PAN. In this way, the network achieves a similar goal to the so-called deep image prior because it implicitly regulates the relationship between the HRMS and PAN images through its inherent structure. We directly minimize this optimization objective via network parameters and the expected HRMS image through iterative updating. Extensive experiments on various benchmark datasets demonstrate that our proposed method can achieve better performance compared with other state-of-the-art methods. The codes are available at https://github.com/xyrui/PSDip. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06064 [pdf, other]

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed \textit{pan-denoising}. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential t… ▽ More This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed \textit{pan-denoising}. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, the proper modeling of this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign different weights of TV regularization for each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers. Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04928 [pdf, other]

CLIPVQA:Video Quality Assessment via CLIP

Authors: Fengchuang Xing, Mingjie Li, Yuan-Gen Wang, Guopu Zhu, Xiaochun Cao

Abstract: In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem… ▽ More In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem (CLIPVQA). Specifically, we first design an effective video frame perception paradigm with the goal of extracting the rich spatiotemporal quality and content information among video frames. Then, the spatiotemporal quality features are adequately integrated together using a self-attention mechanism to yield video-level quality representation. To utilize the quality language descriptions of videos for supervision, we develop a CLIP-based encoder for language embedding, which is then fully aggregated with the generated content information via a cross-attention module for producing video-language representation. Finally, the video-level quality and video-language representations are fused together for final video quality prediction, where a vectorized regression loss is employed for efficient end-to-end optimization. Comprehensive experiments are conducted on eight in-the-wild video datasets with diverse resolutions to evaluate the performance of CLIPVQA. The experimental results show that the proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods. A series of ablation studies are also performed to validate the effectiveness of each module in CLIPVQA. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03335 [pdf, other]

Dual-Domain Deep D-bar Method for Solving Electrical Impedance Tomography

Authors: Xiang Cao, Qiaoqiao Ding, Xiaoqun Zhang

Abstract: The regularized D-bar method is one of the most prominent methods for solving Electrical Impedance Tomography (EIT) problems due to its efficiency and simplicity. It provides a direct approach by applying low-pass filtering to the scattering data in the non-linear Fourier domain, thereby yielding a smoothed conductivity approximation. However, D-bar images often present low contrast and low resolu… ▽ More The regularized D-bar method is one of the most prominent methods for solving Electrical Impedance Tomography (EIT) problems due to its efficiency and simplicity. It provides a direct approach by applying low-pass filtering to the scattering data in the non-linear Fourier domain, thereby yielding a smoothed conductivity approximation. However, D-bar images often present low contrast and low resolution due to the absence of accurate high-frequency information and ill-posedness of the problem. In this paper, we proposed a dual-domain neural network architecture to retrieve high-contrast D-bar image sequences from low-contrast D-bar images. To further accentuate the spatial features of the conductivity distribution, the widely adopted U-net has been tailored for conductivity image calibration from the predicted D-bar image sequences. We call such a hybrid approach by Dual-Domain Deep D-bar method due to the consideration of both scattering data and image information. Compared to the single-scale structure, our proposed multi-scale structure exhibits superior capabilities in reducing artifacts and refining conductivity approximation. Additionally, solving discrete D-bar systems using the GMRES algorithm entails significant computational complexity, which is extremely time-consuming on CPU-based devices. To remedy this, we designed a surrogate GPU-based Richardson iterative method to accelerate the data enhancement process by D-bar. Numerical results are presented for simulated EIT data from the KIT4 and ACT4 systems to demonstrate notable improvements in absolute EIT imaging quality when compared to existing methodologies. △ Less

Submitted 12 May, 2024; originally announced July 2024.

Comments: 15 pages, 7 figures

arXiv:2407.00933 [pdf, other]

Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis

Authors: Xueyao Zhang, Bo Yang, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

Abstract: This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can… ▽ More This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can cause outages during the task offloading. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) whose computationally capable metamaterials are leveraged to jointly enable V2I reflective links as well as to implement interference cancellation at the V2V links. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices to maximize an autonomous driving safety task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems solved via an alternate approximation method. Our simulation results showcase that the proposed RICS-assisted offloading framework significantly improves the safety of the considered autonomous driving network, yielding a nearly 34\% improvement in the safety coefficient of the CVs. In addition, it is demonstrated that the V2V data rate can be improved by around 60\% indicating that the RICS-induced adjustment of the signals can effectively mitigate interference at the V2V link. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.18055 [pdf, other]

Filtering Reconfigurable Intelligent Computational Surface for RF Spectrum Purification

Authors: Kaining Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Mérouane Debbah, Chau Yuen

Abstract: The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-b… ▽ More The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-band signals has been largely overlooked, leading to RF spectrum pollution and degradation of wireless transmissions. To address this issue, we propose a novel surface structure called the Filtering Reconfigurable Intelligent Computational Surface (FRICS). We introduce two types of FRICS structures: one that dynamically reflects resonance band signals through a tunable spatial filter while absorbing out-of-band signals using metamaterials and the other one that dynamically amplifies in-band signals using computational metamaterials while reflecting out-of-band signals. To evaluate the performance of FRICS, we implement it in device-to-device (D2D) communication and vehicular-to-everything (V2X) scenarios. The experiments demonstrate the superiority of FRICS in signal-to-interference-noise ratio (SINR) and energy efficiency (EE). Finally, we discuss the critical challenges faced and promising techniques for implementing FRICS in future wireless systems. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.13335 [pdf, other]

AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations

Authors: Xuelin Cao, Bo Yang, Kaining Wang, Xinghua Li, Zhiwen Yu, Chau Yuen, Yan Zhang, Zhu Han

Abstract: With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimiz… ▽ More With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high Quality-of-Service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state-of-the-art, and we further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.05982 [pdf]

doi 10.1007/s10334-024-01182-7

Artificial Intelligence for Neuro MRI Acquisition: A Review

Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti… ▽ More Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potential in enhancing the efficiency and throughput of acquisition steps. This review discusses several pivotal AI-based methods in neuro MRI acquisition, focusing on their technological advances, impact on clinical practice, and potential risks. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Magn Reson Mater Phy (2024)

arXiv:2405.10513 [pdf, other]

Federated Learning With Energy Harvesting Devices: An MDP Framework

Authors: Kai Zhang, Xuanyu Cao

Abstract: Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesti… ▽ More Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesting technique in FL systems to extract ambient energy for continuously powering edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is impacted by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance. Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.19167 [pdf]

Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

Authors: Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo Jin Lee, Michael Ohliger, Hui Xue, Yang Yang

Abstract: Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-noise-ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models… ▽ More Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-noise-ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models with constrained data diversity and quantity, which exhibit limited generalizability in clinical applications across diverse MRI systems, pulse sequences, and organs. In this study, we present ImT-MRD: a complex-valued imaging transformer trained on a vast number of clinical MRI scans aiming at universal MR denoising at LF systems. Compared with averaging multiple-repeated scans for higher image SNR, the model obtains better image quality from fewer repetitions, demonstrating its capability for accelerating scans under various clinical settings. Moreover, with its complex-valued image input, the model can denoise intermediate results before advanced post-processing and prepare high-quality data for further MRI research. By delivering universal and accurate denoising across clinical and research tasks, our model holds great promise to expedite the evolution of LF MRI for accessible and equal biomedical applications. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.13955 [pdf, other]

doi 10.33012/2023.19426

GNSS Measurement-Based Context Recognition for Vehicle Navigation using Gated Recurrent Unit

Authors: Sheng Liu, Zhiqiang Yao, Xuemeng Cao, Xiaowen Cai

Abstract: Recent years, people have put forward higher and higher requirements for context-adaptive navigation (CAN). CAN system realizes seamless navigation in complex environments by recognizing the ambient surroundings of vehicles, and it is crucial to develop a fast, reliable, and robust navigational context recognition (NCR) method to enable CAN systems to operate effectively. Environmental context rec… ▽ More Recent years, people have put forward higher and higher requirements for context-adaptive navigation (CAN). CAN system realizes seamless navigation in complex environments by recognizing the ambient surroundings of vehicles, and it is crucial to develop a fast, reliable, and robust navigational context recognition (NCR) method to enable CAN systems to operate effectively. Environmental context recognition based on Global Navigation Satellite System (GNSS) measurements has attracted widespread attention due to its low cost because it does not require additional infrastructure. The performance and application value of NCR methods depend on three main factors: context categorization, feature extraction, and classification models. In this paper, a fine-grained context categorization framework comprising seven environment categories (open sky, tree-lined avenue, semi-outdoor, urban canyon, viaduct-down, shallow indoor, and deep indoor) is proposed, which currently represents the most elaborate context categorization framework known in this research domain. To improve discrimination between categories, a new feature called the C/N0-weighted azimuth distribution factor, is designed. Then, to ensure real-time performance, a lightweight gated recurrent unit (GRU) network is adopted for its excellent sequence data processing capabilities. A dataset containing 59,996 samples is created and made publicly available to researchers in the NCR community on Github. Extensive experiments have been conducted on the dataset, and the results show that the proposed method achieves an overall recognition accuracy of 99.41\% for isolated scenarios and 94.95\% for transition scenarios, with an average transition delay of 2.14 seconds. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 9 pages, 9 figures, 5 tables

Journal ref: Proceedings of the 36th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2023)

arXiv:2404.07215 [pdf, other]

Computation Offloading for Multi-server Multi-access Edge Vehicular Networks: A DDQN-based Method

Authors: Siyu Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Yan Zhang, Chau Yuen

Abstract: In this paper, we investigate a multi-user offloading problem in the overlapping domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision making stage and the request scheduling stage. To prevent the terminal from going out of service area during offloading, we consider the mobility parameter of the terminal according to the hum… ▽ More In this paper, we investigate a multi-user offloading problem in the overlapping domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision making stage and the request scheduling stage. To prevent the terminal from going out of service area during offloading, we consider the mobility parameter of the terminal according to the human behaviour model when making the offloading decision, and then introduce a server evaluation mechanism based on both the mobility parameter and the server load to select the optimal offloading server. In order to fully utilise the server resources, we design a double deep Q-network (DDQN)-based reward evaluation algorithm that considers the priority of tasks when scheduling offload requests. Finally, numerical simulations are conducted to verify that our proposed method outperforms traditional mathematical computation methods as well as the DQN algorithm. △ Less

Submitted 20 February, 2024; originally announced April 2024.

arXiv:2404.03327 [pdf, other]

DI-Retinex: Digital-Imaging Retinex Theory for Low-Light Image Enhancement

Authors: Shangquan Sun, Wenqi Ren, Jingyang Peng, Fenglong Song, Xiaochun Cao

Abstract: Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex t… ▽ More Many existing methods for low-light image enhancement (LLIE) based on Retinex theory ignore important factors that affect the validity of this theory in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called Digital-Imaging Retinex theory (DI-Retinex) through theoretical and experimental analysis of Retinex theory in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness contrast adjustment with a non-linear mapping function. In addition, to solve the lowlight enhancement problem in an unsupervised manner, we propose an image-adaptive masked reverse degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low-light, as it shows the most performance gain after the low-light enhancement compared to other methods. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.00568 [pdf, other]

doi 10.1109/TSG.2024.3451993

Stochastic-Robust Planning of Networked Hydrogen-Electrical Microgrids: A Study on Induced Refueling Demand

Authors: Xunhang Sun, Xiaoyu Cao, Bo Zeng, Qiaozhu Zhai, Tamer Başar, Xiaohong Guan

Abstract: Hydrogen-electrical microgrids are increasingly assuming an important role on the pathway toward decarbonization of energy and transportation systems. This paper studies networked hydrogen-electrical microgrids planning (NHEMP), considering a critical but often-overlooked issue, i.e., the demand-inducing effect (DIE) associated with infrastructure development decisions. Specifically, higher refuel… ▽ More Hydrogen-electrical microgrids are increasingly assuming an important role on the pathway toward decarbonization of energy and transportation systems. This paper studies networked hydrogen-electrical microgrids planning (NHEMP), considering a critical but often-overlooked issue, i.e., the demand-inducing effect (DIE) associated with infrastructure development decisions. Specifically, higher refueling capacities will attract more refueling demand of hydrogen-powered vehicles (HVs). To capture such interactions between investment decisions and induced refueling demand, we introduce a decision-dependent uncertainty (DDU) set and build a trilevel stochastic-robust formulation. The upper-level determines optimal investment strategies for hydrogen-electrical microgrids, the lower-level optimizes the risk-aware operation schedules across a series of stochastic scenarios, and, for each scenario, the middle-level identifies the "worst" situation of refueling demand within an individual DDU set to ensure economic feasibility. Then, an adaptive and exact decomposition algorithm, based on Parametric Column-and-Constraint Generation (PC&CG), is customized and developed to address the computational challenge and to quantitatively analyze the impact of DIE. Case studies on an IEEE exemplary system validate the effectiveness of the proposed NHEMP model and the PC&CG algorithm. It is worth highlighting that DIE can make an important contribution to the economic benefits of NHEMP, yet its significance will gradually decrease when the main bottleneck transits to other system restrictions. △ Less

Submitted 27 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.13562 [pdf, other]

Augmented LRFS-based Filter: Holistic Tracking of Group Objects

Authors: Chaoqun Yang, Xiaowei Liang, Zhiguo Shi, Heng Zhang, Xianghui Cao

Abstract: This paper addresses the problem of group target tracking (GTT), wherein multiple closely spaced targets within a group pose a coordinated motion. To improve the tracking performance, the labeled random finite sets (LRFSs) theory is adopted, and this paper develops a new kind of LRFSs, i.e., augmented LRFSs, which introduces group information into the definition of LRFSs. Specifically, for each el… ▽ More This paper addresses the problem of group target tracking (GTT), wherein multiple closely spaced targets within a group pose a coordinated motion. To improve the tracking performance, the labeled random finite sets (LRFSs) theory is adopted, and this paper develops a new kind of LRFSs, i.e., augmented LRFSs, which introduces group information into the definition of LRFSs. Specifically, for each element in an LRFS, the kinetic states, track label, and the corresponding group information of its represented target are incorporated. Furthermore, by means of the labeled multi-Bernoulli (LMB) filter with the proposed augmented LRFSs, the group structure is iteratively propagated and updated during the tracking process, which achieves the simultaneously estimation of the kinetic states, track label, and the corresponding group information of multiple group targets, and further improves the GTT tracking performance. Finally, simulation experiments are provided, which well demonstrates the effectiveness of the labeled multi-Bernoulli filter with the proposed augmented LRFSs for GTT tracking. △ Less

Submitted 19 August, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13346 [pdf, other]

A Control-Recoverable Added-Noise-based Privacy Scheme for LQ Control in Networked Control Systems

Authors: Xuening Tang, Xianghui Cao, Wei Xing Zheng

Abstract: As networked control systems continue to evolve, ensuring the privacy of sensitive data becomes an increasingly pressing concern, especially in situations where the controller is physically separated from the plant. In this paper, we propose a secure control scheme for computing linear quadratic control in a networked control system utilizing two networked controllers, a privacy encoder and a cont… ▽ More As networked control systems continue to evolve, ensuring the privacy of sensitive data becomes an increasingly pressing concern, especially in situations where the controller is physically separated from the plant. In this paper, we propose a secure control scheme for computing linear quadratic control in a networked control system utilizing two networked controllers, a privacy encoder and a control restorer. Specifically, the encoder generates two state signals blurred with random noise and sends them to the controllers, while the restorer reconstructs the correct control signal. The proposed design effectively preserves the privacy of the control system's state without sacrificing the control performance. We theoretically quantify the privacy-preserving performance in terms of the state estimation error of the controllers and the disclosure probability. Additionally, the proposed privacy-preserving scheme is also proven to satisfy differential privacy. Moreover, we extend the proposed privacy-preserving scheme and evaluation method to cases where collusion between two controllers occurs. Finally, we verify the validity of our proposed scheme through simulations. △ Less

Submitted 22 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.05906 [pdf, other]

Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

Authors: Jingyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao

Abstract: Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing… ▽ More Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing UDC image restoration methods predominantly utilize convolutional neural network architectures, whereas Transformer-based methods have exhibited superior performance in the majority of image restoration tasks. This is attributed to the Transformer's capability to sample global features for the local reconstruction of images, thereby achieving high-quality image restoration. In this paper, we observe that when using the Vision Transformer for UDC degraded image restoration, the global attention samples a large amount of redundant information and noise. Furthermore, compared to the ordinary Transformer employing dense attention, the Transformer utilizing sparse attention can alleviate the adverse impact of redundant information and noise. Building upon this discovery, we propose a Segmentation Guided Sparse Transformer method (SGSFormer) for the task of restoring high-quality images from UDC degraded images. Specifically, we utilize sparse self-attention to filter out redundant information and noise, directing the model's attention to focus on the features more relevant to the degraded regions in need of reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions. △ Less

Submitted 9 March, 2024; originally announced March 2024.

Comments: 13 pages, 10 figures, conference or other essential info

arXiv:2403.05247 [pdf, other]

Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

Authors: Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, Xiaochun Cao

Abstract: Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. An… ▽ More Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. Another promising strategy, shape-based attack, can effectively eliminate outliers, but existing methods often suffer significant reductions in imperceptibility due to irrational deformations. We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength, specifically in parts of the object surface that are complex and exhibit drastic curvature changes. Therefore, we propose a novel shape-based adversarial attack method, HiT-ADV, which initially conducts a two-stage search for attack regions based on saliency and imperceptibility scores, and then adds deformation perturbations in each attack region using Gaussian kernel functions. Additionally, HiT-ADV is extendable to physical attack. We propose that by employing benign resampling and benign rigid transformations, we can further enhance physical adversarial strength with little sacrifice to imperceptibility. Extensive experiments have validated the superiority of our method in terms of adversarial and imperceptible properties in both digital and physical spaces. Our code is avaliable at: https://github.com/TRLou/HiT-ADV. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2402.15865 [pdf, other]

HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models

Authors: Li Pang, Xiangyu Rui, Long Cui, Hongzhong Wang, Deyu Meng, Xiangyong Cao

Abstract: Hyperspectral image (HSI) restoration aims at recovering clean images from degraded observations and plays a vital role in downstream tasks. Existing model-based methods have limitations in accurately modeling the complex image characteristics with handcraft priors, and deep learning-based methods suffer from poor generalization ability. To alleviate these issues, this paper proposes an unsupervis… ▽ More Hyperspectral image (HSI) restoration aims at recovering clean images from degraded observations and plays a vital role in downstream tasks. Existing model-based methods have limitations in accurately modeling the complex image characteristics with handcraft priors, and deep learning-based methods suffer from poor generalization ability. To alleviate these issues, this paper proposes an unsupervised HSI restoration framework with pre-trained diffusion model (HIR-Diff), which restores the clean HSIs from the product of two low-rank components, i.e., the reduced image and the coefficient matrix. Specifically, the reduced image, which has a low spectral dimension, lies in the image field and can be inferred from our improved diffusion model where a new guidance function with total variation (TV) prior is designed to ensure that the reduced image can be well sampled. The coefficient matrix can be effectively pre-estimated based on singular value decomposition (SVD) and rank-revealing QR (RRQR) factorization. Furthermore, a novel exponential noise schedule is proposed to accelerate the restoration process (about 5$\times$ acceleration for denoising) with little performance decrease. Extensive experimental results validate the superiority of our method in both performance and speed on a variety of HSI restoration tasks, including HSI denoising, noisy HSI super-resolution, and noisy HSI inpainting. The code is available at https://github.com/LiPang/HIRDiff. △ Less

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.00398 [pdf, other]

Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks

Authors: Bo Yang, Xueyao Zhang, Zhiwen Yu, Xuelin Cao, Chongwen Huang, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

Abstract: In this paper, we focus on improving autonomous driving safety via task offloading from cellular vehicles (CVs), using vehicle-to-infrastructure (V2I) links, to an multi-access edge computing (MEC) server. Considering that the frequencies used for V2I links can be reused for vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of each V2I link may suffer from sever… ▽ More In this paper, we focus on improving autonomous driving safety via task offloading from cellular vehicles (CVs), using vehicle-to-infrastructure (V2I) links, to an multi-access edge computing (MEC) server. Considering that the frequencies used for V2I links can be reused for vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of each V2I link may suffer from severe interference, causing outages in the task offloading process. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) to enable, not only V2I reflective links, but also interference cancellation at the V2V links exploiting the computational capability of its metamaterials. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices, with the objective to maximize a safety-based autonomous driving task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems and solved via an alternate approximation method. Our simulation results demonstrate the effectiveness of the proposed RICS optimization in improving the safety in autonomous driving networks. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.03856 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446951

DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation

Authors: Haojie Wei, Xueke Cao, Wenbo Xu, Tangpeng Dan, Yueguo Chen

Abstract: Singing voice separation and vocal pitch estimation are pivotal tasks in music information retrieval. Existing methods for simultaneous extraction of clean vocals and vocal pitches can be classified into two categories: pipeline methods and naive joint learning methods. However, the efficacy of these methods is limited by the following problems: On the one hand, pipeline methods train models for e… ▽ More Singing voice separation and vocal pitch estimation are pivotal tasks in music information retrieval. Existing methods for simultaneous extraction of clean vocals and vocal pitches can be classified into two categories: pipeline methods and naive joint learning methods. However, the efficacy of these methods is limited by the following problems: On the one hand, pipeline methods train models for each task independently, resulting a mismatch between the data distributions at the training and testing time. On the other hand, naive joint learning methods simply add the losses of both tasks, possibly leading to a misalignment between the distinct objectives of each task. To solve these problems, we propose a Deep Joint Cascade Model (DJCM) for singing voice separation and vocal pitch estimation. DJCM employs a novel joint cascade model structure to concurrently train both tasks. Moreover, task-specific weights are used to align different objectives of both tasks. Experimental results show that DJCM achieves state-of-the-art performance on both tasks, with great improvements of 0.45 in terms of Signal-to-Distortion Ratio (SDR) for singing voice separation and 2.86% in terms of Overall Accuracy (OA) for vocal pitch estimation. Furthermore, extensive ablation studies validate the effectiveness of each design of our proposed model. The code of DJCM is available at https://github.com/Dream-High/DJCM . △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: This paper has been accepted by ICASSP 2024

arXiv:2401.03381 [pdf, other]

Distributionally Robust Frequency-Constrained Microgrid Scheduling Towards Seamless Islanding

Authors: Lun Yang, Haoxiang Yang, Xiaoyu Cao, Xiaohong Guan

Abstract: Unscheduled islanding events of microgrids result in the transition between grid-connected and islanded modes and induce a sudden and unknown power imbalance, posing a threat to frequency security. To achieve seamless islanding, we propose a distributionally robust frequency-constrained microgrid scheduling model considering unscheduled islanding events. This model co-optimizes unit commitments, p… ▽ More Unscheduled islanding events of microgrids result in the transition between grid-connected and islanded modes and induce a sudden and unknown power imbalance, posing a threat to frequency security. To achieve seamless islanding, we propose a distributionally robust frequency-constrained microgrid scheduling model considering unscheduled islanding events. This model co-optimizes unit commitments, power dispatch, upward/downward primary frequency response reserves, virtual inertia provisions from renewable energy sources (RESs), deloading ratios of RESs, and battery operations, while ensuring the system frequency security during unscheduled islanding. We establish an affine relationship between the actual power exchange and RES uncertainty in grid-connected mode, describe RES uncertainty with a Wasserstein-metric ambiguity set, and formulate frequency constraints under uncertain post-islanding power imbalance as distributionally robust quadratic chance constraints, which are further transformed by a tight conic relaxation. We solve the proposed mixed-integer convex program and demonstrate its effectiveness through case studies. △ Less

Submitted 6 January, 2024; originally announced January 2024.

Comments: 10 pages, 7 figures

arXiv:2312.13523 [pdf]

doi 10.1002/mrm.29990

High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting

Authors: Congyu Liao, Xiaozhi Cao, Siddharth Srinivasan Iyer, Sophie Schauman, Zihan Zhou, Xiaoqian Yan, Quan Chen, Zhitao Li, Nan Wang, Ting Gong, Zhe Wu, Hongjian He, Jianhui Zhong, Yang Yang, Adam Kerr, Kalanit Grill-Spector, Kawin Setsompop

Abstract: Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MR… ▽ More Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time. Methods: We developed 3D ViSTa-MRF, which combined Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MRF), to achieve high-fidelity whole-brain MWF and T1/T2/PD mapping on a clinical 3T scanner. To achieve fast acquisition and memory-efficient reconstruction, the ViSTa-MRF sequence leverages an optimized 3D tiny-golden-angle-shuffling spiral-projection acquisition and joint spatial-temporal subspace reconstruction with optimized preconditioning algorithm. With the proposed ViSTa-MRF approach, high-fidelity direct MWF mapping was achieved without a need for multi-compartment fitting that could introduce bias and/or noise from additional assumptions or priors. Results: The in-vivo results demonstrate the effectiveness of the proposed acquisition and reconstruction framework to provide fast multi-parametric mapping with high SNR and good quality. The in-vivo results of 1mm- and 0.66mm-iso datasets indicate that the MWF values measured by the proposed method are consistent with standard ViSTa results that are 30x slower with lower SNR. Furthermore, we applied the proposed method to enable 5-minute whole-brain 1mm-iso assessment of MWF and T1/T2/PD mappings for infant brain development and for post-mortem brain samples. Conclusions: In this work, we have developed a 3D ViSTa-MRF technique that enables the acquisition of whole-brain MWF, quantitative T1, T2, and PD maps at 1mm and 0.66mm isotropic resolution in 5 and 15 minutes, respectively. This advancement allows for quantitative investigations of myelination changes in the brain. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: 38 pages, 12 figures and 1 table

Journal ref: Magnetic Resonance in Medicine 2023

arXiv:2312.13127 [pdf, other]

Pixel-to-Abundance Translation: Conditional Generative Adversarial Networks Based on Patch Transformer for Hyperspectral Unmixing

Authors: Li Wang, Xiaohua Zhang, Longfei Li, Hongyun Meng, Xianghai Cao

Abstract: Spectral unmixing is a significant challenge in hyperspectral image processing. Existing unmixing methods utilize prior knowledge about the abundance distribution to solve the regularization optimization problem, where the difficulty lies in choosing appropriate prior knowledge and solving the complex regularization optimization problem. To solve these problems, we propose a hyperspectral conditio… ▽ More Spectral unmixing is a significant challenge in hyperspectral image processing. Existing unmixing methods utilize prior knowledge about the abundance distribution to solve the regularization optimization problem, where the difficulty lies in choosing appropriate prior knowledge and solving the complex regularization optimization problem. To solve these problems, we propose a hyperspectral conditional generative adversarial network (HyperGAN) method as a generic unmixing framework, based on the following assumption: the unmixing process from pixel to abundance can be regarded as a transformation of two modalities with an internal specific relationship. The proposed HyperGAN is composed of a generator and discriminator, the former completes the modal conversion from mixed hyperspectral pixel patch to the abundance of corresponding endmember of the central pixel and the latter is used to distinguish whether the distribution and structure of generated abundance are the same as the true ones. We propose hyperspectral image (HSI) Patch Transformer as the main component of the generator, which utilize adaptive attention score to capture the internal pixels correlation of the HSI patch and leverage the spatial-spectral information in a fine-grained way to achieve optimization of the unmixing process. Experiments on synthetic data and real hyperspectral data achieve impressive results compared to state-of-the-art competitors. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.09488 [pdf]

Sequence adaptive field-imperfection estimation (SAFE): retrospective estimation and correction of $B_1^+$ and $B_0$ inhomogeneities for enhanced MRF quantification

Authors: Mengze Gao, Xiaozhi Cao, Daniel Abraham, Zihan Zhou, Kawin Setsompop

Abstract: $B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and $B_0$ calibration scans can mitigate this but add scan time and cannot be applied retrospectively to previously collected data. Here, we proposed a calibration-free sequence-adaptive deep-learning framework, to estimate and correct for $B_1^+… ▽ More $B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and $B_0$ calibration scans can mitigate this but add scan time and cannot be applied retrospectively to previously collected data. Here, we proposed a calibration-free sequence-adaptive deep-learning framework, to estimate and correct for $B_1^+$ and $B_0$ effects of any MRF sequence. We demonstrate its capability on arbitrary MRF sequences at 3T, where no training data were previously obtained. Such approach can be applied to any previously-acquired and future MRF-scans. The flexibility in directly applying this framework to other quantitative sequences is also highlighted. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 12 pages, 5 figures, submitted to International Society for Magnetic Resonance in Medicine 31th Scientific Meeting, 2024

arXiv:2311.15556 [pdf, other]

PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

Authors: Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao

Abstract: As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natura… ▽ More As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natural images, and not all generated images meet the requirements of the real world. Therefore, it is of great significance to evaluate AIGIs more comprehensively. Although previous work has established several human perception-based AIGC image quality assessment (AIGCIQA) databases for text-generated images, the AI image generation technology includes scenarios like text-to-image and image-to-image, and assessing only the images generated by text-to-image models is insufficient. To address this issue, we establish a human perception-based image-to-image AIGCIQA database, named PKU-I2IQA. We conduct a well-organized subjective experiment to collect quality labels for AIGIs and then conduct a comprehensive analysis of the PKU-I2IQA database. Furthermore, we have proposed two benchmark models: NR-AIGCIQA based on the no-reference image quality assessment method and FR-AIGCIQA based on the full-reference image quality assessment method. Finally, leveraging this database, we conduct benchmark experiments and compare the performance of the proposed benchmark models. The PKU-I2IQA database and benchmarks will be released to facilitate future research on \url{https://github.com/jiquan123/I2IQA}. △ Less

Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 18 pages

arXiv:2310.10823 [pdf, other]

Implicit Representation of GRAPPA Kernels for Fast MRI Reconstruction

Authors: Daniel Abraham, Mark Nishimura, Xiaozhi Cao, Congyu Liao, Kawin Setsompop

Abstract: MRI data is acquired in Fourier space/k-space. Data acquisition is typically performed on a Cartesian grid in this space to enable the use of a fast Fourier transform algorithm to achieve fast and efficient reconstruction. However, it has been shown that for multiple applications, non-Cartesian data acquisition can improve the performance of MR imaging by providing fast and more efficient data acq… ▽ More MRI data is acquired in Fourier space/k-space. Data acquisition is typically performed on a Cartesian grid in this space to enable the use of a fast Fourier transform algorithm to achieve fast and efficient reconstruction. However, it has been shown that for multiple applications, non-Cartesian data acquisition can improve the performance of MR imaging by providing fast and more efficient data acquisition, and improving motion robustness. Nonetheless, the image reconstruction process of non-Cartesian data is more involved and can be time-consuming, even through the use of efficient algorithms such as non-uniform FFT (NUFFT). Reconstruction complexity is further exacerbated when imaging in the presence of field imperfections. This work (implicit GROG) provides an efficient approach to transform the field corrupted non-Cartesian data into clean Cartesian data, to achieve simpler and faster reconstruction which should help enable non-Cartesian data sampling to be performed more widely in MRI. △ Less

Submitted 14 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.09025 [pdf, other]

Survey on Near-Space Information Networks: Channel Modeling, Networking, and Transmission Perspectives

Authors: Xianbin Cao, Peng Yang, Xiaoning Su

Abstract: Near-space information networks (NSINs) composed of high-altitude platforms (HAPs) and high- and low-altitude unmanned aerial vehicles (UAVs) are a new regime for providing quick, robust, and cost-efficient sensing and communication services. Precipitated by innovations and breakthroughs in manufacturing, materials, communications, electronics, and control techniques, NSINs have been envisioned as… ▽ More Near-space information networks (NSINs) composed of high-altitude platforms (HAPs) and high- and low-altitude unmanned aerial vehicles (UAVs) are a new regime for providing quick, robust, and cost-efficient sensing and communication services. Precipitated by innovations and breakthroughs in manufacturing, materials, communications, electronics, and control techniques, NSINs have been envisioned as an essential component of the emerging sixth-generation of mobile communication systems. This article reveals some critical issues needing to be tackled in NSINs through conducting experiments and discusses the latest advances in NSINs in the research areas of channel modeling, networking, and transmission from a forward-looking, comparative, and technical evolutionary perspective. In this article, we highlight the characteristics of NSINs and present the promising use cases of NSINs. The impact of airborne platforms' unstable movements on the phase delays of onboard antenna arrays with diverse structures is mathematically analyzed. The recent advances in HAP channel modeling are elaborated on, along with the significant differences between HAP and UAV channel modeling. A comprehensive review of the networking techniques of NSINs in network deployment, handoff management, and network management aspects is provided. Besides, the promising techniques and communication protocols of the physical (PHY) layer, medium access control (MAC) layer, network layer, and transport layer of NSINs for achieving efficient transmission over NSINs are reviewed, and we have conducted experiments with practical NSINs to verify the performance of some techniques. Finally, we outline some open issues and promising directions for NSINs deserved for future study and discuss the corresponding challenges. △ Less

Submitted 13 May, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.02792 [pdf, other]

doi 10.1109/TMI.2024.3419780

Continuous 3D Myocardial Motion Tracking via Echocardiography

Authors: Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin

Abstract: Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally. However, current techniques suffer from incomplete and inaccurate motion estimation of the myocardium in both spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. To address these challenge… ▽ More Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally. However, current techniques suffer from incomplete and inaccurate motion estimation of the myocardium in both spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. To address these challenges, this paper introduces the Neural Cardiac Motion Field (NeuralCMF). NeuralCMF leverages implicit neural representation (INR) to model the 3D structure and the comprehensive 6D forward/backward motion of the heart. This method surpasses pixel-wise limitations by offering the capability to continuously query the precise shape and motion of the myocardium at any specific point throughout the cardiac cycle, enhancing the detailed analysis of cardiac dynamics beyond traditional speckle tracking. Notably, NeuralCMF operates without the need for paired datasets, and its optimization is self-supervised through the physics knowledge priors in both space and time dimensions, ensuring compatibility with both 2D and 3D echocardiogram video inputs. Experimental validations across three representative datasets support the robustness and innovative nature of the NeuralCMF, marking significant advantages over existing state-of-the-art methods in cardiac imaging and motion tracking. △ Less

Submitted 27 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 18 pages, 11 figures

Journal ref: IEEE Transactions on Medical Imaging, June 2024

arXiv:2310.02690 [pdf, other]

Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Clasification

Authors: Guoxin Wang, Xuyang Cao, Shan An, Fengmei Fan, Chao Zhang, Jinsong Wang, Feng Yu, Zhiren Wang

Abstract: Deep learning approaches, together with neuroimaging techniques, play an important role in psychiatric disorders classification. Previous studies on psychiatric disorders diagnosis mainly focus on using functional connectivity matrices of resting-state functional magnetic resonance imaging (rs-fMRI) as input, which still needs to fully utilize the rich temporal information of the time series of rs… ▽ More Deep learning approaches, together with neuroimaging techniques, play an important role in psychiatric disorders classification. Previous studies on psychiatric disorders diagnosis mainly focus on using functional connectivity matrices of resting-state functional magnetic resonance imaging (rs-fMRI) as input, which still needs to fully utilize the rich temporal information of the time series of rs-fMRI data. In this work, we proposed a multi-dimension-embedding-aware modality fusion transformer (MFFormer) for schizophrenia and bipolar disorder classification using rs-fMRI and T1 weighted structural MRI (T1w sMRI). Concretely, to fully utilize the temporal information of rs-fMRI and spatial information of sMRI, we constructed a deep learning architecture that takes as input 2D time series of rs-fMRI and 3D volumes T1w. Furthermore, to promote intra-modality attention and information fusion across different modalities, a fusion transformer module (FTM) is designed through extensive self-attention of hybrid feature maps of multi-modality. In addition, a dimension-up and dimension-down strategy is suggested to properly align feature maps of multi-dimensional from different modalities. Experimental results on our private and public OpenfMRI datasets show that our proposed MFFormer performs better than that using a single modality or multi-modality MRI on schizophrenia and bipolar disorder diagnosis. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.16372 [pdf, other]

Aperture Diffraction for Compact Snapshot Spectral Imaging

Authors: Tao Lv, Hao Ye, Quan Yuan, Zhan Shi, Yibo Wang, Shuming Wang, Xun Cao

Abstract: We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. Then we introduce a new optical design that each point in the object space is multip… ▽ More We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. Then we introduce a new optical design that each point in the object space is multiplexed to discrete encoding locations on the mosaic filter sensor by diffraction-based spatial-spectral projection engineering generated from the orthogonal mask. The orthogonal projection is uniformly accepted to obtain a weakly calibration-dependent data form to enhance modulation robustness. Meanwhile, the Cascade Shift-Shuffle Spectral Transformer (CSST) with strong perception of the diffraction degeneration is designed to solve a sparsity-constrained inverse problem, realizing the volume reconstruction from 2D measurements with Large amount of aliasing. Our system is evaluated by elaborating the imaging optical theory and reconstruction algorithm with demonstrating the experimental imaging under a single exposure. Ultimately, we achieve the sub-super-pixel spatial resolution and high spectral resolution imaging. The code will be available at: https://github.com/Krito-ex/CSST. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: accepted by International Conference on Computer Vision (ICCV) 2023

arXiv:2309.13166 [pdf, other]

Invisible Watermarking for Audio Generation Diffusion Models

Authors: Xirong Cao, Xiang Li, Divyesh Jadav, Yanzhao Wu, Zhehui Chen, Chen Zeng, Wenqi Wei

Abstract: Diffusion models have gained prominence in the image domain for their capabilities in data generation and transformation, achieving state-of-the-art performance in various tasks in both image and audio domains. In the rapidly evolving field of audio-based machine learning, safeguarding model integrity and establishing data copyright are of paramount importance. This paper presents the first waterm… ▽ More Diffusion models have gained prominence in the image domain for their capabilities in data generation and transformation, achieving state-of-the-art performance in various tasks in both image and audio domains. In the rapidly evolving field of audio-based machine learning, safeguarding model integrity and establishing data copyright are of paramount importance. This paper presents the first watermarking technique applied to audio diffusion models trained on mel-spectrograms. This offers a novel approach to the aforementioned challenges. Our model excels not only in benign audio generation, but also incorporates an invisible watermarking trigger mechanism for model verification. This watermark trigger serves as a protective layer, enabling the identification of model ownership and ensuring its integrity. Through extensive experiments, we demonstrate that invisible watermark triggers can effectively protect against unauthorized modifications while maintaining high utility in benign audio generation tasks. △ Less

Submitted 31 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: This is an invited paper for IEEE TPS, part of the IEEE CIC/CogMI/TPS 2023 conference

arXiv:2309.11745 [pdf, other]

PIE: Simulating Disease Progression via Progressive Image Editing

Authors: Kaizhao Liang, Xu Cao, Kuei-Da Liao, Tianren Gao, Wenqian Ye, Zhengyu Chen, Jianguo Cao, Tejas Nama, Jimeng Sun

Abstract: Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of dis… ▽ More Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To our best knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes. △ Less

Submitted 5 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Code and checkpoints for replicating our results can be found at https://github.com/IrohXu/PIE and https://huggingface.co/IrohXu/stable-diffusion-mimic-cxr-v0.1

arXiv:2309.05964 [pdf, other]

Massive Access of Static and Mobile Users via Reconfigurable Intelligent Surfaces: Protocol Design and Performance Analysis

Authors: Xuelin Cao, Bo Yang, Chongwen Huang, George C. Alexandropoulos, Chau Yuen, Zhu Han, H. Vincent Poor, Lajos Hanzo

Abstract: The envisioned wireless networks of the future entail the provisioning of massive numbers of connections, heterogeneous data traffic, ultra-high spectral efficiency, and low latency services. This vision is spurring research activities focused on defining a next generation multiple access (NGMA) protocol that can accommodate massive numbers of users in different resource blocks, thereby, achieving… ▽ More The envisioned wireless networks of the future entail the provisioning of massive numbers of connections, heterogeneous data traffic, ultra-high spectral efficiency, and low latency services. This vision is spurring research activities focused on defining a next generation multiple access (NGMA) protocol that can accommodate massive numbers of users in different resource blocks, thereby, achieving higher spectral efficiency and increased connectivity compared to conventional multiple access schemes. In this article, we present a multiple access scheme for NGMA in wireless communication systems assisted by multiple reconfigurable intelligent surfaces (RISs). In this regard, considering the practical scenario of static users operating together with mobile ones, we first study the interplay of the design of NGMA schemes and RIS phase configuration in terms of efficiency and complexity. Based on this, we then propose a multiple access framework for RIS-assisted communication systems, and we also design a medium access control (MAC) protocol incorporating RISs. In addition, we give a detailed performance analysis of the designed RIS-assisted MAC protocol. Our extensive simulation results demonstrate that the proposed MAC design outperforms the benchmarks in terms of system throughput and access fairness, and also reveal a trade-off relationship between the system throughput and fairness. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.00907 [pdf, other]

A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading

Authors: Ruihuai Liang, Bo Yang, Zhiwen Yu, Xuelin Cao, Derrick Wing Kwan Ng, Chau Yuen

Abstract: Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether o… ▽ More Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether offloading or not) and computational resource allocation of MEC. The design can be formulated as a mixed-integer nonlinear programming (MINLP) problem, which is generally NP-hard and its effective solution can be obtained by performing online inference through a well-trained deep neural network (DNN) model. However, when the system environments change dynamically, the DNN model may lose efficacy due to the drift of input parameters, thereby decreasing the generalization ability of the DNN model. To address this unique challenge, in this paper, we propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). Specifically, the shared backbone will be invariant during the PHs training and the inferred results will be ensembled, thereby significantly reducing the required training overhead and improving the inference performance. As a result, the joint optimization problem for offloading decision and resource allocation can be efficiently solved even in a time-varying wireless environment. Experimental results show that the proposed MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data. △ Less

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2308.16612 [pdf, other]

Neural Gradient Regularizer

Authors: Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc Van Gool

Abstract: Owing to its significant success, the prior imposed on gradient maps has consistently been a subject of great interest in the field of image processing. Total variation (TV), one of the most representative regularizers, is known for its ability to capture the intrinsic sparsity prior underlying gradient maps. Nonetheless, TV and its variants often underestimate the gradient maps, leading to the we… ▽ More Owing to its significant success, the prior imposed on gradient maps has consistently been a subject of great interest in the field of image processing. Total variation (TV), one of the most representative regularizers, is known for its ability to capture the intrinsic sparsity prior underlying gradient maps. Nonetheless, TV and its variants often underestimate the gradient maps, leading to the weakening of edges and details whose gradients should not be zero in the original image (i.e., image structures is not describable by sparse priors of gradient maps). Recently, total deep variation (TDV) has been introduced, assuming the sparsity of feature maps, which provides a flexible regularization learned from large-scale datasets for a specific task. However, TDV requires to retrain the network with image/task variations, limiting its versatility. To alleviate this issue, in this paper, we propose a neural gradient regularizer (NGR) that expresses the gradient map as the output of a neural network. Unlike existing methods, NGR does not rely on any subjective sparsity or other prior assumptions on image gradient maps, thereby avoiding the underestimation of gradient maps. NGR is applicable to various image types and different image processing tasks, functioning in a zero-shot learning fashion, making it a versatile and plug-and-play regularizer. Extensive experimental results demonstrate the superior performance of NGR over state-of-the-art counterparts for a range of different tasks, further validating its effectiveness and versatility. △ Less

Submitted 13 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.10196 [pdf, other]

Blind Face Restoration for Under-Display Camera via Dictionary Guided Transformer

Authors: Jingfan Tan, Xiaoxu Chen, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaocun Cao

Abstract: By hiding the front-facing camera below the display panel, Under-Display Camera (UDC) provides users with a full-screen experience. However, due to the characteristics of the display, images taken by UDC suffer from significant quality degradation. Methods have been proposed to tackle UDC image restoration and advances have been achieved. There are still no specialized methods and datasets for res… ▽ More By hiding the front-facing camera below the display panel, Under-Display Camera (UDC) provides users with a full-screen experience. However, due to the characteristics of the display, images taken by UDC suffer from significant quality degradation. Methods have been proposed to tackle UDC image restoration and advances have been achieved. There are still no specialized methods and datasets for restoring UDC face images, which may be the most common problem in the UDC scene. To this end, considering color filtering, brightness attenuation, and diffraction in the imaging process of UDC, we propose a two-stage network UDC Degradation Model Network named UDC-DMNet to synthesize UDC images by modeling the processes of UDC imaging. Then we use UDC-DMNet and high-quality face images from FFHQ and CelebA-Test to create UDC face training datasets FFHQ-P/T and testing datasets CelebA-Test-P/T for UDC face restoration. We propose a novel dictionary-guided transformer network named DGFormer. Introducing the facial component dictionary and the characteristics of the UDC image in the restoration makes DGFormer capable of addressing blind face restoration in UDC scenarios. Experiments show that our DGFormer and UDC-DMNet achieve state-of-the-art performance. △ Less

Submitted 1 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: To appear in IEEE TCSVT

arXiv:2308.07127 [pdf, other]

A Lightweight Sensor Scheduler Based on AoI Function for Remote State Estimation over Lossy Wireless Channels

Authors: Taige Chang, Xianghui Cao, Wei Xing Zheng

Abstract: This paper investigates the problem of sensor scheduling for remotely estimating the states of heterogeneous dynamical systems over resource-limited and lossy wireless channels. Considering the low time complexity and high versatility requirements of schedulers deployed on the transport layer, we propose a lightweight scheduler based on an Age of Information (AoI) function built with the tight sca… ▽ More This paper investigates the problem of sensor scheduling for remotely estimating the states of heterogeneous dynamical systems over resource-limited and lossy wireless channels. Considering the low time complexity and high versatility requirements of schedulers deployed on the transport layer, we propose a lightweight scheduler based on an Age of Information (AoI) function built with the tight scalar upper bound of the remote estimation error. We show that the proposed scheduler is indexable and sub-optimal. We derive an upper and a lower bound of the proposed scheduler and give stability conditions for estimation error. Numerical simulations demonstrate that, compared to existing policies, the proposed scheduler achieves estimation performance very close to the optimal at a much lower computation time. △ Less

Submitted 30 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.12264 [pdf, ps, other]

QoE-Driven Video Transmission: Energy-Efficient Multi-UAV Network Optimization

Authors: Kesong Wu, Xianbin Cao, Peng Yang, Zongyang Yu, Dapeng Oliver Wu, Tony Q. S. Quek

Abstract: This paper is concerned with the issue of improving video subscribers' quality of experience (QoE) by deploying a multi-unmanned aerial vehicle (UAV) network. Different from existing works, we characterize subscribers' QoE by video bitrates, latency, and frame freezing and propose to improve their QoE by energy-efficiently and dynamically optimizing the multi-UAV network in terms of serving UAV se… ▽ More This paper is concerned with the issue of improving video subscribers' quality of experience (QoE) by deploying a multi-unmanned aerial vehicle (UAV) network. Different from existing works, we characterize subscribers' QoE by video bitrates, latency, and frame freezing and propose to improve their QoE by energy-efficiently and dynamically optimizing the multi-UAV network in terms of serving UAV selection, UAV trajectory, and UAV transmit power. The dynamic multi-UAV network optimization problem is formulated as a challenging sequential-decision problem with the goal of maximizing subscribers' QoE while minimizing the total network power consumption, subject to some physical resource constraints. We propose a novel network optimization algorithm to solve this challenging problem, in which a Lyapunov technique is first explored to decompose the sequential-decision problem into several repeatedly optimized sub-problems to avoid the curse of dimensionality. To solve the sub-problems, iterative and approximate optimization mechanisms with provable performance guarantees are then developed. Finally, we design extensive simulations to verify the effectiveness of the proposed algorithm. Simulation results show that the proposed algorithm can effectively improve the QoE of subscribers and is 66.75\% more energy-efficient than benchmarks. △ Less

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2306.17799 [pdf, other]

A Low-rank Matching Attention based Cross-modal Feature Fusion Method for Conversational Emotion Recognition

Authors: Yuntao Shou, Xiangyong Cao, Deyu Meng, Bo Dong, Qinghua Zheng

Abstract: Conversational emotion recognition (CER) is an important research topic in human-computer interactions. Although deep learning (DL) based CER approaches have achieved excellent performance, existing cross-modal feature fusion methods used in these DL-based approaches either ignore the intra-modal and inter-modal emotional interaction or have high computational complexity. To address these issues,… ▽ More Conversational emotion recognition (CER) is an important research topic in human-computer interactions. Although deep learning (DL) based CER approaches have achieved excellent performance, existing cross-modal feature fusion methods used in these DL-based approaches either ignore the intra-modal and inter-modal emotional interaction or have high computational complexity. To address these issues, this paper develops a novel cross-modal feature fusion method for the CER task, i.e., the low-rank matching attention method (LMAM). By setting a matching weight and calculating attention scores between modal features row by row, LMAM contains fewer parameters than the self-attention method. We further utilize the low-rank decomposition method on the weight to make the parameter number of LMAM less than one-third of the self-attention. Therefore, LMAM can potentially alleviate the over-fitting issue caused by a large number of parameters. Additionally, by computing and fusing the similarity of intra-modal and inter-modal features, LMAM can also fully exploit the intra-modal contextual information within each modality and the complementary semantic information across modalities (i.e., text, video and audio) simultaneously. Experimental results on some benchmark datasets show that LMAM can be embedded into any existing state-of-the-art DL-based CER methods and help boost their performance in a plug-and-play manner. Also, experimental results verify the superiority of LMAM compared with other popular cross-modal fusion methods. Moreover, LMAM is a general cross-modal fusion method and can thus be applied to other multi-modal recognition tasks, e.g., session recommendation and humour detection. △ Less

Submitted 16 June, 2023; originally announced June 2023.

Comments: 10 pages, 4 figures

arXiv:2306.17797 [pdf, other]

HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising

Authors: Li Pang, Weizhen Gu, Xiangyong Cao, Xiangyu Rui, Jiangjun Peng, Shuang Xu, Gang Yang, Deyu Meng

Abstract: Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, current deep learning-based approaches ignore this fact and restore the clean image with deterministic mapping (i.e., the network receives a noisy HSI and outputs a clean HSI). To alleviate this issue, this paper proposes a flow-based HSI denoising network (HIDFlowNet)… ▽ More Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, current deep learning-based approaches ignore this fact and restore the clean image with deterministic mapping (i.e., the network receives a noisy HSI and outputs a clean HSI). To alleviate this issue, this paper proposes a flow-based HSI denoising network (HIDFlowNet) to directly learn the conditional distribution of the clean HSI given the noisy HSI and thus diverse clean HSIs can be sampled from the conditional distribution. Overall, our HIDFlowNet is induced from the flow methodology and contains an invertible decoder and a conditional encoder, which can fully decouple the learning of low-frequency and high-frequency information of HSI. Specifically, the invertible decoder is built by staking a succession of invertible conditional blocks (ICBs) to capture the local high-frequency details since the invertible network is information-lossless. The conditional encoder utilizes down-sampling operations to obtain low-resolution images and uses transformers to capture correlations over a long distance so that global low-frequency information can be effectively extracted. Extensive experimental results on simulated and real HSI datasets verify the superiority of our proposed HIDFlowNet compared with other state-of-the-art methods both quantitatively and visually. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: 10 pages, 8 figures

arXiv:2306.15412 [pdf, other]

doi 10.21437/Interspeech.2023-528

RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music

Authors: Haojie Wei, Xueke Cao, Tangpeng Dan, Yueguo Chen

Abstract: Vocal pitch is an important high-level feature in music audio processing. However, extracting vocal pitch in polyphonic music is more challenging due to the presence of accompaniment. To eliminate the influence of the accompaniment, most previous methods adopt music source separation models to obtain clean vocals from polyphonic music before predicting vocal pitches. As a result, the performance o… ▽ More Vocal pitch is an important high-level feature in music audio processing. However, extracting vocal pitch in polyphonic music is more challenging due to the presence of accompaniment. To eliminate the influence of the accompaniment, most previous methods adopt music source separation models to obtain clean vocals from polyphonic music before predicting vocal pitches. As a result, the performance of vocal pitch estimation is affected by the music source separation models. To address this issue and directly extract vocal pitches from polyphonic music, we propose a robust model named RMVPE. This model can extract effective hidden features and accurately predict vocal pitches from polyphonic music. The experimental results demonstrate the superiority of RMVPE in terms of raw pitch accuracy (RPA) and raw chroma accuracy (RCA). Additionally, experiments conducted with different types of noise show that RMVPE is robust across all signal-to-noise ratio (SNR) levels. The code of RMVPE is available at https://github.com/Dream-High/RMVPE. △ Less

Submitted 27 June, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: This paper has been accepted by INTERSPEECH 2023

arXiv:2305.15635 [pdf]

Vehicle-in-Virtual-Environment (VVE)

Authors: Xincheng Cao, Haochong Chen, Sukru Yaren Gelbal, Bilin Aksun-Guvenc, Levent Guvenc

Abstract: The current approach to connected and autonomous driving function development and evaluation uses model-in-the-loop simulation, hardware-in-the-loop simulation, and limited proving ground work followed by public road deployment of beta version of software and technology. The rest of the road users are involuntarily forced into taking part in the development and evaluation of these connected and au… ▽ More The current approach to connected and autonomous driving function development and evaluation uses model-in-the-loop simulation, hardware-in-the-loop simulation, and limited proving ground work followed by public road deployment of beta version of software and technology. The rest of the road users are involuntarily forced into taking part in the development and evaluation of these connected and autonomous driving functions in this approach. This is an unsafe, costly and inefficient method. Motivated by these shortcomings, this paper introduces the Vehicle-in-Virtual-Environment (VVE) method of safe, efficient and low cost connected and autonomous driving function development, evaluation and demonstration. The VVE method is compared to the existing state-of-the-art. Its basic implementation for a path following task is used to explain the method where the actual autonomous vehicle operates in a large empty area with its sensor feeds being replaced by realistic sensor feeds corresponding to its location and pose in the virtual environment. It is possible to easily change the development virtual environment and inject rare and difficult events which can be tested very safely. Vehicle-to-Pedestrian (V2P) communication based pedestrian safety is chosen as the application use case for VVE and corresponding experimental results are presented and discussed. It is noted that actual pedestrians and other vulnerable road users can be used very safely in this approach. △ Less

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.10925 [pdf, other]

Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model

Authors: Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, Deyu Meng

Abstract: Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing handcraft image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a la… ▽ More Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing handcraft image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a large number of paired training data and suffer from poor generalization ability. To address these issues, in this work, we propose a low-rank diffusion model for hyperspectral pansharpening by simultaneously leveraging the power of the pre-trained deep diffusion model and better generalization ability of Bayesian methods. Specifically, we assume that the HRHS image can be recovered from the product of two low-rank tensors, i.e., the base tensor and the coefficient matrix. The base tensor lies on the image field and has a low spectral dimension. Thus, we can conveniently utilize a pre-trained remote sensing diffusion model to capture its image structures. Additionally, we derive a simple yet quite effective way to pre-estimate the coefficient matrix from the observed LRHS image, which preserves the spectral information of the HRHS. Experimental results demonstrate that the proposed method performs better than some popular traditional approaches and gains better generalization ability than some DL-based methods. The code is released in https://github.com/xyrui/PLRDiff. △ Less

Submitted 19 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.07774 [pdf, other]

PanFlowNet: A Flow-Based Deep Network for Pan-sharpening

Authors: Gang Yang, Xiangyong Cao, Wenzhe Xiao, Man Zhou, Aiping Liu, Xun chen, Deyu Meng

Abstract: Pan-sharpening aims to generate a high-resolution multispectral (HRMS) image by integrating the spectral information of a low-resolution multispectral (LRMS) image with the texture details of a high-resolution panchromatic (PAN) image. It essentially inherits the ill-posed nature of the super-resolution (SR) task that diverse HRMS images can degrade into an LRMS image. However, existing deep learn… ▽ More Pan-sharpening aims to generate a high-resolution multispectral (HRMS) image by integrating the spectral information of a low-resolution multispectral (LRMS) image with the texture details of a high-resolution panchromatic (PAN) image. It essentially inherits the ill-posed nature of the super-resolution (SR) task that diverse HRMS images can degrade into an LRMS image. However, existing deep learning-based methods recover only one HRMS image from the LRMS image and PAN image using a deterministic mapping, thus ignoring the diversity of the HRMS image. In this paper, to alleviate this ill-posed issue, we propose a flow-based pan-sharpening network (PanFlowNet) to directly learn the conditional distribution of HRMS image given LRMS image and PAN image instead of learning a deterministic mapping. Specifically, we first transform this unknown conditional distribution into a given Gaussian distribution by an invertible network, and the conditional distribution can thus be explicitly defined. Then, we design an invertible Conditional Affine Coupling Block (CACB) and further build the architecture of PanFlowNet by stacking a series of CACBs. Finally, the PanFlowNet is trained by maximizing the log-likelihood of the conditional distribution given a training set and can then be used to predict diverse HRMS images. The experimental results verify that the proposed PanFlowNet can generate various HRMS images given an LRMS image and a PAN image. Additionally, the experimental results on different kinds of satellite datasets also demonstrate the superiority of our PanFlowNet compared with other state-of-the-art methods both visually and quantitatively. △ Less

Submitted 16 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Showing 1–50 of 123 results for author: Cao, X