Search | arXiv e-print repository

Dynamic Pricing of Electric Vehicle Charging Station Alliances Under Information Asymmetry

Authors: Zeyu Liu, Yun Zhou, Donghan Feng, Shaolun Xu, Yin Yi, Hengjie Li, Haojing Wang

Abstract: Due to the centralization of charging stations (CSs), CSs are organized as charging station alliances (CSAs) in commercial competition. In this context, this paper studies the profit-oriented dynamic pricing strategy of CSAs. As the practicability basis, a privacy-protected bidirectional real-time information interaction framework is designed, under which the status of EVs is utilized as the reference for pricing, and the prices of CSs are the reference for charging decisions. Based on this framework, the decision-making models of EVs and CSs are established, in which the uncertainty caused by the information asymmetry between EVs and CSs and the bounded rationality of EV users are integrated. To solve the pricing decision model, evolutionary game theory is adopted to describe the dynamic pricing game among CSAs, the equilibrium of which gives the optimal pricing strategy. Finally, the results of a case study in a real urban area in Shanghai, China verify the practicability of the framework and the effectiveness of the dynamic pricing strategy.
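The abstract does not give the CSAs' payoff functions, so the following is only a minimal sketch of how an evolutionary pricing game is commonly iterated: discrete-time replicator dynamics over a population of pricing strategies. Treating the alliances as one population, the two-strategy payoff matrix `PAYOFF` (low vs. high price), and the helper names are all hypothetical; in the paper, payoffs would come from the EV and CS decision models.

```python
def replicator_step(x, payoff, dt=0.1):
    """One Euler step of replicator dynamics: dx_i/dt = x_i * (f_i - f_avg)."""
    n = len(x)
    # Expected payoff (fitness) of each strategy against the current mix.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(x[i] * fitness[i] for i in range(n))
    new_x = [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(n)]
    total = sum(new_x)
    return [v / total for v in new_x]  # renormalize against rounding drift


def evolve(x0, payoff, steps=2000, dt=0.1):
    """Iterate the replicator dynamics from the initial strategy shares x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoffs: index 0 = low price, index 1 = high price.
PAYOFF = [[2.0, 5.0],
          [1.0, 3.0]]
share = evolve([0.5, 0.5], PAYOFF)
```

With these made-up payoffs the low-price strategy earns more against any mix, so its share converges toward 1; a different payoff structure could instead settle on a mixed equilibrium.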

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.20852 [pdf, other]

Optimizing 5G-Advanced Networks for Time-critical Applications: The Role of L4S

Authors: Guangjin Pan, Shugong Xu, Pin Jiang

Abstract: As 5G networks strive to support advanced time-critical applications, such as immersive Extended Reality (XR), cloud gaming, and autonomous driving, the demand for Real-time Broadband Communication (RTBC) grows. In this article, we present the main mechanisms of Low Latency, Low Loss, and Scalable Throughput (L4S). Subsequently, we investigate the support and challenges of L4S technology in the latest 3GPP 5G-Advanced Release 18 (R18) standard. Our case study, using a prototype system for a real-time communication (RTC) application, demonstrates the superiority of L4S technology. The experimental results show that, compared with the GCC algorithm, the proposed L4S-GCC algorithm can reduce the stalling rate by 1.51%-2.80% and increase the bandwidth utilization by 11.4%-31.4%. The results emphasize the immense potential of L4S technology in enhancing transmission performance in time-critical applications.
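L4S relies on a scalable congestion response to ECN marks. The abstract does not describe the control law of L4S-GCC, so the sketch below shows only the generic DCTCP-style rule (RFC 8257) applied to a sending rate: an EWMA `alpha` of the fraction of CE-marked packets, and a multiplicative cut of `alpha / 2` when marks are seen. Applying it to a GCC-style rate rather than a TCP window is an assumption made for illustration.

```python
DCTCP_G = 1.0 / 16  # EWMA gain; 1/16 is the value suggested in RFC 8257


def update_alpha(alpha, marked, total, g=DCTCP_G):
    """EWMA estimate of the fraction of packets that carried ECN CE marks last round."""
    frac = marked / total if total else 0.0
    return (1 - g) * alpha + g * frac


def scale_rate(rate, alpha, marked):
    """Scalable response: reduce the rate by alpha/2 only if CE marks were observed."""
    return rate * (1 - alpha / 2) if marked else rate


# One round in which all 16 packets were marked, starting from alpha = 0.
alpha = update_alpha(0.0, marked=16, total=16)
new_rate = scale_rate(1000.0, alpha, marked=16)
```

Because the cut is proportional to the marking fraction rather than a fixed halving, small queues produce small, frequent adjustments, which is what lets L4S keep latency low without sacrificing throughput.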

Submitted 30 July, 2024; originally announced July 2024.

Comments: 7 pages, 3 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2407.20518 [pdf, other]

High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Authors: Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

Abstract: Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, existing methods struggle to capture rich image features effectively or rely on low-dimensional positional coordinates, making it difficult to accurately predict high-resolution gene expression profiles. To address these limitations, we developed HisToSGE, a method that employs a Pathology Image Large Model (PILM) to extract rich image features from histological images and utilizes a feature learning module to robustly generate high-resolution gene expression profiles. We evaluated HisToSGE on four ST datasets, comparing its performance with five state-of-the-art baseline methods. The results demonstrate that HisToSGE excels in generating high-resolution gene expression profiles and performing downstream tasks such as spatial domain identification. All code and public datasets used in this paper are available at https://github.com/wenwenmin/HisToSGE and https://zenodo.org/records/12792163.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.08509 [pdf, other]

Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration

Authors: Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao

Abstract: Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derived from the 2-D frontal slice-wise Haar discrete wavelet transform, effectively modeling the low-rank prior for separated coarse-grained structure and fine-grained textures in the image. Experimental evaluations conducted on hyperspectral image inpainting, multi-temporal image cloud removal, and hyperspectral image denoising reveal HNN's potential. Typically, HNN achieves a performance improvement of 1-4 dB and a speedup of 10-28x compared to some state-of-the-art methods (e.g., tensor correlated total variation and fully-connected tensor network) for inpainting tasks.
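The abstract describes HNN as exploiting the low-rank structure of wavelet coefficients from a frontal slice-wise 2-D Haar DWT. The sketch below is a minimal NumPy illustration under stated assumptions: a single decomposition level, equal weights on the four subbands, and a (pixels x bands) unfolding of each subband before taking the nuclear norm. The paper's exact definition (levels, weights, unfolding) may differ, and the function names are hypothetical.

```python
import numpy as np


def haar2d_slices(x):
    """Single-level orthonormal 2-D Haar DWT of every frontal slice x[:, :, b].

    x has shape (H, W, B) with even H and W; each returned subband has shape
    (H/2, W/2, B). Naming of the two mixed subbands varies between libraries.
    """
    a = x[0::2, 0::2, :]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2, :]  # top-right
    c = x[1::2, 0::2, :]  # bottom-left
    d = x[1::2, 1::2, :]  # bottom-right
    ll = (a + b + c + d) / 2  # coarse-grained structure
    lh = (a - b + c - d) / 2  # differences across columns
    hl = (a + b - c - d) / 2  # differences across rows
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_nuclear_norm(x):
    """Sum of nuclear norms of each subband unfolded to a (pixels x bands) matrix."""
    total = 0.0
    for sub in haar2d_slices(x):
        h, w, bands = sub.shape
        total += np.linalg.norm(sub.reshape(h * w, bands), ord="nuc")
    return total
```

For a spatially constant image whose spectrum is `[3, 4]`, only the LL subband is nonzero and it is rank one, so the value reduces to a single singular value (20 for a 4 x 4 image). Separating coarse and fine subbands lets each carry its own low-rank penalty, which is the intuition the abstract gives.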

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.06064 [pdf, other]

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed pan-denoising. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, the proper modeling of this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign different weights of TV regularization for each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers. Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV.
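The weighting idea (large TV weights where the PAN image is smooth, small weights at PAN edges) can be sketched as follows. The weight function `w = 1 / (|grad P| + eps)` is a hypothetical choice; the abstract does not give PWRCTV's formula. PWRCTV applies the weighted TV to representation coefficients of the HSI, whereas for simplicity this sketch applies it to a single band.

```python
def gradient_magnitude(img):
    """Forward-difference gradient magnitude of a 2-D image given as a list of rows.
    Differences past the last row/column are taken as zero."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            mag[i][j] = (dx * dx + dy * dy) ** 0.5
    return mag


def pan_weights(pan, eps=1e-2):
    """Hypothetical per-pixel TV weights from a PAN image: w = 1 / (|grad P| + eps),
    rescaled so the largest weight is 1. Smooth PAN regions get large weights
    (strong smoothing); PAN edges get small weights (edges are preserved)."""
    raw = [[1.0 / (m + eps) for m in row] for row in gradient_magnitude(pan)]
    peak = max(max(row) for row in raw)
    return [[v / peak for v in row] for row in raw]


def weighted_tv(band, weights):
    """Weighted isotropic TV of one band: sum over pixels of w_ij * |grad X|_ij."""
    mag = gradient_magnitude(band)
    return sum(w * m for wrow, mrow in zip(weights, mag) for w, m in zip(wrow, mrow))
```

When an HSI band shares an edge with the PAN image, that edge is barely penalized, so the denoiser is free to keep it sharp while still smoothing flat regions.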

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03308 [pdf, other]

Accelerated Proton Resonance Frequency-based Magnetic Resonance Thermometry by Optimized Deep Learning Method

Authors: Sijie Xu, Shenyan Zong, Chang-Sheng Mei, Guofeng Shen, Yueran Zhao, He Wang

Abstract: Proton resonance frequency (PRF) based MR thermometry is essential for focused ultrasound (FUS) thermal ablation therapies. This work aims to enhance temporal resolution in dynamic MR temperature map reconstruction using an improved deep learning method. The training-optimized methods and five classical neural networks were applied to 2-fold and 4-fold under-sampled k-space data to reconstruct the temperature maps. The enhanced training modules included offline/online data augmentation, knowledge distillation, and an amplitude-phase decoupling loss function. Heating experiments were performed with a FUS transducer on phantom and ex vivo tissues. These data were manually under-sampled to imitate accelerated acquisition and used to train the reconstruction model with our method. An additional dozen or so test datasets were acquired separately to evaluate real-time performance and temperature accuracy. Acceleration factors of 1.9 and 3.7 were achieved for the 2-fold and 4-fold k-space under-sampling strategies, respectively, and the ResUNet-based deep learning reconstruction performed exceptionally well. In the 2-fold acceleration scenario, the RMSE of temperature map patches was 0.888 °C and 1.145 °C on the phantom and ex vivo test datasets, respectively. The Dice coefficient of the regions enclosed by the 43 °C isotherm was 0.809, and Bland-Altman analysis showed a bias of -0.253 °C with limits of agreement of ±2.16 °C.
In the 4-fold under-sampling case, performance on these evaluation metrics decreased by approximately 10%. This study demonstrates that deep learning-based reconstruction can significantly enhance the accuracy and efficiency of MR thermometry for clinical FUS thermal therapies.
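The evaluation metrics quoted in this abstract (RMSE, the Dice coefficient of the region inside the 43 °C isotherm, and the Bland-Altman bias and limits) can be computed as in the minimal sketch below. It assumes flattened temperature maps, treats "enclosed by the isotherm" as temperature at or above 43 °C, and uses the conventional 1.96-SD limits of agreement; the abstract does not state these details.

```python
import statistics


def rmse(ref, est):
    """Root-mean-square error between two flattened temperature maps."""
    return (sum((e - r) ** 2 for r, e in zip(ref, est)) / len(ref)) ** 0.5


def dice_above(ref_map, est_map, threshold=43.0):
    """Dice coefficient between the regions at or above `threshold` (degrees C)
    in two flattened temperature maps of equal length."""
    ref = {i for i, t in enumerate(ref_map) if t >= threshold}
    est = {i for i, t in enumerate(est_map) if t >= threshold}
    if not ref and not est:
        return 1.0  # both regions empty: treat as perfect agreement
    return 2 * len(ref & est) / (len(ref) + len(est))


def bland_altman(ref, est):
    """Bias (mean of est - ref) and half-width of the 95% limits of agreement (1.96 * SD)."""
    diffs = [e - r for r, e in zip(ref, est)]
    return statistics.mean(diffs), 1.96 * statistics.stdev(diffs)


# Toy four-pixel maps (made-up values, not data from the paper).
ref = [40.0, 44.0, 45.0, 42.0]
est = [41.0, 44.0, 46.0, 43.0]
```

In practice the reference map would come from the fully sampled acquisition and the estimate from the accelerated reconstruction.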

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03034 [pdf, ps, other]

Attention Incorporated Network for Sharing Low-rank, Image and K-space Information during MR Image Reconstruction to Achieve Single Breath-hold Cardiac Cine Imaging

Authors: Siying Xu, Kerstin Hammernik, Andreas Lingg, Jens Kuebler, Patrick Krumm, Daniel Rueckert, Sergios Gatidis, Thomas Kuestner

Abstract: Cardiac Cine Magnetic Resonance Imaging (MRI) provides an accurate assessment of heart morphology and function in clinical practice. However, MRI requires long acquisition times, with recent deep learning-based methods showing great promise to accelerate imaging and enhance reconstruction quality. Existing networks exhibit some common limitations that constrain further acceleration possibilities, including single-domain learning, reliance on a single regularization term, and equal feature contribution. To address these limitations, we propose to embed information from multiple domains, including low-rank, image, and k-space, in a novel deep learning network for MRI reconstruction, which we denote as A-LIKNet. A-LIKNet adopts a parallel-branch structure, enabling independent learning in the k-space and image domain. Coupled information sharing layers realize the information exchange between domains. Furthermore, we introduce attention mechanisms into the network to assign greater weights to more critical coils or important temporal frames. Training and testing were conducted on an in-house dataset, including 91 cardiovascular patients and 38 healthy subjects scanned with 2D cardiac Cine using retrospective undersampling. Additionally, we evaluated A-LIKNet on the real-time 8x prospectively undersampled data from the OCMR dataset. The results demonstrate that our proposed A-LIKNet outperforms existing methods and provides high-quality reconstructions.
The network can effectively reconstruct highly retrospectively undersampled dynamic MR images up to 24x accelerations, indicating its potential for single breath-hold imaging.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.11006 [pdf, other]

SPEAR: Receiver-to-Receiver Acoustic Neural Warping Field

Authors: Yuhang He, Shitong Xu, Jia-Xing Zhong, Sangyun Shin, Niki Trigoni, Andrew Markham

Abstract: We present SPEAR, a continuous receiver-to-receiver acoustic neural warping field for spatial acoustic effects prediction in an acoustic 3D space with a single stationary audio source. Unlike traditional source-to-receiver modelling methods that require prior knowledge of the space's acoustic properties to rigorously model audio propagation from source to receiver, we propose to predict by warping the spatial acoustic effects from one reference receiver position to another target receiver position, so that the warped audio essentially accommodates all spatial acoustic effects belonging to the target position. SPEAR can be trained with much more readily accessible data: we simply ask two robots to independently record spatial audio at different positions. We further theoretically prove that the warping field universally exists if and only if one audio source is present. Three physical principles are incorporated to guide SPEAR's network design, making the learned warping field physically meaningful. We demonstrate SPEAR's superiority on synthetic, photo-realistic, and real-world datasets, showing its strong potential for various downstream robotic tasks.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures in main paper

arXiv:2406.09426 [pdf]

Analyzing phonetic structure of Mandarin using Audacity

Authors: Shizheng Xu

Abstract: Mandarin Chinese is the official language in China, Taiwan, and Singapore. It is also the main non-official language spoken predominantly at home in Toronto and Vancouver. This article employs the audio software Audacity and leverages theoretical knowledge to conduct a comprehensive analysis of Mandarin Chinese. The study begins with an overview of the fundamental principles underlying Mandarin pronunciation, aiming to provide insights into its phonetic structure.

Submitted 14 April, 2024; originally announced June 2024.

Comments: audio source: https://leetcafe.com/language-analysis/

arXiv:2406.07421 [pdf, other]

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker traits of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this study, we delve into two speaker augmentation approaches: speed perturbation (SP) and vocal tract length perturbation (VTLP). Despite the empirical utilization of both methods, a comprehensive investigation into their efficacy is lacking. Our study, conducted using two public datasets, VoxCeleb and CN-Celeb, revealed that both SP and VTLP are proficient at generating new speakers, leading to significant performance improvements in speaker recognition. Furthermore, they exhibit distinct properties in sensitivity to perturbation factors and data complexity, hinting at the potential benefits of their fusion. Our research underscores the substantial potential of speaker augmentation, highlighting the importance of in-depth exploration and analysis.
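Speed perturbation resamples the waveform, which changes duration and scales pitch and formants together; that joint shift is why SP can produce "new" speakers. A minimal sketch using linear interpolation is below. Production pipelines typically use a proper resampler with anti-aliasing (for example the SoX `speed` effect), the perturbation factors used in the paper are not given in the abstract, and VTLP (which warps the frequency axis instead) is not shown.

```python
def speed_perturb(signal, factor):
    """Resample `signal` so it plays `factor` times faster at the same sample rate.

    Duration scales by 1/factor and every frequency component scales by factor.
    Linear interpolation only; no anti-aliasing filter is applied, so factors
    above 1 can alias high-frequency content.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = int(len(signal) / factor)
    out = []
    for n in range(out_len):
        pos = n * factor  # fractional read position in the original signal
        i = int(pos)
        frac = pos - i
        nxt = signal[i + 1] if i + 1 < len(signal) else signal[i]
        out.append((1 - frac) * signal[i] + frac * nxt)
    return out


# Speaker augmentation typically applies a few factors to each utterance, e.g.:
# for f in (0.9, 1.1): augmented.append(speed_perturb(waveform, f))
```

When used for speaker augmentation, each perturbed copy is usually given a new speaker label rather than the original one, since its voice characteristics no longer match the source speaker.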

Submitted 11 June, 2024; originally announced June 2024.

Comments: to be published in INTERSPEECH 2024

arXiv:2406.07310 [pdf, other]

MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting

Authors: Zhiqi Ai, Zhiyong Chen, Shugong Xu

Abstract: In this paper, we propose MM-KWS, a novel approach to user-defined keyword spotting leveraging multi-modal enrollments of text and speech templates. Unlike previous methods that focus solely on either text or speech features, MM-KWS extracts phoneme, text, and speech embeddings from both modalities. These embeddings are then compared with the query speech embedding to detect the target keywords. To ensure the applicability of MM-KWS across diverse languages, we utilize a feature extractor incorporating several multilingual pre-trained models. Subsequently, we validate its effectiveness on Mandarin and English tasks. In addition, we have integrated advanced data augmentation tools for hard case mining to enhance MM-KWS in distinguishing confusable words. Experimental results on the LibriPhrase and WenetPhrase datasets demonstrate that MM-KWS outperforms prior methods significantly.
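The abstract says the enrollment embeddings are compared with the query speech embedding but does not specify the comparison. As a stand-in, the sketch below uses cosine similarity against each enrolled template (text- or speech-derived) with a hypothetical fixed threshold; it is not the MM-KWS matching module.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def detect(query, enrollments, threshold=0.7):
    """True if the query embedding matches any enrolled template.

    `threshold` is a hypothetical operating point; a real system would tune it
    on held-out data to trade false accepts against false rejects.
    """
    return max(cosine(query, e) for e in enrollments) >= threshold
```

Scoring against several templates per keyword (one from text, one or more from speech) is one simple way a multi-modal enrollment can be used at inference time.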

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2406.02438 [pdf, other]

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesized using state-of-the-art methods from publicly accessible singing voice datasets. CtrSVDD includes 47.64 hours of bonafide and 260.34 hours of deepfake singing vocals, spanning 14 deepfake methods and involving 164 singer identities. We also present a baseline system with flexible front-end features, evaluated against a structured train/dev/eval split. The experiments show the importance of feature selection and highlight a need for generalization towards deepfake methods that deviate further from training distribution. The CtrSVDD dataset and baselines are publicly accessible.

Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2405.19450 [pdf, other]

FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining

Authors: Dong Li, Yidi Liu, Xueyang Fu, Senyan Xu, Zheng-Jun Zha

Abstract: Image deraining aims to remove rain streaks from rainy images and restore clear backgrounds. Currently, some research employing the Fourier transform has proved effective for image deraining, because the transform acts as an effective frequency prior for capturing rain streaks. However, despite the dependency between low and high frequencies in images, these Fourier-based methods rarely exploit the correlation of different frequencies to couple their learning procedures, limiting the full utilization of frequency information for image deraining. Alternatively, the recently emerged Mamba technique has shown its effectiveness and efficiency for modeling correlation in various domains (e.g., spatial, temporal), and we argue that introducing Mamba into the as-yet-unexplored Fourier space to correlate different frequencies would help improve image deraining. This motivates us to propose a new framework termed FourierMamba, which performs image deraining with Mamba in the Fourier space. Owing to the unique arrangement of frequency orders in Fourier space, the core of FourierMamba lies in the scanning encoding of different frequencies, where the low-to-high frequency order is arranged differently in the spatial dimension (not ordered along an axis) and the channel dimension (ordered along an axis). Therefore, we design FourierMamba to correlate Fourier space information in the spatial and channel dimensions with distinct designs.
Specifically, in the spatial-dimension Fourier space, we introduce zigzag coding to scan the frequencies and rearrange them from low to high, thereby correlating the frequencies in order; in the channel-dimension Fourier space, where the frequencies are already ordered along an axis, we can directly use Mamba to perform frequency correlation and improve the channel information representation.
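The zigzag coding described above can be illustrated with the JPEG-style zigzag traversal, which orders a 2-D grid of coefficients roughly from low to high frequency along anti-diagonals. This sketch assumes the lowest frequency sits at index (0, 0), as in a DCT coefficient layout; how FourierMamba handles the centered or wrapped-around frequencies of an FFT spectrum is not described in the abstract, so its exact scan may differ.

```python
def zigzag_order(h, w):
    """(row, col) positions of an h x w grid in JPEG-style zigzag order,
    walking anti-diagonals from the top-left corner and alternating direction."""
    order = []
    for s in range(h + w - 1):
        # All cells with row + col == s, listed top-to-bottom.
        diag = [(r, s - r) for r in range(max(0, s - w + 1), min(h, s + 1))]
        if s % 2 == 0:
            diag.reverse()  # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def zigzag_scan(grid):
    """Flatten a 2-D grid (list of rows) into a 1-D sequence in zigzag order."""
    return [grid[r][c] for r, c in zigzag_order(len(grid), len(grid[0]))]
```

The resulting 1-D sequence is the kind of input a sequence model such as Mamba scans, so neighboring tokens correspond to neighboring frequencies.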

Submitted 7 August, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.07689 [pdf, other]

Quality of Experience Optimization for Real-time XR Video Transmission with Energy Constraints

Authors: Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun

Abstract: Extended Reality (XR) is an important service in the 5G network and in future 6G networks. In contrast to traditional video on demand services, real-time XR video is transmitted frame-by-frame, requiring low latency and being highly sensitive to network fluctuations. In this paper, we model the quality of experience (QoE) for real-time XR video transmission on a frame-by-frame basis. Based on the proposed QoE model, we formulate an optimization problem that maximizes QoE with constraints on wireless resources and long-term energy consumption. We utilize Lyapunov optimization to transform the original problem into a single-frame optimization problem and then allocate wireless subchannels. We propose an adaptive XR video bitrate algorithm that employs a Long Short Term Memory (LSTM) based Deep Q-Network (DQN) algorithm for video bitrate selection. Through numerical results, we show that our proposed algorithm outperforms the baseline algorithms, with average QoE improvements of 0.04 to 0.46. Specifically, compared to baseline algorithms, the proposed algorithm reduces average video quality variations by 29% to 50% and improves the frame transmission success rate by 5% to 48%.
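Lyapunov optimization typically turns a long-term energy constraint into per-frame decisions through a virtual energy queue and the drift-plus-penalty rule. The sketch below shows that generic pattern only. The tuple-based action list, the trade-off parameter `v`, and the per-frame energy budget are hypothetical, and the paper's actual per-frame subproblem (subchannel allocation plus LSTM-based DQN bitrate selection) is not reproduced.

```python
def update_energy_queue(q, energy_used, energy_budget):
    """Virtual queue of accumulated energy overuse: Q(t+1) = max(Q(t) + E(t) - E_avg, 0)."""
    return max(q + energy_used - energy_budget, 0.0)


def choose_action(actions, q, v):
    """Drift-plus-penalty: pick the per-frame action maximizing V * QoE - Q * energy.

    `actions` is a list of (qoe, energy) pairs for the current frame (hypothetical).
    A larger `v` favors QoE; a growing queue `q` pushes choices toward low energy.
    """
    return max(actions, key=lambda a: v * a[0] - q * a[1])
```

When past frames have overspent the budget, `q` grows and subsequent frames are steered toward cheaper actions, which is how the long-term average constraint is enforced without knowing future channel conditions.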

Submitted 13 May, 2024; originally announced May 2024.

Comments: 6 pages, 5 figures

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.16223 [pdf, other]

Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is less explored than in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.

Submitted 24 April, 2024; originally announced April 2024.

Comments: CVPR 2024 - NTIRE Workshop

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low-resolution and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks: the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered, and the ranking is calculated as a weighted sum of the scores of the three sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered, and the score calculated from the FLOPs determined the ranking. In sub-track 3, the number of parameters was considered, and the score calculated from the parameter count determined the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions.
These submissions gauge the state of the art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained models of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
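The PSNR threshold above (about 26.90 dB on DIV2K_LSDIR_valid) is the quality constraint every efficient model must meet. A minimal sketch of the standard PSNR definition follows, assuming images flattened to lists of intensities. The challenge's exact evaluation protocol is not described here, including color space and border cropping, so treat this as a generic illustration rather than the official scoring script.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Images are flat sequences of pixel intensities; max_value is the peak
    intensity (255 for 8-bit images, 1.0 for normalized images).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# An error of one intensity level at every pixel of an 8-bit image gives ~48.13 dB.
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))  # 48.13
```

Because PSNR is logarithmic in the mean squared error, a gain of a few hundredths of a dB near 26.9 dB corresponds to only a small relative MSE change. That is why the challenge fixes PSNR and ranks on efficiency instead.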

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.09905 [pdf, other]

Quality of Experience Oriented Cross-layer Optimization for Real-time XR Video Transmission

Authors: Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun

Abstract: Extended reality (XR) is one of the most important applications of beyond-5G and 6G networks. Real-time XR video transmission presents challenges in terms of data rate and delay. In particular, the frame-by-frame transmission mode of XR video makes real-time XR video very sensitive to dynamic network environments. To improve users' quality of experience (QoE), we design a cross-layer transmission framework for real-time XR video. The proposed framework allows simple information exchange between the base station (BS) and the XR server, which assists in adaptive bitrate and wireless resource scheduling. We utilize the cross-layer information to formulate the problem of maximizing user QoE by finding the optimal scheduling and bitrate adjustment strategies. To address the mismatched time scales of the two strategies, we decouple the original problem and solve the resulting subproblems individually using a multi-agent-based approach. Specifically, we propose the multi-step Deep Q-network (MS-DQN) algorithm to obtain a frame-priority-based wireless resource scheduling strategy, and then propose the Transformer-based Proximal Policy Optimization (TPPO) algorithm for video bitrate adaptation. The experimental results show that the TPPO+MS-DQN algorithm proposed in this study can improve QoE by 3.6% to 37.8%. More specifically, the proposed MS-DQN algorithm enhances transmission quality by 49.9%-80.2%.
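"Multi-step" DQN variants usually replace the one-step Q-learning target with an n-step bootstrapped return. The abstract does not give MS-DQN's exact target, so the sketch below assumes the standard n-step form. The function name and arguments are hypothetical.

```python
def n_step_target(rewards, bootstrap_q, gamma, done):
    """n-step TD target used by multi-step Q-learning variants.

    target = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * max_a Q_target(s_{t+n}, a)

    rewards:     the n rewards r_t, ..., r_{t+n-1} observed after taking a_t
    bootstrap_q: max_a Q_target(s_{t+n}, a) from the target network
    gamma:       discount factor in [0, 1)
    done:        True if the episode terminated within the n steps (no bootstrap)
    """
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    if not done:
        target += (gamma ** len(rewards)) * bootstrap_q
    return target


# 3-step target with gamma = 0.5: 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
print(n_step_target([1.0, 1.0, 1.0], bootstrap_q=10.0, gamma=0.5, done=False))  # 3.0
```

Propagating several real rewards before bootstrapping typically speeds up credit assignment. This helps when a scheduling decision's effect on frame delivery appears only a few slots later.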

Submitted 15 April, 2024; originally announced April 2024.

Comments: 14 pages, 13 figures. arXiv admin note: text overlap with arXiv:2402.01180

arXiv:2403.19251 [pdf, other]

Arbitrary State Transition of Open Qubit System Based on Switching Control

Authors: Guangpu Wu, Shibei Xue, Shan Ma, Sen Kuang, Daoyi Dong, Ian R. Petersen

Abstract: We present a switching control strategy based on Lyapunov control for arbitrary state transitions in open qubit systems. Using the coherent vector representation, we propose a switching control strategy that prevents the state of the qubit from entering invariant sets and singular value sets, ultimately driving the system into a sufficiently small neighborhood of the target state. In comparison to existing works, this control strategy relaxes the strict constraints on system models imposed by special target states. Furthermore, we identify conditions under which the open qubit system achieves finite-time stability (FTS) and finite-time contractive stability (FTCS), respectively. This represents a critical improvement in quantum state transitions, especially since asymptotic stability of arbitrary target states is unattainable in open quantum systems. The effectiveness of the proposed method is demonstrated through its application to a qubit system affected by various types of decoherence, including amplitude, dephasing, and polarization decoherence.
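The coherent vector representation mentioned above writes a qubit density matrix as ρ = (I + r_x X + r_y Y + r_z Z)/2, with r_i = Tr(ρσ_i). The sketch below computes that vector and a quadratic distance to a target vector, a common kind of Lyapunov candidate. The function names are hypothetical, and the quadratic V is a generic illustration, not the Lyapunov function or switching law used in the paper.

```python
def bloch_vector(rho):
    """Coherence (Bloch) vector r = (Tr(rho X), Tr(rho Y), Tr(rho Z)) of a qubit.

    rho is a 2x2 density matrix [[rho00, rho01], [rho10, rho11]] with complex
    entries, so that rho = (I + r_x X + r_y Y + r_z Z) / 2.
    """
    (r00, r01), (r10, r11) = rho
    rx = (r01 + r10).real          # Tr(rho X) = rho01 + rho10
    ry = (1j * (r01 - r10)).real   # Tr(rho Y) = i (rho01 - rho10)
    rz = (r00 - r11).real          # Tr(rho Z) = rho00 - rho11
    return (rx, ry, rz)


def quadratic_lyapunov(r, r_target):
    """Illustrative Lyapunov candidate V = ||r - r_target||^2 / 2 (zero only at the target)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(r, r_target))


# |+> = (|0> + |1>)/sqrt(2) lies on the +x axis of the Bloch sphere.
plus = [[0.5, 0.5], [0.5, 0.5]]
print(bloch_vector(plus))                                       # (1.0, 0.0, 0.0)
print(quadratic_lyapunov(bloch_vector(plus), (0.0, 0.0, 1.0)))  # 1.0
```

For open systems, decoherence shrinks |r| below 1. This is one intuition for why the paper targets a small neighborhood of the target state rather than exact asymptotic convergence.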

Submitted 28 March, 2024; originally announced March 2024.

Comments: 12 pages, 7 figures

arXiv:2403.16643 [pdf, other]

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

Authors: Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

Abstract: Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts while strictly preserving the original content, eliminating any distortions or synthetic details. While traditional diffusion-based SR techniques have demonstrated remarkable abilities to enhance image detail, they are prone to introducing artifacts during iterative procedures. Such artifacts, ranging from trivial noise to unauthentic textures, deviate from the true structure of the source image and thus compromise the integrity of the super-resolution process. In this work, we propose Self-Adaptive Reality-Guided Diffusion (SARGD), a training-free method that operates in the latent space to identify and mitigate the propagation of artifacts. SARGD begins by using an artifact detector to identify implausible pixels, creating a binary mask that highlights artifacts. Next, the Reality Guidance Refinement (RGR) process refines artifacts by integrating this mask with realistic latent representations, improving alignment with the original image. However, initial realistic latent representations derived from lower-quality images result in over-smoothing in the final output. To address this, we introduce a Self-Adaptive Guidance (SAG) mechanism, which dynamically computes a reality score that enhances the sharpness of the realistic latent. These alternating mechanisms collectively achieve artifact-free super-resolution.
Extensive experiments demonstrate the superiority of our method, delivering detailed artifact-free high-resolution images while halving the number of sampling steps. We release our code at https://github.com/ProAirVerse/Self-Adaptive-Guidance-Diffusion.git.
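The mask-then-refine idea can be pictured as an element-wise blend. Positions flagged as artifacts take values from a realistic latent, and all other positions keep the current diffusion latent. The sketch below is a hypothetical simplification on flat lists, with a thresholded score standing in for the artifact detector. It is not SARGD's actual RGR update, whose exact form the abstract does not give.

```python
def artifact_mask(scores, threshold):
    """Binary mask: 1.0 where a (hypothetical) per-position artifact score exceeds threshold."""
    return [1.0 if s > threshold else 0.0 for s in scores]


def mask_guided_blend(latent, realistic_latent, mask):
    """Replace masked (artifact) positions with the realistic latent; keep the rest.

    Computes m * z_real + (1 - m) * z element-wise on flat lists.
    """
    if not (len(latent) == len(realistic_latent) == len(mask)):
        raise ValueError("latent, realistic_latent and mask must have equal length")
    return [m * zr + (1.0 - m) * z for z, zr, m in zip(latent, realistic_latent, mask)]


mask = artifact_mask([0.1, 0.9, 0.4], threshold=0.5)                 # [0.0, 1.0, 0.0]
print(mask_guided_blend([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], mask))  # [1.0, 20.0, 3.0]
```

A hard binary mask copies the realistic latent verbatim. If that latent comes from a low-quality image, it inherits its blur, which matches the over-smoothing issue the SAG mechanism is described as addressing.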

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15145 [pdf, ps, other]

Robust Resource Allocation for STAR-RIS Assisted SWIPT Systems

Authors: Guangyu Zhu, Xidong Mu, Li Guo, Ao Huang, Shibiao Xu

Abstract: A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted simultaneous wireless information and power transfer (SWIPT) system is proposed. More particularly, a STAR-RIS is deployed to assist the information/power transfer from a multi-antenna access point (AP) to multiple single-antenna information users (IUs) and energy users (EUs), where two practical STAR-RIS operating protocols, namely energy splitting (ES) and time switching (TS), are employed. Under imperfect channel state information (CSI), a multi-objective optimization problem (MOOP) framework that simultaneously maximizes the minimum data rate and the minimum harvested power is employed to investigate the fundamental rate-energy trade-off between IUs and EUs. To obtain the optimal robust resource allocation strategy, the MOOP is first transformed into a single-objective optimization problem (SOOP) via the ε-constraint method, which is then reformulated by approximating the semi-infinite inequality constraints with the S-procedure. For ES, an alternating optimization (AO)-based algorithm is proposed to jointly design AP active beamforming and STAR-RIS passive beamforming, where a penalty method is leveraged in the STAR-RIS beamforming design. Furthermore, the developed algorithm is extended to optimize the time allocation policy and beamforming vectors in a two-layer iterative manner for TS.
Numerical results reveal that: 1) deploying STAR-RISs achieves a significant performance gain over conventional RISs, especially in terms of harvested power for EUs; 2) the ES protocol obtains better user fairness when focusing only on IUs or EUs, while the TS protocol yields a better balance between IUs and EUs; 3) imperfect CSI affects IUs more significantly than EUs, whereas TS can confer a more robust design that attenuates these effects.
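The ε-constraint method named above keeps one objective and turns the other into a constraint of the form f₂(x) ≥ ε. Sweeping ε then traces the trade-off curve. The sketch below applies it by grid search to a hypothetical one-variable rate-energy toy, where x is the fraction of a resource serving IUs. It only illustrates the scalarization and does not model beamforming, CSI uncertainty, or the S-procedure.

```python
import math


def epsilon_constraint(f_primary, f_secondary, candidates, eps, tol=1e-12):
    """Epsilon-constraint scalarization of a two-objective maximization.

    Maximizes f_primary(x) over the candidate set subject to f_secondary(x) >= eps
    (with a small tolerance for floating-point round-off).
    Returns (best_x, best_value), or None if no candidate satisfies the constraint.
    """
    feasible = [x for x in candidates if f_secondary(x) >= eps - tol]
    if not feasible:
        return None
    best = max(feasible, key=f_primary)
    return best, f_primary(best)


def toy_rate(x):
    # Hypothetical information objective: fraction x of a resource serves IUs.
    return math.log2(1 + 10 * x)


def toy_energy(x):
    # Hypothetical energy objective: the remaining fraction 1 - x serves EUs.
    return 1 - x


grid = [i / 100 for i in range(101)]
# Sweeping eps traces the rate-energy trade-off (Pareto boundary) of the toy problem.
for eps in (0.1, 0.3, 0.5):
    x, rate = epsilon_constraint(toy_rate, toy_energy, grid, eps)
    print(eps, x, round(rate, 3))
```

A tighter energy requirement ε lowers the best achievable rate. This is the same qualitative rate-energy trade-off the MOOP framework studies between IUs and EUs.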

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.13245 [pdf, other]

Federated reinforcement learning for robot motion planning with zero-shot generalization

Authors: Zhenyuan Yuan, Siyuan Xu, Minghui Zhu

Abstract: This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., neither data collection nor policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning among multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and that from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement, and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
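One round of the upload, aggregate, broadcast, and select loop described above is sketched below. It assumes that a lower estimated normalized arrival time is better and that each learner re-evaluates both candidate policies locally. The selection rule, the function names, and the estimator are hypothetical, since the abstract does not give the learners' exact criterion.

```python
def cloud_aggregate(uploads):
    """Cloud step: pick the upload with the smallest estimated normalized arrival time.

    uploads maps a learner id to (policy, estimated_normalized_arrival_time).
    Returns the (policy, time) pair that the Cloud broadcasts to all learners.
    """
    return min(uploads.values(), key=lambda pair: pair[1])


def learner_select(local_policy, cloud_policy, estimate_arrival_time):
    """Learner step: keep whichever policy its own estimate says arrives sooner.

    estimate_arrival_time is a hypothetical callable that evaluates a policy in
    the learner's local environments; ties keep the local policy.
    """
    if estimate_arrival_time(cloud_policy) < estimate_arrival_time(local_policy):
        return cloud_policy
    return local_policy


uploads = {"learner_a": ("policy_a", 0.82), "learner_b": ("policy_b", 0.57)}
best_policy, best_time = cloud_aggregate(uploads)
print(best_policy, best_time)  # policy_b 0.57

local_estimates = {"policy_a": 0.80, "policy_b": 0.61}  # learner_a's own (hypothetical) estimates
print(learner_select("policy_a", best_policy, local_estimates.get))  # policy_b
```

Only policies and scalar arrival-time estimates cross the network in this loop, never raw trajectories. That is the data-privacy property the framework relies on.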

Submitted 7 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.18070 [pdf, other]

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and, more recently, graphics processing units (GPUs), provide various degrees of parallelism, yet both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBP cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access-and-execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experimental results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedups in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures, conference

arXiv:2402.01180 [pdf, other]

Real-time Extended Reality Video Transmission Optimization Based on Frame-priority Scheduling

Authors: Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun

Abstract: Extended reality (XR) is one of the most important applications of 5G. Real-time XR video transmission in 5G networks requires low latency and a high data rate. In this paper, we propose a resource allocation scheme based on frame-priority scheduling to meet these requirements. The optimization problem is modelled as a frame-priority-based radio resource scheduling problem to improve transmission quality. We propose a scheduling framework based on a multi-step Deep Q-network (MS-DQN) and design a neural network model based on a convolutional neural network (CNN). Simulation results show that the scheduling framework based on frame priority and MS-DQN can improve transmission quality by 49.9%-80.2%.

Submitted 7 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 6 pages, 7 figures

arXiv:2401.09552 [pdf, other]

Centralized active reconfigurable intelligent surface: Architecture, path loss analysis and experimental verification

Authors: Changhao Liu, Fan Yang, Shenheng Xu, Yezhen Li, Maokun Li

Abstract: Reconfigurable intelligent surfaces (RISs) are a promising candidate for 6G communication. Recently, active RIS has been proposed to compensate for the multiplicative fading effect inherent in passive RISs. However, conventional distributed active RISs, with at least one amplifier per element, are costly, complex, and power-intensive. To address these challenges, this paper proposes a novel active RIS architecture: the centralized active RIS (CA-RIS), which amplifies the energy using a centralized amplifying reflector to reduce the number of amplifiers. Under this architecture, as few as one amplifier is needed for power amplification of the entire array, which eliminates the mutual-coupling effect among amplifiers and significantly reduces the cost, noise level, and power consumption. We evaluate the performance of CA-RIS, specifically its path loss, and compare it with conventional passive RISs, revealing a moderate amplification gain. Furthermore, the proposed CA-RIS and the path loss model are experimentally verified, achieving a 9.6 dB net gain over passive RIS at 4 GHz. The CA-RIS substantially simplifies the active RIS architecture while preserving performance, striking an optimal balance between system complexity and performance that makes it competitive in various scenarios.
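As a quick worked conversion, the reported 9.6 dB net gain over passive RIS corresponds to a linear power ratio of roughly 9. This assumes the figure is a power gain, as dB gains on received signals usually are:

$$
G_{\text{lin}} = 10^{G_{\text{dB}}/10} = 10^{9.6/10} = 10^{0.96} \approx 9.12
$$

In other words, under the measured setup the CA-RIS delivered about nine times the received power of the passive RIS.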

Submitted 18 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.00135 [pdf]

Deep Radon Prior: A Fully Unsupervised Framework for Sparse-View CT Reconstruction

Authors: Shuo Xu, Yucheng Zhang, Gang Chen, Xincheng Xiang, Peng Cong, Yuewen Sun

Abstract: Although sparse-view computed tomography (CT) significantly reduces radiation dose, it also introduces severe artifacts that degrade image quality. In recent years, deep learning-based methods for inverse problems have made remarkable progress and have become increasingly popular in CT reconstruction. However, most of these methods suffer from several limitations, such as dependence on high-quality training data and weak interpretability. In this study, we propose a fully unsupervised framework called Deep Radon Prior (DRP), inspired by Deep Image Prior (DIP), to address these limitations. DRP introduces a neural network as an implicit prior into the iterative method, thereby realizing cross-domain gradient feedback. During reconstruction, the neural network is progressively optimized in multiple stages to narrow the solution space in the Radon domain for the under-constrained imaging protocol, and the convergence of the proposed method is discussed. Compared with the popular pre-trained methods, the proposed framework requires no dataset and exhibits superior interpretability and generalization ability. The experimental results demonstrate that the proposed method can generate detailed images while effectively suppressing image artifacts. Meanwhile, DRP achieves comparable or better performance than the supervised methods.

Submitted 29 December, 2023; originally announced January 2024.

Comments: 11 pages, 12 figures, Journal paper

arXiv:2312.09063 [pdf, other]

Image Demoireing in RAW and sRGB Domains

Authors: Shuning Xu, Binbin Song, Xiangyu Chen, Xina Liu, Jiantao Zhou

Abstract: Moire patterns frequently appear when capturing screens with smartphones or cameras, potentially compromising image quality. Previous studies suggest that moire pattern elimination in the RAW domain is more effective than demoireing in the sRGB domain. Nevertheless, relying solely on RAW data for image demoireing is insufficient to mitigate the color cast, due to the absence of essential information required for color correction by the image signal processor (ISP). In this paper, we propose to jointly utilize both RAW and sRGB data, which are readily accessible in modern smartphones and DSLR cameras, for image demoireing (RRID). We develop a Skip-Connection-based Demoireing Module (SCDM) with a Gated Feedback Module (GFM) and a Frequency Selection Module (FSM) embedded in skip-connections for the efficient and effective demoireing of RAW and sRGB features, respectively. Subsequently, we design an RGB Guided ISP (RGISP) to learn a device-dependent ISP, assisting the process of color recovery. Extensive experiments demonstrate that our RRID outperforms state-of-the-art approaches in moire pattern removal and color cast correction by 0.62 dB in PSNR and 0.003 in SSIM.

Submitted 15 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.05256 [pdf, other]

Holistic Evaluation of GPT-4V for Biomedical Imaging

Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2312.01042 [pdf, ps, other]

Covert Communications in STAR-RIS-Aided Rate-Splitting Multiple Access Systems

Authors: Heng Chang, Hai Yang, Shuobo Xu, Xiyu Pang, Hongwu Liu

Abstract: In this paper, we investigate covert communications in a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-aided rate-splitting multiple access (RSMA) system. Under the RSMA principles, the messages for the covert user (Bob) and the public user (Grace) are converted into common and private streams at the legitimate transmitter (Alice) to realize downlink transmissions, while the STAR-RIS is deployed not only to aid the public transmissions from Alice to Grace, but also to shield the covert transmissions from Alice to Bob against the warden (Willie). To characterize the covert performance of the considered STAR-RIS-aided RSMA (STAR-RIS-RSMA) system, we derive an analytical expression for the minimum average detection error probability of Willie, based on which a covert rate maximization problem is formulated. To maximize Bob's covert rate while confusing Willie's monitoring, the transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming are jointly optimized subject to Grace's quality of service (QoS) requirements. The non-convex covert rate maximization problem, which involves highly coupled system parameters, is decoupled into three sub-problems: transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming. To obtain the rank-one constrained optimal solution for the STAR-RIS reflection/transmission beamforming sub-problem, a penalty-based successive convex approximation scheme is developed.
Moreover, an alternating optimization (AO) algorithm is designed to determine the optimal solution for the transmit power allocation sub-problem, while the original problem is solved overall by a new AO algorithm.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 17 pages, submitted to journal

arXiv:2311.04116 [pdf, other]

Improved Topological Preservation in 3D Axon Segmentation and Centerline Detection using Geometric Assessment-driven Topological Smoothing (GATS)

Authors: Nina I. Shamsi, Alex S. Xu, Lars A. Gjesteby, Laura J. Brattain

Abstract: Automated axon tracing via fully supervised learning requires large amounts of 3D brain imagery, which is time-consuming and laborious to obtain and requires expertise. Thus, there is a need for more efficient segmentation and centerline detection techniques to use in conjunction with automated annotation tools. Topology-preserving methods ensure that segmented components maintain geometric connectivity, which is especially meaningful for applications where volumetric data is used, and these methods often make use of morphological thinning algorithms, as the thinned outputs can be useful for both segmentation and centerline detection of curvilinear structures. Current morphological thinning approaches used in conjunction with topology-preserving methods are prone to over-thinning and require manual configuration of hyperparameters. We propose an automated approach for morphological smoothing that uses geometric assessment of the radius of tubular structures in brain microscopy volumes and applies average pooling to prevent over-thinning. We use this approach to formulate a loss function, which we call Geometric Assessment-driven Topological Smoothing loss, or GATS. Our approach increased segmentation and centerline detection evaluation metrics by 2%-5% across multiple datasets and improved the Betti error rates by 9%.
Our ablation study showed that geometric assessment of tubular structures achieved higher segmentation and centerline detection scores, and that using average pooling for morphological smoothing in place of thinning algorithms reduced the Betti errors. We observed increased topological preservation during automated annotation of 3D axon volumes from models trained with GATS.
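The over-thinning problem can be shown in 2D. Thinning-style losses typically erode a soft mask with min pooling; for example, the soft skeleton in clDice implements erosion as a negated max-pool of the negated image. Min pooling wipes out a one-pixel-wide structure entirely, whereas average pooling keeps a nonzero response along it. The sketch below uses a fixed 3×3 window. GATS instead derives its smoothing from the estimated tube radius, which is not modeled here, and the helper names are hypothetical.

```python
def _pool2d(img, k, reduce):
    """Stride-1 k x k pooling; windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [img[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(reduce(window))
        out.append(row)
    return out


def min_pool2d(img, k=3):
    """Soft erosion via min pooling, as used in soft-skeleton style thinning."""
    return _pool2d(img, k, min)


def avg_pool2d(img, k=3):
    """Average pooling: smooths the mask instead of eroding it."""
    return _pool2d(img, k, lambda window: sum(window) / len(window))


# A one-pixel-wide horizontal structure (e.g. a thin axon cross-section in 2D).
thin = [[0.0] * 5, [1.0] * 5, [0.0] * 5]
print(min_pool2d(thin)[1])  # [0.0, 0.0, 0.0, 0.0, 0.0] -> structure erased
print(avg_pool2d(thin)[1])  # every value is 1/3 -> structure retained
```

Because the averaged response never drops to zero along the structure, connectivity, and therefore Betti numbers, survives the smoothing step. This is consistent with the reduced Betti errors reported above.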

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.00263 [pdf, other]

The bottleneck and ceiling effects in quantized tracking control of heterogeneous multi-agent systems under DoS attacks

Authors: Shuai Feng, Maopeng Ran, Baoyong Zhang, Lihua Xie, Shengyuan Xu

Abstract: In this paper, we investigate tracking control of heterogeneous multi-agent systems under Denial-of-Service (DoS) attacks and state quantization. Dynamic quantized mechanisms are designed for inter-follower communication and leader-follower communication. Zoom-in and zoom-out factors, as well as the data rates of both mechanisms needed to prevent quantizer saturation, are provided. Our results show that by tuning the inter-follower quantized controller, one cannot improve the resilience beyond a level determined by the data rate of the leader-follower quantized communication, i.e., the ceiling effect; attempting to exceed this level can cause the followers' state quantizers to overflow. On the other hand, if one selects a "large" data rate for leader-follower quantized communication, then the inter-follower quantized communication determines the resilience, and further increasing the data rate for leader-follower quantized communication cannot improve the resilience, i.e., the bottleneck effect. Simulation examples are provided to justify the results of our paper.
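Dynamic quantizers of this kind typically combine a uniform quantizer with a time-varying scaling parameter μ. The range is zoomed out when the quantizer saturates, for instance after a DoS interval, and zoomed in otherwise to refine precision. The number of quantization levels fixes the data rate. The sketch below shows that generic mechanism. The factors 0.5 and 2.0, the saturation rule, and the function names are illustrative assumptions, not the zoom factors or data rates derived in the paper.

```python
import math


def quantize(x, mu, levels):
    """Uniform mid-tread quantizer with scaling parameter mu and 2*levels + 1 output values.

    Returns (q, saturated): q = mu * round(x / mu) inside the range, or the clipped
    extreme value +-mu*levels when |x / mu| exceeds levels + 1/2 (saturation).
    """
    z = x / mu
    if abs(z) > levels + 0.5:
        return math.copysign(mu * levels, z), True
    return mu * math.floor(z + 0.5), False


def update_scale(mu, saturated, zoom_in=0.5, zoom_out=2.0):
    """Zoom out (enlarge the range) after saturation, e.g. when DoS blocks updates;
    otherwise zoom in (shrink the step) to refine precision."""
    return mu * (zoom_out if saturated else zoom_in)


def bits_per_sample(levels):
    """Bits needed to index 2*levels + 1 quantization values."""
    return math.ceil(math.log2(2 * levels + 1))


mu = 1.0
for x in (2.3, 9.0):
    q, sat = quantize(x, mu, levels=5)
    print(x, q, sat, "next mu:", update_scale(mu, sat))
print(bits_per_sample(5))  # 4 bits for 11 levels
```

The coupling the paper analyzes shows up even here. A quantizer with few levels (a low data rate) saturates sooner, and no choice of controller gains downstream can recover information lost to saturation.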

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.01565 [pdf, other]

Causality-informed Rapid Post-hurricane Building Damage Detection in Large Scale from InSAR Imagery

Authors: Chenguang Wang, Yepeng Liu, Xiaojian Zhang, Xuechun Li, Vladimir Paramygin, Arthriya Subgranon, Peter Sheng, Xilei Zhao, Susu Xu

Abstract: Timely and accurate assessment of hurricane-induced building damage is crucial for effective post-hurricane response and recovery efforts. Remote sensing technologies now provide large-scale optical or Interferometric Synthetic Aperture Radar (InSAR) imagery immediately after a disastrous event, which can be readily used to conduct rapid building damage assessment. Compared to optical satellite imagery, Synthetic Aperture Radar can penetrate cloud cover and provide more complete spatial coverage of damaged zones in various weather conditions. However, InSAR imagery often contains highly noisy and mixed signals induced by co-occurring or co-located building damage, flooding, flood/wind-induced vegetation changes, and anthropogenic activities, making it challenging to extract accurate building damage information. In this paper, we introduce an approach for rapid post-hurricane building damage detection from InSAR imagery. This approach encodes complex causal dependencies among wind, flood, building damage, and InSAR imagery using a holistic causal Bayesian network. Based on the causal Bayesian network, we further jointly infer the large-scale unobserved building damage by fusing the information from InSAR imagery with prior physical models of flood and wind, without the need for ground truth labels. Furthermore, we validate our estimation results on a real-world devastating hurricane, the 2022 Hurricane Ian.
We gathered and annotated building damage ground truth data in Lee County, Florida, compared the introduced method's estimation results with the ground truth, and benchmarked it against state-of-the-art models to assess the effectiveness of the proposed method. Results show that our method achieves rapid and accurate detection of building damage, with significantly reduced processing time compared to traditional manual inspection methods.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 6 pages, 3 figures

arXiv:2310.00593 [pdf, other]

Nonlinear Multi-Carrier System with Signal Clipping: Measurement, Analysis, and Optimization

Authors: Yuyang Du, Liang Hao, Yiming Lei, Qun Yang, Shiqi Xu

Abstract: Signal clipping is a classic technique for reducing the peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It has been widely applied in consumer electronic devices owing to its low complexity and high efficiency. Although clipping reduces the nonlinear distortion caused by power amplifiers (PAs), it introduces additional clipping distortion. Optimizing the joint system performance with consideration of both PA nonlinearity and clipping distortion remains an open problem owing to the complexity of PA modeling. In this paper, we analyze the PA nonlinearity through the Bessel-Fourier PA (BFPA) model and simplify its power expression using inter-modulation product (IMP) analysis. We derive expressions for the receiver signal-to-noise ratio (SNR) and the system symbol error rate (SER) of the nonlinear clipped OFDM system. Using these derivations, we investigate the optimal system setting that achieves the SER lower bound in a practical OFDM system that considers both PA nonlinearity and clipping distortion. The methods and results presented in this paper can serve as a useful reference for the system-level optimization of clipped OFDM systems with nonlinear PAs.
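The code below sketches the PAPR-versus-clipping trade-off the abstract starts from. It assumes a 64-subcarrier OFDM symbol built by a naive inverse DFT of random QPSK symbols, and it applies envelope clipping at a hypothetical clipping ratio relative to the RMS amplitude. The paper's BFPA power-amplifier model and SER analysis are not reproduced here.

```python
import cmath
import math
import random


def idft(symbols):
    """Naive inverse DFT: N frequency-domain symbols -> N time-domain samples."""
    n = len(symbols)
    return [sum(symbols[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def papr_db(samples):
    """Peak-to-average power ratio in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def clip(samples, ratio):
    """Envelope clipping: limit |x| to ratio * RMS amplitude while keeping the phase."""
    rms = math.sqrt(sum(abs(s) ** 2 for s in samples) / len(samples))
    a_max = ratio * rms
    return [s if abs(s) <= a_max else a_max * s / abs(s) for s in samples]


random.seed(0)
qpsk = [complex(random.choice([-1, 1]), random.choice([-1, 1])) for _ in range(64)]
x = idft(qpsk)
x_clipped = clip(x, ratio=1.4)  # hypothetical clipping ratio
print(f"PAPR before: {papr_db(x):.2f} dB, after: {papr_db(x_clipped):.2f} dB")
```

Lowering `ratio` reduces PAPR further but removes more signal energy, which shows up at the receiver as clipping distortion. This is the tension the paper optimizes jointly with PA nonlinearity.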

Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.13819 [pdf, other]

A Two-Step Approach for Narrowband Source Localization in Reverberant Rooms

Authors: Wei-Ting Lai, Lachlan Birnie, Thushara Abhayapala, Amy Bastine, Shaoheng Xu, Prasanga Samarasinghe

Abstract: This paper presents a two-step approach for narrowband source localization within reverberant rooms. The first step performs dereverberation by modeling the homogeneous component of the sound field as an equivalent decomposition into plane waves using Iteratively Reweighted Least Squares (IRLS). The second step performs source localization by modeling the dereverberated component as a sparse representation of the point-source distribution using Orthogonal Matching Pursuit (OMP). The proposed method enhances localization accuracy with fewer measurements, particularly in environments with strong reverberation. A numerical simulation of a conference room scenario, using a uniform microphone array affixed to the wall, demonstrates real-world feasibility. Notably, the proposed method and microphone placement effectively localize sound sources within the 2D horizontal plane without requiring prior knowledge of boundary conditions or room geometry, making the method versatile across different room types.
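OMP, which the abstract uses in the second step, greedily builds a sparse representation one atom at a time. The code below is a minimal real-valued sketch over a generic dictionary given as a list of columns. The small dictionary and signal are hypothetical; the paper's point-source dictionary and the IRLS dereverberation step are not reproduced.

```python
import math


def solve_least_squares(cols, y):
    """Least-squares coefficients for y ~ sum_j c_j * cols[j], via the normal equations."""
    k = len(cols)
    gram = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    aug = [gram[i] + [rhs[i]] for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented system.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(p + 1, k):
            factor = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= factor * aug[p][c]
    # Back substitution.
    coef = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(aug[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = (aug[p][k] - tail) / aug[p][p]
    return coef


def omp(atoms, y, sparsity):
    """Orthogonal Matching Pursuit: returns (selected atom indices, their coefficients)."""
    residual = list(y)
    support, coef = [], []
    for _ in range(sparsity):
        # Greedy step: the unused atom most correlated with the current residual.
        scores = [abs(sum(a * r for a, r in zip(atom, residual))) for atom in atoms]
        candidates = [i for i in range(len(atoms)) if i not in support]
        support.append(max(candidates, key=lambda i: scores[i]))
        # Orthogonal step: jointly re-fit all selected atoms, then update the residual.
        coef = solve_least_squares([atoms[i] for i in support], y)
        approx = [sum(c * atoms[i][t] for c, i in zip(coef, support)) for t in range(len(y))]
        residual = [yt - at for yt, at in zip(y, approx)]
    return support, coef


s = 1 / math.sqrt(2)
atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [s, s, 0, 0]]
y = [2.0, 0.0, -3.0, 0.0]
support, coef = omp(atoms, y, sparsity=2)
print(support, coef)  # [2, 0] [-3.0, 2.0]
```

The joint re-fit after every selection is what distinguishes OMP from plain matching pursuit. It keeps the residual orthogonal to all atoms chosen so far.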

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2309.07152 [pdf]

Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, Jinwoo Lee, Woon-Hong Yeo

Abstract: The COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, current respirators have several limitations. One is poor fit: they cannot accommodate diverse human facial sizes and shapes, potentially diminishing the protective effect of wearing them. In addition, current facial respirators do not inform the user of the air quality inside the respirator during continuous long-term use. Here, we demonstrate a novel smart N95 filtering facepiece respirator that incorporates a humidity sensor and a self-fit adjusting function driven by pressure-sensor feedback, so that the respirator effectively prevents the transmission of airborne pathogens. Laser-induced graphene (LIG) forms the humidity sensor. A pressure sensor array based on a dielectric elastomeric sponge monitors the respirator's contact with the user's face, providing the sensory information for a closed-loop feedback mechanism. With the self-fit adjusting mode and an elastomeric lining, the fit factor increases by 3.20 times on average and 5 times at maximum. We expect that this experimental proof of concept will offer viable solutions to the limitations of current commercial respirators.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

MSC Class: 92C55

arXiv:2309.04335 [pdf, ps, other]

On the performance of an integrated communication and localization system: an analytical framework

Authors: Yuan Gao, Haonan Hu, Jiliang Zhang, Yanliang Jin, Shugong Xu, Xiaoli Chu

Abstract: Quantifying the performance bound of an integrated localization and communication (ILAC) system, and the trade-off between its communication and localization performance, is critical. In this letter, we consider an ILAC system that can perform communication and localization via time-domain or frequency-domain resource allocation. We develop an analytical framework to derive a closed-form expression for the capacity loss versus the localization Cramér-Rao lower bound (CRB) loss under time-domain and frequency-domain resource allocation. Simulation results validate the analytical model. They show that frequency-domain resource allocation is preferable with fewer antennas at the next generation NodeB (gNB) and a larger distance between the user equipment (UE) and the gNB. Time-domain resource allocation is preferable with more antennas and a smaller distance between the UE and the gNB.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures

arXiv:2308.16612 [pdf, other]

Neural Gradient Regularizer

Authors: Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc Van Gool

Abstract: Owing to its significant success, the prior imposed on gradient maps has consistently been a subject of great interest in image processing. Total variation (TV), one of the most representative regularizers, is known for its ability to capture the intrinsic sparsity prior underlying gradient maps. Nonetheless, TV and its variants often underestimate the gradient maps. This weakens edges and details whose gradients should not be zero in the original image (i.e., image structures that cannot be described by sparse priors on gradient maps). Recently, total deep variation (TDV) has been introduced. It assumes the sparsity of feature maps and provides a flexible regularization learned from large-scale datasets for a specific task. However, TDV requires retraining the network whenever the image or task changes, limiting its versatility. To alleviate this issue, in this paper we propose a neural gradient regularizer (NGR) that expresses the gradient map as the output of a neural network. Unlike existing methods, NGR does not rely on any subjective sparsity or other prior assumptions on image gradient maps, thereby avoiding their underestimation. NGR is applicable to various image types and image processing tasks and functions in a zero-shot learning fashion, making it a versatile, plug-and-play regularizer. Extensive experimental results demonstrate the superior performance of NGR over state-of-the-art counterparts on a range of tasks, further validating its effectiveness and versatility.
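For reference, the classic anisotropic TV term is the l1 norm of an image's finite-difference gradient maps. The code below is a minimal sketch for a grayscale image stored as nested lists, using forward differences. It illustrates the baseline regularizer the abstract discusses, not NGR.

```python
def gradient_maps(img):
    """Forward-difference horizontal and vertical gradients (zero at the last column/row)."""
    h, w = len(img), len(img[0])
    gx = [[img[i][j + 1] - img[i][j] if j + 1 < w else 0.0 for j in range(w)] for i in range(h)]
    gy = [[img[i + 1][j] - img[i][j] if i + 1 < h else 0.0 for j in range(w)] for i in range(h)]
    return gx, gy


def anisotropic_tv(img):
    """Sum of absolute horizontal and vertical gradients (l1 norm of the gradient maps)."""
    gx, gy = gradient_maps(img)
    return sum(abs(v) for row in gx for v in row) + sum(abs(v) for row in gy for v in row)


flat = [[1.0] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(anisotropic_tv(flat), anisotropic_tv(edge))  # 0.0 4.0
```

The genuine edge in `edge` is penalized just like noise would be. Minimizing this term therefore shrinks nonzero gradients, which is the underestimation the abstract attributes to TV.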

Submitted 13 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.12617 [pdf, ps, other]

Quantized distributed Nash equilibrium seeking under DoS attacks: A quantized consensus based approach

Authors: Shuai Feng, Maojiao Ye, Lihua Xie, Shengyuan Xu

Abstract: This paper studies distributed Nash equilibrium (NE) seeking under Denial-of-Service (DoS) attacks and quantization. The players can only exchange information with their direct neighbors. The transmitted information is subject to quantization and to packet losses induced by malicious DoS attacks. We propose a quantized distributed NE seeking strategy based on dynamic quantized consensus. To solve the quantizer saturation problem caused by DoS attacks, the quantization mechanism is equipped with zooming-in and holding capabilities; the holding capability is consistent with results on quantized consensus under DoS. A sufficient condition on the number of quantizer levels is provided, under which the quantizers are free from saturation under DoS attacks. The proposed distributed quantized NE seeking strategy is shown to have the so-called maximum resilience to DoS attacks. Namely, if the bound characterizing the maximum resilience is violated, an attacker can deny all transmissions, and distributed NE seeking is then impossible.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.10181 [pdf]

Stochastic Optimization of Coupled Power Distribution-Urban Transportation Network Operations with Autonomous Mobility on Demand Systems

Authors: Han Wang, Xiaoyuan Xu, Yue Chen, Zheng Yan, Mohammad Shahidehpour, Jiaqi Li, Shaolun Xu

Abstract: Autonomous mobility on demand systems (AMoDS) will significantly affect the operation of coupled power distribution-urban transportation networks (PTNs) through the optimal dispatch of electric vehicles (EVs). This paper proposes a method for analyzing the operational states of PTNs with AMoDS under uncertainty. First, a PTN operation framework is designed considering both the controllable EVs dispatched by AMoDS and the uncontrollable driving behaviors of other vehicle users. Then, a bi-level power-traffic flow (PTF) model is proposed to characterize the interaction between power distribution networks (PDNs) and urban transportation networks (UTNs). In the upper level, a social optimum model is established to minimize the operating cost of PDNs and UTNs embedded with controllable EVs. In the lower level, a stochastic user equilibrium (SUE) model is established to minimize the operating cost of uncontrollable EVs and gasoline vehicles (GVs) in UTNs. Finally, a probabilistic PTF analysis method is developed to evaluate PTN operations under environmental and human uncertainties. A regional sensitivity analysis method is proposed to identify the critical uncertainties and quantify the impact of their distribution ranges on PTN operations. The effectiveness of the proposed method is verified on a PTN consisting of a 21-bus PDN and a 20-node UTN.
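Stochastic user equilibrium models are often built on a logit route-choice rule, so that cheaper routes are chosen with higher, but not certain, probability. The abstract does not say which choice model the lower level uses. The code below is therefore a minimal sketch assuming a logit SUE on a hypothetical network of parallel routes, with BPR link travel times and a hypothetical dispersion parameter `theta`. It is solved by the method of successive averages (MSA).

```python
import math


def logit_shares(costs, theta):
    """Logit choice probabilities exp(-theta*c_k) / sum_j exp(-theta*c_j)."""
    c_min = min(costs)  # shift for numerical stability; probabilities are unchanged
    weights = [math.exp(-theta * (c - c_min)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def bpr_time(free_time, capacity, flow):
    """BPR link performance function: t0 * (1 + 0.15 * (x / c) ** 4)."""
    return free_time * (1 + 0.15 * (flow / capacity) ** 4)


def logit_sue_parallel(demand, free_times, capacities, theta, iterations=500):
    """Logit SUE on parallel routes via the method of successive averages."""
    flows = [demand / len(free_times)] * len(free_times)
    for n in range(1, iterations + 1):
        times = [bpr_time(t0, cap, x) for t0, cap, x in zip(free_times, capacities, flows)]
        target = [demand * p for p in logit_shares(times, theta)]
        # MSA step 1/n: average the current flows toward the logit-loaded flows.
        flows = [x + (y - x) / n for x, y in zip(flows, target)]
    return flows


flows = logit_sue_parallel(1500.0, free_times=[10.0, 15.0], capacities=[1000.0, 1000.0], theta=0.5)
print([round(f, 1) for f in flows])
```

Unlike a deterministic user equilibrium, the slower route keeps a nonzero share here. A real UTN replaces the parallel routes with path enumeration or a stochastic network-loading step.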

Submitted 20 August, 2023; originally announced August 2023.

Comments: 10 pages, 13 figures

arXiv:2308.04163 [pdf, other]

Under-Display Camera Image Restoration with Scattering Effect

Authors: Binbin Song, Xiangyu Chen, Shuning Xu, Jiantao Zhou

Abstract: The under-display camera (UDC) provides consumers with a full-screen visual experience without obstruction from notches or punched holes. However, the semi-transparent nature of the display inevitably introduces severe degradation into UDC images. In this work, we address the UDC image restoration problem with specific consideration of the scattering effect caused by the display. We explicitly model the scattering effect by treating the display as a piece of homogeneous scattering medium. With this physical model of the scattering effect, we improve the image formation pipeline for image synthesis to construct a realistic UDC dataset with ground truths. To suppress the scattering effect for the eventual UDC image recovery, a two-branch restoration network is designed. More specifically, the scattering branch leverages the global modeling capability of channel-wise self-attention to estimate the parameters of the scattering effect from degraded images. The image branch exploits the local representation advantage of CNNs to recover clear scenes, implicitly guided by the scattering branch. Extensive experiments on both real-world and synthesized data demonstrate the superiority of the proposed method over state-of-the-art UDC restoration techniques. The source code and dataset are available at https://github.com/NamecantbeNULL/SRUDC.
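A common way to model a homogeneous scattering medium is the atmospheric-scattering form used for haze: I = J·t + A·(1 - t), with transmission t = exp(-β·d). The abstract does not give the paper's exact formulation. The code below assumes this form, with hypothetical values for the scattering coefficient `beta`, the medium thickness `depth` and the veiling light `airlight`. It shows synthesis (as in dataset construction) and exact inversion when the parameters are known, which is the role the scattering branch's parameter estimates would play.

```python
import math


def transmission(beta, depth):
    """Fraction of scene radiance that survives the medium: t = exp(-beta * depth)."""
    return math.exp(-beta * depth)


def scatter(clean, beta, depth, airlight):
    """Degrade pixel values: I = J * t + A * (1 - t)."""
    t = transmission(beta, depth)
    return [j * t + airlight * (1 - t) for j in clean]


def unscatter(observed, beta, depth, airlight):
    """Invert the model for known parameters: J = (I - A * (1 - t)) / t."""
    t = transmission(beta, depth)
    return [(i - airlight * (1 - t)) / t for i in observed]


clean = [0.1, 0.5, 0.9]  # hypothetical pixel intensities in [0, 1]
hazy = scatter(clean, beta=0.8, depth=1.0, airlight=0.7)
restored = unscatter(hazy, beta=0.8, depth=1.0, airlight=0.7)
print([round(v, 3) for v in hazy], [round(v, 3) for v in restored])
```

With a homogeneous medium, t is the same for every pixel. The degradation then reduces contrast uniformly, which is one reason a global estimator suits the scattering parameters.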

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV2023

arXiv:2308.01317 [pdf]

ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, et al. (3 additional authors not shown)

Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined with or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on three task types:
- zero-shot chest X-ray (CXR) classification: mean AUC of 0.850 across 13 findings;
- data-efficient CXR classification: mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) with 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively;
- semantic search: 0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them.

Compared with existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, with overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
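The semantic-search result above is reported as NDCG, where a score of 1.0 corresponds to perfect retrieval. The code below is a minimal sketch of the metric using linear gains and a log2(rank + 1) discount. The graded relevance lists are hypothetical, and the abstract does not state which gain variant was used (some definitions use 2^rel - 1).

```python
import math


def dcg(gains):
    """Discounted cumulative gain: sum of gain / log2(rank + 1), with ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains, k=None):
    """DCG of the returned ranking divided by the DCG of the ideal (sorted) ranking."""
    ranked = gains if k is None else gains[:k]
    ideal = sorted(gains, reverse=True)[:len(ranked)]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0


queries = [[3, 2, 1, 0], [0, 1, 3, 2]]  # hypothetical relevance of each retrieved item, per query
scores = [ndcg(q) for q in queries]
print(scores, sum(scores) / len(scores))
```

Averaging per-query NDCG gives a single figure comparable in form to the 0.76 reported above.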

Submitted 7 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.12027 [pdf, other]

On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement

Authors: Xin Luo, Yunan Zhu, Shunxin Xu, Dong Liu

Abstract: Several recent studies advocate the use of spectral discriminators, which evaluate the Fourier spectra of images, for generative modeling. However, the effectiveness of spectral discriminators is not yet well understood. We tackle this issue by examining spectral discriminators in the context of perceptual image super-resolution (i.e., GAN-based SR), as SR image quality is susceptible to spectral changes. Our analyses reveal that the spectral discriminator indeed performs better than the ordinary (a.k.a. spatial) discriminator in identifying differences in the high-frequency range; however, the spatial discriminator holds an advantage in the low-frequency range. Thus, we suggest that the spectral and spatial discriminators be used simultaneously. Moreover, we improve the spectral discriminators by first calculating the patch-wise Fourier spectrum and then aggregating the spectra with a Transformer. We verify the effectiveness of the proposed method in two ways. First, thanks to the additional spectral discriminator, our SR images have spectra better aligned with those of real images, which leads to a better perception-distortion (PD) tradeoff. Second, our ensembled discriminator predicts perceptual quality more accurately, as evidenced in the no-reference image quality assessment task.
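The patch-wise spectrum step can be pictured as follows. Split the image into non-overlapping patches and take the 2D DFT magnitude of each. The code below is a minimal sketch using a naive DFT on a hypothetical checkerboard image. How the spectra are tokenized and fed to the Transformer is not shown and is not specified in the abstract.

```python
import cmath
import math


def dft2_magnitude(patch):
    """Magnitude spectrum |F(u, v)| of a square patch, by the naive 2D DFT."""
    n = len(patch)
    return [[abs(sum(patch[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                     for x in range(n) for y in range(n)))
             for v in range(n)]
            for u in range(n)]


def patchwise_spectra(img, p):
    """Magnitude spectra of all non-overlapping p x p patches, in row-major patch order."""
    spectra = []
    for i in range(0, len(img) - p + 1, p):
        for j in range(0, len(img[0]) - p + 1, p):
            patch = [row[j:j + p] for row in img[i:i + p]]
            spectra.append(dft2_magnitude(patch))
    return spectra


checker = [[(-1) ** (x + y) for y in range(8)] for x in range(8)]  # highest-frequency pattern
spectra = patchwise_spectra(checker, 4)
print(len(spectra), round(spectra[0][2][2], 6), round(spectra[0][0][0], 6))
```

All of the checkerboard's energy lands at the Nyquist bin (2, 2) of each 4 x 4 patch, with none at DC. This illustrates why high-frequency discrepancies are easier to spot in the spectral domain.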

Submitted 16 August, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: Accepted to ICCV 2023. Code and models are publicly available at https://github.com/Luciennnnnnn/DualFormer

arXiv:2307.04827 [pdf, other]

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

Authors: Siting Xu, Yunlong Tang, Feng Zheng

Abstract: Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of Launchpad light effects, and to give beginners a more accessible way to create music visualizations with this instrument, we propose the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Building on the strong generation ability of language models, LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad-playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them into prompt-completion pairs, each pairing music with the corresponding Launchpad-playing video frames, to train the language model. Experimental results show that the proposed method can create better music visualizations than random generation methods and holds potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.

Submitted 23 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted by International Computer Music Conference (ICMC) 2023

arXiv:2307.00307 [pdf, other]

Spatio-Temporal Classification of Lung Ventilation Patterns using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

Authors: Shuzhe Chen, Li Li, Zhichao Lin, Ke Zhang, Ying Gong, Lu Wang, Xu Wu, Maokun Li, Yuanlin Song, Fan Yang, Shenheng Xu

Abstract: The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for lung function evaluation, serving as a comprehensive tool for lung diagnosis. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes the conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from 3D EIT image series. The study uses a Variational Autoencoder network with a MultiRes block to compress the spatial distribution in each 3D image into a one-dimensional vector. These vectors are then concatenated to create a feature map that exhibits temporal features. A simple convolutional neural network is used for classification. Data collected from 137 subjects were used for training. The model was first validated by ten-fold and leave-one-out cross-validation. For the normal ventilation mode, the accuracy and sensitivity are 0.95 and 1.00, respectively, and the F1-score is 0.94. Furthermore, we checked the reliability and feasibility of the proposed pipeline by testing it on nine newly recruited subjects. Our results show that the pipeline correctly predicts the ventilation mode of 8 out of 9 subjects. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.
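The evaluation above combines k-fold cross-validation with one-vs-rest accuracy, sensitivity and F1 for a given ventilation mode. The code below is a minimal sketch of both pieces. The label strings and predictions are hypothetical, and this is not the authors' pipeline. Leave-one-out corresponds to setting k equal to the number of subjects.

```python
def binary_metrics(y_true, y_pred, positive):
    """One-vs-rest accuracy, sensitivity (recall) and F1 for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return accuracy, sensitivity, f1


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds covering range(n) exactly once."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


y_true = ["normal", "normal", "abnormal", "abnormal"]  # hypothetical labels
y_pred = ["normal", "normal", "normal", "abnormal"]
print(binary_metrics(y_true, y_pred, "normal"))
folds = list(k_fold_indices(137, 10))  # e.g. 137 subjects, ten folds
```

A sensitivity of 1.00 with an F1 below 1 is consistent with no missed positives but some false positives, as in this toy example.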

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.17797 [pdf, other]

HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising

Authors: Li Pang, Weizhen Gu, Xiangyong Cao, Xiangyu Rui, Jiangjun Peng, Shuang Xu, Gang Yang, Deyu Meng

Abstract: Hyperspectral image (HSI) denoising is essentially ill-posed, since a noisy HSI can be degraded from multiple clean HSIs. However, current deep learning-based approaches ignore this fact and restore the clean image with a deterministic mapping (i.e., the network receives a noisy HSI and outputs a clean HSI). To alleviate this issue, this paper proposes a flow-based HSI denoising network (HIDFlowNet) that directly learns the conditional distribution of the clean HSI given the noisy HSI, so that diverse clean HSIs can be sampled from this conditional distribution. Overall, HIDFlowNet is derived from the flow methodology and contains an invertible decoder and a conditional encoder, which fully decouple the learning of low-frequency and high-frequency information of HSI. Specifically, the invertible decoder is built by stacking a succession of invertible conditional blocks (ICBs) to capture local high-frequency details, since the invertible network is information-lossless. The conditional encoder uses down-sampling operations to obtain low-resolution images and uses transformers to capture long-range correlations, so that global low-frequency information can be effectively extracted. Extensive experimental results on simulated and real HSI datasets verify the superiority of HIDFlowNet over other state-of-the-art methods, both quantitatively and visually.
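The claim that invertible networks are information-lossless rests on layers that can be undone exactly. The code below is a minimal sketch of a conditional additive coupling layer (the NICE-style building block). The abstract does not detail the internals of the ICBs, so `shift_net` is a hypothetical stand-in for a conditioning sub-network, and `cond` stands in for features from a conditional encoder.

```python
import math


def shift_net(x_a, cond):
    """Hypothetical conditioning sub-network; it is only ever evaluated, never inverted."""
    return [math.tanh(1.5 * a + c) for a, c in zip(x_a, cond)]


def _split(v, cond):
    """Split v into halves; this sketch expects an even length and len(cond) == len(v) // 2."""
    if len(v) % 2 or len(cond) != len(v) // 2:
        raise ValueError("expects an even-length vector and len(cond) == len(v) // 2")
    half = len(v) // 2
    return v[:half], v[half:]


def coupling_forward(x, cond):
    """Keep the first half unchanged; shift the second half by f(first half, cond)."""
    x_a, x_b = _split(x, cond)
    return x_a + [b + s for b, s in zip(x_b, shift_net(x_a, cond))]


def coupling_inverse(y, cond):
    """Exact inverse: the untouched half lets us recompute the same shift and subtract it."""
    y_a, y_b = _split(y, cond)
    return y_a + [b - s for b, s in zip(y_b, shift_net(y_a, cond))]


x = [0.2, -1.0, 0.7, 3.0]
cond = [0.1, -0.4]
y = coupling_forward(x, cond)
x_rec = coupling_inverse(y, cond)
print(y, x_rec)
```

Because the inverse never needs to invert `shift_net`, that sub-network can be arbitrarily expressive. Stacking such layers with alternating halves gives a deep yet exactly invertible decoder.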

Submitted 20 June, 2023; originally announced June 2023.

Comments: 10 pages, 8 figures

arXiv:2306.10146 [pdf, other]

Multi-task 3D building understanding with multi-modal pretraining

Authors: Shicheng Xu

Abstract: This paper explores various learning strategies for 3D building type classification and part segmentation on the BuildingNet dataset. ULIP with PointNeXt, and PointNeXt segmentation, are extended to the classification and segmentation tasks on the BuildingNet dataset. The best multi-task PointNeXt-s model with multi-modal pretraining achieves 59.36 overall accuracy for 3D building type classification and 31.68 PartIoU for 3D building part segmentation on the validation split. The final PointNeXt XL model achieves 31.33 PartIoU and 22.78 ShapeIoU on the test split for BuildingNet-Points segmentation. This is a significant improvement over the PointNet++ model reported in the BuildingNet paper, and the model won 1st place in the BuildingNet challenge at the CVPR23 StruCo3D workshop.
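Both PartIoU and ShapeIoU build on per-class intersection-over-union of point labels. The abstract does not give BuildingNet's exact averaging rules across parts and shapes. The code below is therefore a minimal sketch of the common core only: mean IoU over the classes present in the ground truth or the prediction, on hypothetical point labels.

```python
def mean_iou(gt, pred, num_classes):
    """Per-class IoU = |gt & pred| / |gt | pred|, averaged over classes that occur."""
    ious = []
    for c in range(num_classes):
        inter = sum(g == c and p == c for g, p in zip(gt, pred))
        union = sum(g == c or p == c for g, p in zip(gt, pred))
        if union:  # skip classes absent from both ground truth and prediction
            ious.append(inter / union)
    return sum(ious) / len(ious)


gt = [0, 0, 1, 1]    # hypothetical per-point part labels
pred = [0, 1, 1, 1]
print(mean_iou(gt, pred, num_classes=2))
```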

Submitted 16 June, 2023; originally announced June 2023.

Comments: 8 pages, 9 figures, 9 tables

arXiv:2306.04242 [pdf, other]

4D Millimeter-Wave Radar in Autonomous Driving: A Survey

Authors: Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

Abstract: The 4D millimeter-wave (mmWave) radar, capable of measuring the range, azimuth, elevation, and velocity of targets, has attracted considerable interest within the autonomous driving community. This interest is owed to its robustness in extreme environments and its velocity and elevation measurement capabilities. However, despite rapid advances in research on its sensing theory and applications, there is a conspicuous absence of comprehensive surveys on 4D mmWave radar. To bridge this gap and stimulate future research, this paper presents an exhaustive survey on the use of 4D mmWave radar in autonomous driving. First, the paper reviews the theoretical background and progress of 4D mmWave radars, covering the signal processing workflow, resolution improvement approaches, and the extrinsic calibration process. Learning-based radar data quality improvement methods are then presented. Next, the paper introduces relevant datasets and application algorithms for autonomous driving perception, localization, and mapping tasks. Finally, the paper concludes by forecasting future trends for 4D mmWave radar in autonomous driving. To the best of our knowledge, this is the first survey specifically dedicated to 4D mmWave radar in autonomous driving.
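As a reference point for the signal processing workflow mentioned above, the code below sketches the textbook relations for range and velocity. It assumes a linear FMCW chirp radar (the common architecture for automotive mmWave radar, though the abstract does not state it): range from beat frequency, range resolution from sweep bandwidth, and radial velocity from Doppler shift. Azimuth and elevation estimation from the MIMO virtual array is not covered. The chirp parameters in the usage lines are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def range_from_beat(f_beat_hz, bandwidth_hz, chirp_duration_s):
    """FMCW range R = c * f_b / (2 * S), where S = B / T_c is the chirp slope."""
    slope = bandwidth_hz / chirp_duration_s
    return C * f_beat_hz / (2 * slope)


def range_resolution(bandwidth_hz):
    """Range resolution c / (2 * B): a wider sweep separates closer targets."""
    return C / (2 * bandwidth_hz)


def radial_velocity(f_doppler_hz, carrier_hz):
    """Radial velocity v = wavelength * f_d / 2 from the Doppler shift."""
    return (C / carrier_hz) * f_doppler_hz / 2


# Hypothetical 77 GHz radar with a 4 GHz sweep over 40 microseconds.
print(range_resolution(4e9), range_from_beat(2e6, 4e9, 40e-6), radial_velocity(1e3, 77e9))
```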

Submitted 26 April, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.00279 [pdf, other]

doi 10.1109/TCNS.2023.3281555

Dynamic quantized consensus under DoS attacks: Towards a tight zooming-out factor

Authors: Shuai Feng, Maopeng Ran, Hideaki Ishii, Shengyuan Xu

Abstract: This paper deals with dynamic quantized consensus of general dynamical agents under packet losses induced by Denial-of-Service (DoS) attacks. The communication channel has limited bandwidth, and hence the signals transmitted over the network are subject to quantization. To handle the agents' outputs, an observer is implemented at each node. The state of the observer is quantized by a finite-level quantizer and then transmitted over the network. To solve the problem of quantizer overflow under malicious packet losses, a zooming-in and zooming-out dynamic quantization mechanism is designed. With the new quantized controller proposed in the paper, the zooming-out factor is lower bounded by the spectral radius of the agent's dynamic matrix. A sufficient condition on the quantization range is provided under which the finite-level quantizer is free of overflow. A sufficient condition on the tolerable DoS attacks for achieving consensus is also provided. Finally, we study scalar dynamical agents as a special case and further tighten the zooming-out factor to a value smaller than the agent's dynamic parameter. Under such a zooming-out factor, it is possible to recover the level of tolerable DoS attacks to that of unquantized consensus, and the quantizer is free of overflow.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.14694 [pdf, other]

Analysis of Contagion Dynamics with Active Cyber Defenders

Authors: Keith Paarporn, Shouhuai Xu

Abstract: In this paper, we analyze the infection spreading dynamics of malware in a population of cyber nodes (i.e., computers or devices). Unlike most prior studies, where nodes are reactive to infections, in our setting some nodes are active defenders, meaning that they are able to clean up malware infections of their neighboring nodes, much as spreading malware exploits network connectivity to propagate. We formulate these dynamics as an Active Susceptible-Infected-Susceptible (A-SIS) compartmental model of contagion. We completely characterize the system's asymptotic behavior by establishing conditions for the global asymptotic stability of the infection-free equilibrium and of an endemic equilibrium state. We show that the presence of active defenders counteracts infectious spreading, effectively increasing the epidemic threshold on the parameters for which an endemic state prevails. Leveraging this characterization, we investigate a general class of problems for finding optimal investments in active cyber defense capabilities given limited resources. We show that this class of problems has unique solutions under mild assumptions. We then analyze an Active Susceptible-Infected-Recovered (A-SIR) compartmental model, for which the peak infection level of any trajectory is explicitly derived.
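The epidemic-threshold idea can be seen in the classical homogeneous SIS model dx/dt = beta x (1 - x) - delta x. Here the infection-free state is stable when beta <= delta, and otherwise the infected fraction settles at 1 - delta / beta. The sketch below is that textbook model, not the paper's A-SIS model, in which defenders clean up their neighbors' infections. The `extra_cure` term is a hypothetical stand-in that simply adds to the recovery rate, to show qualitatively how extra cleanup capacity raises the threshold. All names and rates are assumptions.

```python
def sis_endemic_level(beta: float, delta: float, extra_cure: float = 0.0) -> float:
    """Stable equilibrium of dx/dt = beta*x*(1-x) - (delta + extra_cure)*x.

    `extra_cure` is a hypothetical additional recovery rate, not the A-SIS model.
    """
    delta_eff = delta + extra_cure
    if beta <= delta_eff:
        return 0.0  # below threshold: the infection-free equilibrium is stable
    return 1.0 - delta_eff / beta  # above threshold: endemic equilibrium


def simulate_sis(beta, delta, extra_cure=0.0, x0=0.1, dt=0.01, steps=20_000):
    """Forward-Euler integration of the same ODE from infected fraction x0."""
    x = x0
    for _ in range(steps):
        x += dt * (beta * x * (1.0 - x) - (delta + extra_cure) * x)
    return x


print(sis_endemic_level(0.5, 0.2))                  # endemic, about 0.6
print(sis_endemic_level(0.5, 0.2, extra_cure=0.4))  # 0.0: infection dies out
print(simulate_sis(0.5, 0.2))                       # numerically approaches 0.6
```

In this simplified model, raising the effective recovery rate from 0.2 to 0.6 pushes beta = 0.5 below the threshold, so the endemic state no longer prevails.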

Submitted 23 May, 2023; originally announced May 2023.

Comments: 3 figures

arXiv:2305.05085 [pdf, other]

doi 10.1117/1.AP.6.2.026004

Tensorial tomographic Fourier Ptychography with applications to muscle tissue imaging

Authors: Shiqi Xu, Xiang Dai, Paul Ritter, Kyung Chul Lee, Xi Yang, Lucas Kreiss, Kevin C. Zhou, Kanghyun Kim, Amey Chaware, Jadee Neff, Carolyn Glass, Seung Ah Lee, Oliver Friedrich, Roarke Horstmeyer

Abstract: We report Tensorial tomographic Fourier Ptychography (ToFu), a new non-scanning label-free tomographic microscopy method for simultaneous imaging of quantitative phase and anisotropic specimen information in 3D. Built upon Fourier Ptychography, a quantitative phase imaging technique, ToFu additionally highlights the vectorial nature of light. The imaging setup consists of a standard microscope equipped with an LED matrix, a polarization generator, and a polarization-sensitive camera. Permittivity tensors of anisotropic samples are computationally recovered from polarized intensity measurements across three dimensions. We demonstrate ToFu's efficiency through volumetric reconstructions of refractive index, birefringence, and orientation for various validation samples, as well as tissue samples from muscle fibers and diseased heart tissue. Our reconstructions of muscle fibers resolve their 3D fine-filament structure and yield consistent morphological measurements compared to gold-standard second harmonic generation scanning confocal microscope images found in the literature. Additionally, we demonstrate reconstructions of a heart tissue sample that carries important polarization information for detecting cardiac amyloidosis.
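The LED matrix is what lets Fourier Ptychography exceed the objective's numerical aperture. Each tilted LED shifts the sample spectrum by sin(theta) / lambda, and stitching the shifted pupils yields a synthetic NA of roughly NA_obj + NA_illum. The sketch below covers only this standard thin-sample relation. It does not model ToFu's tomographic, polarization-resolved permittivity-tensor reconstruction. The LED pitch, height, wavelength, and objective NA are hypothetical values.

```python
import math


def illumination_na(x_mm: float, y_mm: float, height_mm: float) -> float:
    """Illumination NA (sine of the incidence angle) of an LED at (x, y) below the sample."""
    return math.hypot(x_mm, y_mm) / math.sqrt(x_mm**2 + y_mm**2 + height_mm**2)


def illumination_frequency(x_mm, y_mm, height_mm, wavelength_um):
    """Transverse spatial frequency (cycles per micrometre) of the tilted plane wave."""
    r = math.sqrt(x_mm**2 + y_mm**2 + height_mm**2)
    return (x_mm / r) / wavelength_um, (y_mm / r) / wavelength_um


def synthetic_na(objective_na: float, led_positions, height_mm: float) -> float:
    """Synthetic NA of the stitched spectrum: objective NA + largest illumination NA."""
    return objective_na + max(illumination_na(x, y, height_mm) for x, y in led_positions)


# Hypothetical 9 x 9 LED grid with 4 mm pitch, 60 mm below the sample, 0.1 NA objective.
leds = [(4.0 * i, 4.0 * j) for i in range(-4, 5) for j in range(-4, 5)]
print(f"synthetic NA: {synthetic_na(0.1, leds, 60.0):.3f}")
print(illumination_frequency(4.0, 0.0, 60.0, 0.53))
```

Under these assumptions, the corner LEDs set the synthetic NA to about 0.45, several times the 0.1 NA of the objective alone.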

Submitted 13 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Journal ref: Tensorial tomographic Fourier Ptychography with applications to muscle tissue imaging, Adv. Photon. 6(2), 026004 (2024)
