-
BreakNet: Discontinuity-Resilient Multi-Scale Transformer Segmentation of Retinal Layers
Authors:
Razieh Ganjee,
Bingjie Wang,
Lingyun Wang,
Chengcheng Zhao,
José-Alain Sahel,
Shaohua Pi
Abstract:
Visible light optical coherence tomography (vis-OCT) is gaining traction for retinal imaging due to its high resolution and functional capabilities. However, the strong absorption of visible light by hemoglobin produces pronounced shadow artifacts from retinal blood vessels, which complicates accurate layer segmentation. In this study, we present BreakNet, a multi-scale Transformer-based segmentation model designed to address boundary discontinuities caused by these shadow artifacts. BreakNet uses hierarchical Transformer and convolutional blocks to extract multi-scale global and local feature maps, capturing essential contextual, textural, and edge characteristics. The model incorporates decoder blocks that expand pathways to enhance the extraction of fine details and semantic information, ensuring precise segmentation. Evaluated on rodent retinal images acquired with a prototype vis-OCT system, BreakNet outperformed state-of-the-art segmentation models such as TCCT-BP and U-Net, even with limited-quality ground truth data. Our findings indicate that BreakNet has the potential to significantly improve retinal quantification and analysis.
Submitted 26 August, 2024;
originally announced August 2024.
-
Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge
Authors:
Jun Ma,
Yao Zhang,
Song Gu,
Cheng Ge,
Ershuai Wang,
Qin Zhou,
Ziyan Huang,
Pengju Lyu,
Jian He,
Bo Wang
Abstract:
Organ and cancer segmentation in abdominal computed tomography (CT) scans is a prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation, providing a large-scale and diverse dataset that includes 4650 CT scans with various cancer types from over 40 medical centers. The winning team established a new state of the art with a deep learning-based cascaded framework, achieving average Dice Similarity Coefficient scores of 92.3% for organs and 64.9% for lesions on the hidden multi-national testing set. The dataset and the code of the top teams are publicly available at https://codalab.lisn.upsaclay.fr/competitions/12239, offering a benchmark platform to drive further innovation.
Submitted 22 August, 2024;
originally announced August 2024.
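The FLARE 2023 results above are reported as the Dice Similarity Coefficient (DSC), which for a predicted mask A and a reference mask B is 2|A∩B|/(|A|+|B|). The following is a minimal sketch of that formula for binary masks stored as Python sets of voxel coordinates; the set representation and the empty-mask convention are illustrative assumptions, not the challenge's official evaluation code.

```python
def dice(pred: set, truth: set) -> float:
    """Dice Similarity Coefficient between two binary masks given as sets of voxel coordinates."""
    if not pred and not truth:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    overlap = len(pred & truth)  # |A ∩ B|
    return 2.0 * overlap / (len(pred) + len(truth))


# Toy 2-D masks: (row, col) coordinates labelled as the organ.
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred = {(0, 1), (1, 0), (1, 1), (2, 2)}
print(f"DSC = {dice(pred, truth):.3f}")  # DSC = 0.750, i.e. 2*3 / (4+4)
```

Averaging such per-structure scores over test cases gives summary figures of the kind quoted in the abstract, although the challenge's exact aggregation rules are not described here.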
-
Safety-Critical Stabilization of Force-Controlled Nonholonomic Robots
Authors:
Tianyu Han,
Bo Wang
Abstract:
We present a safety-critical controller for the stabilization of force-controlled nonholonomic autonomous vehicles. The proposed control law is based on the construction of control Lyapunov functions (CLFs) and control barrier functions (CBFs) for cascaded systems. To address nonholonomicity, we design a nominal controller that guarantees global asymptotic stability and local exponential stability of the closed-loop system in polar coordinates, and we construct a strict Lyapunov function valid on any compact set. Furthermore, we present a procedure for constructing CBFs for cascaded systems that uses the CBF of the kinematic model through integrator backstepping. Quadratic programming is employed to combine the CLFs and CBFs so that the closed loop integrates both stability and safety. The proposed control law is time-invariant, continuous along trajectories, and easy to implement. Our main results guarantee both safety and local asymptotic stability for the closed-loop system.
Submitted 20 August, 2024;
originally announced August 2024.
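The abstract above combines CLFs and CBFs through quadratic programming. As a simplified illustration only (a single integrator x_dot = u with one circular keep-out zone, not the paper's cascaded force-controlled vehicle model), the safety-filter QP "minimize ||u - u_nom||^2 subject to grad_h(x) . u >= -alpha h(x)" has a closed-form solution: project the nominal input onto the constraint half-space. The function name `cbf_filter`, the gain `alpha`, and the proportional go-to-goal nominal controller are hypothetical stand-ins; the nominal controller plays the role that the CLF-based stabilizer plays in the paper.

```python
def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Solve min ||u - u_nom||^2 s.t. grad_h(x) . u >= -alpha * h(x) in closed form.

    Hypothetical toy setting: single integrator x_dot = u, safe set h(x) >= 0 with
    h(x) = ||x - center||^2 - radius^2. Assumes x is not exactly at the centre.
    """
    dx = (x[0] - center[0], x[1] - center[1])
    h = dx[0] ** 2 + dx[1] ** 2 - radius ** 2
    grad_h = (2.0 * dx[0], 2.0 * dx[1])
    # Constraint residual: >= 0 means the nominal input already satisfies the CBF condition.
    slack = grad_h[0] * u_nom[0] + grad_h[1] * u_nom[1] + alpha * h
    if slack >= 0.0:
        return [u_nom[0], u_nom[1]]
    # Otherwise move u_nom the minimum distance onto the half-space boundary.
    scale = -slack / (grad_h[0] ** 2 + grad_h[1] ** 2)
    return [u_nom[0] + scale * grad_h[0], u_nom[1] + scale * grad_h[1]]


# Usage: steer toward a goal behind a circular obstacle with forward-Euler steps.
x, goal, dt = [-2.0, 0.1], (2.0, 0.0), 0.01
min_h = float("inf")
for _ in range(2000):
    u_nom = [goal[0] - x[0], goal[1] - x[1]]  # proportional nominal controller
    u = cbf_filter(x, u_nom, center=(0.0, 0.0), radius=0.5)
    x = [x[0] + dt * u[0], x[1] + dt * u[1]]
    min_h = min(min_h, x[0] ** 2 + x[1] ** 2 - 0.25)
print(f"smallest barrier value along the run: {min_h:.4f}")
```

Because h is quadratic here, a forward-Euler step changes h by dt * grad_h . u plus a non-negative dt^2 ||u||^2 term, so this toy discrete trajectory keeps h >= 0; that shortcut does not carry over to general dynamics.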
-
Content-decoupled Contrastive Learning-based Implicit Degradation Modeling for Blind Image Super-Resolution
Authors:
Jiang Yuan,
Ji Ma,
Bo Wang,
Weiming Hu
Abstract:
Implicit degradation modeling-based blind super-resolution (SR) has attracted increasing attention in the community due to its excellent generalization to complex degradation scenarios and its wide range of applications. The key to this task is extracting more discriminative degradation representations and fully adapting them to specific image features. In this paper, we propose a new Content-decoupled Contrastive Learning-based blind image super-resolution (CdCL) framework that follows the typical blind SR pipeline. This framework introduces, for the first time, a negative-free contrastive learning technique to model the implicit degradation representation, in which a new cyclic shift sampling strategy is designed to ensure decoupling between content features and degradation features from the data perspective, thereby improving the purity and discriminability of the learned implicit degradation space. In addition, to improve the efficiency and effectiveness of implicit degradation-based blind super-resolution, we design a lower-complexity detail-aware implicit degradation adaptation module, which adapts degradation information to the specific LR image from both channel and spatial perspectives. Extensive experiments on synthetic and real data show that the proposed CdCL comprehensively improves the quantitative and qualitative results of the contrastive learning-based implicit blind SR paradigm and achieves state-of-the-art PSNR in this field. Even with half the number of parameters, our method still achieves very competitive results.
Submitted 10 August, 2024;
originally announced August 2024.
-
CSI-Free Position Optimization for Movable Antenna Communication Systems: A Black-Box Optimization Approach
Authors:
Xianlong Zeng,
Jun Fang,
Bin Wang,
Boyu Ning,
Hongbin Li
Abstract:
Movable antenna (MA) is a new technology that leverages local movement of antennas to improve channel quality and enhance communication performance. Nevertheless, fully realizing the potential of MA systems requires complete channel state information (CSI) between the transmitter MA and the receiver MA, which involves estimating a large number of channel parameters and incurs excessive training overhead. To address this challenge, in this paper, we propose a CSI-free MA position optimization method. The basic idea is to treat position optimization as a black-box optimization problem and to estimate the gradient of the unknown objective function using zeroth-order (ZO) gradient approximation techniques. Simulation results show that the proposed ZO-based method, by adaptively adjusting the position of the MA, can achieve a favorable signal-to-noise ratio (SNR) with fewer position measurements than the CSI-based approach. This advantage makes the proposed algorithm better suited to fast-changing propagation channels.
Submitted 9 August, 2024;
originally announced August 2024.
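The movable-antenna abstract above treats positioning as a black-box problem and estimates gradients with zeroth-order (ZO) techniques. The abstract does not specify the estimator, so the sketch below assumes a common two-point form: perturb the position along a random Gaussian direction u and use [f(x + mu*u) - f(x - mu*u)] / (2*mu) * u, built from two measurements only. The quadratic `measured_snr` function is a hypothetical stand-in for an over-the-air SNR measurement, and the step size and `mu` are illustrative, untuned values.

```python
import random


def zo_gradient(f, x, mu=1e-3, rng=random):
    """Two-point zeroth-order gradient estimate along one random Gaussian direction.

    Only function evaluations of f are used (e.g., SNR measurements), no channel model.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
    directional = (f_plus - f_minus) / (2.0 * mu)  # estimate of grad f(x) . u
    return [directional * ui for ui in u]


def zo_ascent(f, x0, step=0.05, iters=500, seed=0):
    """Gradient ascent driven only by zeroth-order estimates."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(f, x, rng=rng)
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x


def measured_snr(pos):
    """Hypothetical stand-in for a measured SNR at 2-D antenna position pos (peak at (1, -2))."""
    return 10.0 - (pos[0] - 1.0) ** 2 - (pos[1] + 2.0) ** 2


best = zo_ascent(measured_snr, [0.0, 0.0])
print([round(v, 3) for v in best])
```

Because E[u u^T] = I for a standard Gaussian u, the product (grad f . u) u equals the true gradient in expectation, so the noisy steps still drift uphill; each iteration costs two measurements rather than a full channel estimate.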
-
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Authors:
Pengcheng Chen,
Jin Ye,
Guoan Wang,
Yanjun Li,
Zhongying Deng,
Wei Li,
Tianbin Li,
Haodong Duan,
Ziyan Huang,
Yanzhou Su,
Benyou Wang,
Shaoting Zhang,
Bin Fu,
Jianfei Cai,
Bohan Zhuang,
Eric J Seibel,
Junjun He,
Yu Qiao
Abstract:
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have high potential to offer substantial assistance for diagnosis and treatment. Before such use, however, it is crucial to develop benchmarks that evaluate LVLMs' effectiveness in various medical applications. Current benchmarks are often built upon specific academic literature, mainly focus on a single domain, and lack varying perceptual granularities. Thus, they face specific challenges, including limited clinical relevance, incomplete evaluations, and insufficient guidance for interactive LVLMs. To address these limitations, we developed GMAI-MMBench, to date the most comprehensive general medical AI benchmark with a well-categorized data structure and multi-perceptual granularity. It is constructed from 285 datasets across 39 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format. Additionally, we implemented a lexical tree structure that allows users to customize evaluation tasks, accommodating various assessment needs and substantially supporting medical AI research and applications. We evaluated 50 LVLMs, and the results show that even the advanced GPT-4o only achieves an accuracy of 52%, indicating significant room for improvement. Moreover, we identified five key insufficiencies in current cutting-edge LVLMs that need to be addressed to advance the development of better medical applications. We believe that GMAI-MMBench will stimulate the community to build the next generation of LVLMs toward GMAI.
Submitted 9 August, 2024; v1 submitted 6 August, 2024;
originally announced August 2024.
-
Segment Anything in Medical Images and Videos: Benchmark and Deployment
Authors:
Jun Ma,
Sumin Kim,
Feifei Li,
Mohammed Baharoon,
Reza Asakereh,
Hongwei Lyu,
Bo Wang
Abstract:
Recent advances in segmentation foundation models have enabled accurate and efficient segmentation across a wide range of natural images and videos, but their utility for medical data remains unclear. In this work, we first present a comprehensive benchmark of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos, and we point out its strengths and weaknesses by comparing it to SAM1 and MedSAM. Then, we develop a transfer learning pipeline and demonstrate that SAM2 can be quickly adapted to the medical domain by fine-tuning. Furthermore, we implement SAM2 as a 3D Slicer plugin and a Gradio API for efficient 3D image and video segmentation. The code has been made publicly available at https://github.com/bowang-lab/MedSAM.
Submitted 6 August, 2024;
originally announced August 2024.
-
Illumination Design for Joint Imaging and Wireless Power Transfer Systems
Authors:
Qianyu Yang,
Haiyang Zhang,
Chunguo Li,
Ruiqi Liu,
Baoyun Wang
Abstract:
This paper presents a novel concept termed Integrated Imaging and Wireless Power Transfer (IWPT), in which imaging and wireless power transfer functionalities are integrated on a unified hardware platform. IWPT leverages a transmitting array to efficiently illuminate a specific Region of Interest (ROI), enabling the extraction of the ROI's scattering coefficients while concurrently providing wireless power to nearby users. IWPT offers compelling advantages, including notable reductions in power consumption and spectrum utilization, which are pivotal for the optimization of future 6G wireless networks. As an initial investigation, we explore two antenna architectures: a fully digital array and a hybrid digital/analog array. Our goal is to characterize the fundamental trade-off between imaging and wireless power transfer by optimizing the illumination signal. With imaging operating in the near field, we formulate the illumination signal design as an optimization problem that minimizes the condition number of the equivalent channel. To solve this problem, we propose a semidefinite relaxation-based approach for the fully digital array and an alternating optimization algorithm for the hybrid array. Finally, numerical results verify the effectiveness of the proposed solutions and demonstrate the trade-off between imaging and wireless power transfer.
Submitted 1 August, 2024;
originally announced August 2024.
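The IWPT illumination design above minimizes the condition number of the equivalent channel, i.e. the ratio sigma_max / sigma_min of its largest to smallest singular value; a value near 1 means that inverting the channel amplifies noise roughly equally in all directions. The sketch below computes this quantity for a real 2x2 matrix from the eigenvalues of A^T A. It is a toy under stated assumptions: the paper's channels are complex-valued and much larger, and for those `numpy.linalg.cond` computes the same 2-norm ratio.

```python
import math


def cond2x2(a):
    """2-norm condition number sigma_max / sigma_min of a real 2x2 matrix [[p, q], [r, s]]."""
    (p, q), (r, s) = a
    # Entries of the symmetric Gram matrix G = A^T A (columns of A are (p, r) and (q, s)).
    g11 = p * p + r * r
    g12 = p * q + r * s
    g22 = q * q + s * s
    # Eigenvalues of G are the squared singular values of A.
    mean = 0.5 * (g11 + g22)
    radius = math.sqrt((0.5 * (g11 - g22)) ** 2 + g12 ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        return math.inf  # singular (or numerically singular) matrix
    return math.sqrt(lam_max / lam_min)


print(cond2x2([[1.0, 0.0], [0.0, 1.0]]))    # 1.0: perfectly conditioned
print(cond2x2([[2.0, 0.0], [0.0, 1.0]]))    # 2.0
print(cond2x2([[1.0, 0.99], [0.99, 1.0]]))  # about 199: nearly collinear columns
```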
-
Beam Focusing for Near-Field Multi-User Localization
Authors:
Qianyu Yang,
Anna Guerra,
Francesco Guidi,
Nir Shlezinger,
Haiyang Zhang,
Davide Dardari,
Baoyun Wang,
Yonina C. Eldar
Abstract:
Extremely large-scale antenna arrays are poised to play a pivotal role in sixth-generation (6G) networks. Utilizing such arrays often results in a near-field spherical-wave transmission environment, enabling the generation of focused beams, which introduces new degrees of freedom for wireless localization. In this paper, we consider a beam-focusing design for localizing multiple sources in the radiating near field. Our formulation accommodates various expected implementations of large antenna arrays, including hybrid analog/digital architectures and dynamic metasurface antennas (DMAs). We consider a direct localization method that exploits the curvature of arrival of the impinging spherical wavefront to obtain user positions. In this regard, we adopt a two-stage approach that configures the array to optimize near-field positioning. In the first step, we focus only on adjusting the array coefficients to minimize the estimation error. We obtain a closed-form approximate solution based on projection and an improved one based on the Riemannian gradient algorithm. We then extend this approach to simultaneously localize and focus the beams via a sub-optimal iterative approach that does not rely on such knowledge. The simulation results show that near-field localization based on a hybrid array or a DMA can achieve accuracy close to that of fully digital arrays at a lower cost, and that DMAs can attain better performance than hybrid solutions with the same aperture.
Submitted 24 July, 2024;
originally announced July 2024.
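The near-field localization work above relies on the spherical wavefront across a large aperture. The sketch below, assuming a uniform linear array along the x-axis and a source at range r and angle theta from broadside, compares the exact per-element path-length differences with the plane-wave (far-field) approximation -x_n sin(theta). The residual phase error is the wavefront curvature that carries range information; it becomes small beyond roughly the Fraunhofer distance 2D^2/lambda. All array parameters are illustrative.

```python
import math


def path_differences(n_elem, spacing, r, theta):
    """Per-element path-length differences (relative to the array centre) for a source at
    range r and angle theta from broadside of a ULA on the x-axis: exact vs plane-wave model."""
    xs = [(n - (n_elem - 1) / 2.0) * spacing for n in range(n_elem)]
    sx, sy = r * math.sin(theta), r * math.cos(theta)  # source position
    exact = [math.hypot(sx - x, sy) - r for x in xs]    # spherical wavefront
    planar = [-x * math.sin(theta) for x in xs]         # far-field approximation
    return exact, planar


def max_phase_error(n_elem, spacing, wavelength, r, theta):
    """Largest phase discrepancy (radians) between the spherical and plane-wave models."""
    exact, planar = path_differences(n_elem, spacing, r, theta)
    k = 2.0 * math.pi / wavelength
    return max(k * abs(e - p) for e, p in zip(exact, planar))


wavelength = 0.01                      # 30 GHz carrier (illustrative)
n_elem, spacing = 256, wavelength / 2  # half-wavelength ULA
aperture = (n_elem - 1) * spacing
print(f"Fraunhofer distance 2D^2/lambda = {2 * aperture ** 2 / wavelength:.1f} m")
for r in (5.0, 50.0, 1000.0):
    err = max_phase_error(n_elem, spacing, wavelength, r, 0.3)
    print(f"r = {r:6.1f} m: max phase error {err:.3f} rad")
```

Close to the array the phase error spans many radians, so a focused beam can resolve range as well as angle; far beyond the Fraunhofer distance the model collapses to angle-only steering.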
-
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
Authors:
Tianrui Pan,
Jie Liu,
Bohan Wang,
Jie Tang,
Gangshan Wu
Abstract:
While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, their performance drops severely in multi-speaker separation scenarios. Typically, AVSS methods use guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts across various segments of the separated speech. In this study, we propose a simultaneous multi-speaker separation framework that separates multiple speakers concurrently within a single process. We introduce speaker-wise interactions to establish distinctions and correlations among speakers. Experimental results on the VoxCeleb2 and LRS3 datasets demonstrate that our method achieves state-of-the-art performance in separating mixtures of 2, 3, 4, and 5 speakers. Additionally, our model can use speakers with complete audio-visual information to compensate for speakers with deficient visual information, thereby enhancing its resilience to missing visual cues. We also conduct experiments in which visual information for specific speakers is entirely absent or visual frames are partially missing. The results demonstrate that our model consistently outperforms others, exhibiting the smallest performance drop across all settings involving 2, 3, 4, and 5 speakers.
Submitted 29 July, 2024; v1 submitted 27 July, 2024;
originally announced July 2024.
-
Exploiting Target Location Distribution in MIMO Radar: PCRB vs. PSBP for Waveform Design
Authors:
Lingyun Xu,
Bowen Wang,
Huiyong Li,
Ziyang Cheng
Abstract:
This paper investigates how to exploit the target location distribution in multiple-input multiple-output (MIMO) radar waveform design. We consider a MIMO radar aiming to estimate the unknown and random angular location parameters of a point target, whose distribution information can be exploited by the radar. First, we establish models of the MIMO radar system and the target location distribution. Based on these models, we propose the first category of target location distribution exploitation methods by analyzing the radar direction-of-arrival (DoA) estimation performance and deriving a general form of the posterior Cramér-Rao bound (PCRB) as a lower bound on the mean square error of DoA estimation. Following this, to gain further insight, we propose the second category of target location distribution exploitation methods by introducing a novel radar metric, the probability scaled beampattern (PSBP), from the perspective of the radar beampattern. To compare the two methods, we formulate the PCRB- and PSBP-oriented radar waveform design problems and propose corresponding low-complexity, convergence-guaranteed algorithms to tackle them. Finally, numerical simulations are conducted in different scenarios to provide a comprehensive evaluation and comparison of the radar performance.
Submitted 8 August, 2024; v1 submitted 27 July, 2024;
originally announced July 2024.
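The PSBP metric above is built from the perspective of the radar beampattern. For context, the standard MIMO transmit beampattern for a waveform covariance R is P(theta) = a(theta)^H R a(theta). The sketch computes it for an assumed half-wavelength uniform linear array and adds a prior-weighted average sum_k p(theta_k) P(theta_k) as a hypothetical illustration of folding a target-location distribution into a beampattern; it is NOT the paper's PSBP definition, which the abstract does not give.

```python
import cmath
import math


def steering(m, theta):
    """Steering vector of an m-element ULA with half-wavelength spacing (assumed array model)."""
    return [cmath.exp(1j * math.pi * k * math.sin(theta)) for k in range(m)]


def beampattern(R, theta):
    """Transmit beampattern P(theta) = a(theta)^H R a(theta)."""
    a = steering(len(R), theta)
    Ra = [sum(R[i][j] * a[j] for j in range(len(a))) for i in range(len(a))]
    return sum(a[i].conjugate() * Ra[i] for i in range(len(a))).real


def single_beam_covariance(m, theta0):
    """R = w w^H with w = a(theta0) / sqrt(m): unit total power focused toward theta0."""
    w = [v / math.sqrt(m) for v in steering(m, theta0)]
    return [[w[i] * w[j].conjugate() for j in range(m)] for i in range(m)]


def prior_weighted_power(R, thetas, probs):
    """Hypothetical prior-weighted beampattern sum_k p_k P(theta_k); NOT the paper's PSBP."""
    return sum(p * beampattern(R, t) for t, p in zip(thetas, probs))


# Usage: a discrete prior on the target angle (radians) and two candidate covariances.
m = 8
thetas = [-0.1, 0.0, 0.1, 0.2]
probs = [0.1, 0.4, 0.4, 0.1]
R_focus = single_beam_covariance(m, 0.05)
R_omni = [[(1.0 / m if i == j else 0.0) for j in range(m)] for i in range(m)]
print(prior_weighted_power(R_focus, thetas, probs), prior_weighted_power(R_omni, thetas, probs))
```

Both covariances have unit trace (unit total transmit power), so the comparison shows how concentrating power where the prior puts its mass raises the weighted power relative to an omnidirectional design.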
-
Leader-Follower Formation and Tracking Control of Underactuated Surface Vessels
Authors:
Bo Wang,
Antonio Loria
Abstract:
This paper presents a simple control approach for global trajectory tracking and formation control of underactuated surface vessels equipped with only two propellers. The control approach exploits the inherent cascaded structure of the vehicle dynamics and is divided into control designs at the kinematics level and the kinetics level. A controller with a low-gain feature is designed at the kinematics level by incorporating the cascaded-systems method, persistency of excitation, and the small-gain theorem. Furthermore, a PD+ controller is designed to achieve velocity tracking at the kinetics level. The proposed control laws are partially linear and saturated linear, and they are easy to implement. Based on a leader-follower scheme, our control approach applies to the formation tracking control problem of multi-vehicle systems under a directed spanning tree topology. Our main results guarantee uniform global asymptotic stability for the closed-loop system, which implies robustness with respect to perturbations.
Submitted 26 July, 2024;
originally announced July 2024.
-
Cooperative Integrated Sensing and Communication Networks: Analysis and Distributed Design
Authors:
Bowen Wang,
Hongyu Li,
Fan Liu,
Ziyang Cheng,
Shanpu Shen
Abstract:
This paper proposes a cooperative integrated sensing and communication network (Co-ISACNet) adopting a hybrid beamforming (HBF) architecture, which improves both radar sensing and communication performance. The main contributions of this work are four-fold. First, we introduce a novel cooperative sensing method for the considered Co-ISACNet, followed by a comprehensive analysis of this method. This analysis mathematically verifies the benefits of Co-ISACNet and provides insightful design guidelines. Second, to show the benefits of Co-ISACNet, we propose to jointly design the HBF to maximize the network communication capacity while satisfying a beampattern-similarity constraint for radar sensing, which results in a high-dimensional, non-convex problem. Third, to facilitate the joint design, we propose a novel distributed optimization framework based on proximal gradient and the alternating direction method of multipliers, namely PANDA. Fourth, we adopt the proposed PANDA framework to solve the joint HBF design problem for the Co-ISACNet. With PANDA, all access points (APs) optimize the HBF in parallel, where each AP requires only local channel state information and limited message exchange with the other APs. This framework significantly reduces the computational complexity and thus has pronounced benefits in practical scenarios. Simulation results verify the effectiveness of the proposed algorithm compared with the conventional centralized algorithm and show the remarkable improvement in radar sensing and communication performance achieved by deploying Co-ISACNet.
Submitted 18 July, 2024;
originally announced July 2024.
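PANDA, described above, builds on proximal gradient and the alternating direction method of multipliers (ADMM) so that each AP optimizes in parallel with only local information and limited message exchange. PANDA itself is not reproduced here. As a minimal sketch of the underlying consensus-ADMM pattern, assume each hypothetical agent i holds a private quadratic f_i(z) = 0.5 * (z - a_i)^2 and the agents must agree on a shared z, whose optimum is the mean of the a_i.

```python
def consensus_admm(local_targets, rho=1.0, iters=100):
    """Scaled-form consensus ADMM for min_z sum_i 0.5 * (z - a_i)^2 split across agents.

    Each agent i keeps a local copy x_i and a scaled dual u_i; only x_i + u_i is shared.
    """
    n = len(local_targets)
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iters):
        # Local step (parallel): x_i = argmin 0.5*(x - a_i)^2 + rho/2*(x - z + u_i)^2.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(local_targets, u)]
        # Exchange step: average the shared quantities to form the consensus variable.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step (local): accumulate each agent's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z


print(consensus_admm([1.0, 2.0, 3.0, 10.0]))  # approximately 4.0, the mean of the targets
```

For this toy problem the dual variables always sum to zero, so the consensus update reduces to z <- (mean(a) + rho * z) / (1 + rho); with rho = 1 the error halves at every iteration while each agent only ever shares one scalar.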
-
Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems
Authors:
Jianzhu Huai,
Binliang Wang,
Yuan Zhuang,
Yiwen Chen,
Qipeng Li,
Yulong Han,
Charles Toth
Abstract:
4D radars are increasingly favored for odometry and mapping of autonomous systems due to their robustness in harsh weather and dynamic environments. Existing datasets, however, often cover limited areas and are typically captured using a single platform. To address this gap, we present a diverse large-scale dataset specifically designed for 4D radar-based localization and mapping. This dataset was gathered using three different platforms: a handheld device, an e-bike, and an SUV, under a variety of environmental conditions, including clear days, nighttime, and heavy rain. The data collection occurred from September 2023 to February 2024, encompassing diverse settings such as roads in a vegetated campus and tunnels on highways. Each route was traversed multiple times to facilitate place recognition evaluations. The sensor suite included a 3D lidar, 4D radars, stereo cameras, consumer-grade IMUs, and a GNSS/INS system. Sensor data packets were synchronized to GNSS time using a two-step process: a convex hull algorithm was applied to smooth host time jitter, and then odometry and correlation algorithms were used to correct constant time offsets. Extrinsic calibration between sensors was achieved through manual measurements and subsequent nonlinear optimization. The reference motion for the platforms was generated by registering lidar scans to a terrestrial laser scanner (TLS) point cloud map using a lidar inertial odometry (LIO) method in localization mode. Additionally, a data reversion technique was introduced to enable backward LIO processing. We believe this dataset will boost research in radar-based point cloud registration, odometry, mapping, and place recognition.
Submitted 22 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
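The Snail-Radar abstract above states that a convex hull algorithm smoothed host-time jitter before constant offsets were corrected, without giving details. One common formulation, assumed here, uses the fact that transport and scheduling delays can only make a packet arrive later: the lower convex hull of the points (sensor time, host time - sensor time) then gives a jitter-free lower envelope of the clock offset. The sketch implements that idea with Andrew's monotone-chain hull; it is not the dataset's actual tooling, and the later odometry/correlation offset correction is not covered.

```python
def lower_hull(points):
    """Lower convex hull (Andrew's monotone chain) of 2-D points, returned left to right."""
    hull = []
    for p in sorted(points):
        # Drop the last hull point while it does not make a strict left turn with p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def smooth_host_times(sensor_t, host_t):
    """Map each sensor timestamp to host time via the lower-envelope clock offset.

    Assumes sensor_t is strictly increasing and that jitter only delays host_t.
    """
    offsets = [h - s for s, h in zip(sensor_t, host_t)]
    hull = lower_hull(list(zip(sensor_t, offsets)))
    if len(hull) == 1:
        return [s + hull[0][1] for s in sensor_t]
    smoothed, k = [], 0
    for s in sensor_t:
        # Advance to the hull segment [hull[k], hull[k + 1]] that contains s.
        while k < len(hull) - 2 and s > hull[k + 1][0]:
            k += 1
        (t0, o0), (t1, o1) = hull[k], hull[k + 1]
        offset = o0 + (o1 - o0) * (s - t0) / (t1 - t0)  # linear interpolation on the hull
        smoothed.append(s + offset)
    return smoothed


# Synthetic check (made-up numbers): 5 s offset, 1 ms/s drift, non-negative arrival jitter.
sensor = [float(i) for i in range(10)]
jitter = [0.0, 0.02, 0.005, 0.0, 0.03, 0.01, 0.0, 0.004, 0.02, 0.0]
host = [s + 5.0 + 0.001 * s + j for s, j in zip(sensor, jitter)]
smoothed = smooth_host_times(sensor, host)
print([round(m - s, 4) for m, s in zip(smoothed, sensor)])  # recovered offsets
```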
-
Enhancing super-resolution ultrasound localisation through multi-frame deconvolution exploiting spatiotemporal coherence
Authors:
Su Yan,
Clotilde Vié,
Marcelo Lerendegui,
Herman Verinaz-Jadan,
Jipeng Yan,
Martina Tashkova,
James Burn,
Bingxue Wang,
Gary Frost,
Kevin G. Murphy,
Meng-Xing Tang
Abstract:
Super-resolution ultrasound imaging through microbubble (MB) localisation and tracking, also known as ultrasound localisation microscopy, allows non-invasive sub-diffraction resolution imaging of microvasculature in animals and humans. The number of MBs localised from the acquired contrast-enhanced ultrasound (CEUS) images and the localisation precision directly influence the quality of the resulting super-resolution microvasculature images. However, non-negligible noise present in the CEUS images can make localising MBs challenging. To enhance the MB localisation performance, we propose a Multi-Frame Deconvolution (MF-Decon) framework that can exploit the spatiotemporal coherence inherent in the CEUS data, with new spatial and temporal regularisers designed based on total variation (TV) and regularisation by denoising (RED). Based on the MF-Decon framework, we introduce two novel methods: MF-Decon with spatial and temporal TVs (MF-Decon+3DTV) and MF-Decon with spatial RED and temporal TV (MF-Decon+RED+TV). Results from in silico simulations indicate that our methods outperform two widely used methods using deconvolution or normalised cross-correlation across all evaluation metrics, including precision, recall, $F_1$ score, and the mean and standard deviation of the localisation errors. In particular, our methods improve MB localisation precision by up to 39% and recall by up to 12%. Super-resolution microvasculature maps generated with our methods on a publicly available in vivo rat brain dataset show less noise, better contrast, higher resolution and more vessel structures.
Submitted 8 July, 2024;
originally announced July 2024.
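The MF-Decon regularisers above are built on total variation (TV) over space and time. As a minimal sketch, assuming the anisotropic form (sum of absolute forward differences) and a frame stack indexed as stack[t][y][x], the function below evaluates a weighted spatiotemporal TV penalty. It only computes the penalty value; the paper's deconvolution solver and its RED regulariser are not reproduced, and the weights `w_spatial` and `w_temporal` are hypothetical parameters.

```python
def spatiotemporal_tv(stack, w_spatial=1.0, w_temporal=1.0):
    """Anisotropic TV of a frame stack indexed as stack[t][y][x].

    Spatial term: |forward differences| along x and y within each frame.
    Temporal term: |forward differences| between consecutive frames at the same pixel.
    """
    n_t, n_y, n_x = len(stack), len(stack[0]), len(stack[0][0])
    tv_s = tv_t = 0.0
    for t in range(n_t):
        for y in range(n_y):
            for x in range(n_x):
                v = stack[t][y][x]
                if x + 1 < n_x:
                    tv_s += abs(stack[t][y][x + 1] - v)
                if y + 1 < n_y:
                    tv_s += abs(stack[t][y + 1][x] - v)
                if t + 1 < n_t:
                    tv_t += abs(stack[t + 1][y][x] - v)
    return w_spatial * tv_s + w_temporal * tv_t


# A microbubble-like bright pixel that persists across two frames vs one that flickers.
persistent = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]] for _ in range(2)]
flicker = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0], [0, 0, 0]]]
# Temporal term only: a coherent signal is free, a flickering one is penalised.
print(spatiotemporal_tv(persistent, w_spatial=0.0), spatiotemporal_tv(flicker, w_spatial=0.0))  # 0.0 1.0
```

Penalising frame-to-frame changes is one way to encode the temporal coherence of real microbubbles, whereas noise that appears in a single frame pays the temporal cost.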
-
Classification of Power Quality Disturbances Using Resnet with Channel Attention Mechanism
Authors:
Su Pan,
Xingyang Nie,
Xiaoyu Zhai,
Biao Wang,
Huilin Ge,
Cheng He,
Zhenping Ding
Abstract:
The detection and classification of power quality disturbances (PQDs) is of significant importance for power systems, and numerous intelligent diagnostic methods have been developed for this task. However, existing identification methods usually concentrate on single-type signals or on complex signals with two types, rendering them susceptible to noisy labels and environmental effects. This study proposes a novel method for the classification of PQDs, termed ST-GSResNet, which uses the S-Transform and an improved residual neural network (ResNet) with a channel attention mechanism. ST-GSResNet first uses the S-Transform to convert a time-series signal into a 2D time-frequency image for feature enhancement. Then, an improved ResNet model is introduced that employs grouped convolution instead of the traditional convolution operation. Grouped convolution imposes a block-diagonal structured sparsity on the channel dimension, so that highly correlated filters are learned in a more structured way within each filter group. Because this substantially reduces the number of network parameters, the model becomes less prone to overfitting. Furthermore, the squeeze-and-excitation (SE) module focuses on the primary components, which enhances the model's recognition robustness and noise immunity. Experimental results demonstrate that, compared with existing deep learning models, our approach has advantages in computational efficiency and classification accuracy.
Submitted 2 July, 2024;
originally announced July 2024.
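The ST-GSResNet abstract above attributes its reduced parameter count to grouped convolution. With g groups, each output channel sees only C_in/g input channels, so a k x k convolution has C_out * (C_in/g) * k^2 weights instead of C_out * C_in * k^2, a g-fold reduction. The sketch below does this bookkeeping and also counts the two fully connected layers of a squeeze-and-excitation (SE) block; the channel counts and the reduction ratio of 16 are assumed example values, not figures taken from the paper.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Number of learnable parameters in a k x k 2-D convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each output channel convolves only the c_in // groups channels of its own group.
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def se_params(channels, reduction=16):
    """Parameters of an SE block: squeeze FC (C -> C/r) and excitation FC (C/r -> C), with biases."""
    hidden = channels // reduction
    return (channels * hidden + hidden) + (hidden * channels + channels)


for g in (1, 4, 32):
    print(f"groups={g}: {conv_params(64, 64, 3, groups=g)} weights")  # 36864, 9216, 1152
print("SE block, 64 channels:", se_params(64))  # 580
```

The SE block adds only a few hundred parameters per stage, which is why channel attention can be combined with grouped convolution without undoing the savings.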
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
Jingping Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo Jin,
Fanliu Kong,
Zongwei Lan,
Tianyu Li, et al. (30 additional authors not shown)
Abstract:
A modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matching scenarios, and they are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
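Word error rate (WER) and character error rate (CER) are both a token-level edit distance divided by the reference length. A minimal sketch follows; splitting words on whitespace and treating every non-space character as a Chinese token are simplifying assumptions (real scoring pipelines also normalize text first), and reading the 10%-40% figure as a relative reduction is an assumption about how it was reported.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between token lists (substitution, deletion, insertion all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (or match at no cost)
        prev = cur
    return prev[-1]


def error_rate(ref, hyp, unit="word"):
    """WER for unit='word' (whitespace tokens); CER for unit='char' (non-space characters)."""
    tokenize = str.split if unit == "word" else (lambda s: [c for c in s if not c.isspace()])
    ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def relative_reduction(baseline, new):
    """Relative error-rate reduction, e.g. 0.10 -> 0.07 is a 30% reduction."""
    return (baseline - new) / baseline


print(error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion over 6 words
print(error_rate("今天天气好", "今天天汽好", unit="char"))         # one substitution over 5 characters
print(relative_reduction(0.10, 0.07))
```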
-
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset
Authors:
Muwei Jian,
Haoran Zhang,
Mingju Shao,
Hongyu Chen,
Huihui Huang,
Yanjie Zhong,
Changlei Zhang,
Bin Wang,
Penghui Gao
Abstract:
Recently, intelligent analysis of lung nodules with the assistance of computer-aided detection (CAD) techniques can improve the accuracy of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on computed tomography (CT) images from a single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in imaging data captured at different periods of lung cancer. If the evolution patterns of nodules across various periods in patients' CT sequences can be explored, this will play a crucial role in guiding the precise screening and identification of lung cancer. Therefore, a cross spatio-temporal lung nodule dataset with pathological information for nodule identification and diagnosis is constructed, which contains 328 CT sequences and 362 annotated nodules from 109 patients. This comprehensive database is intended to drive CAD research towards more practical and robust methods, and to contribute to further exploration in related fields of precision medicine. To ensure patient confidentiality, we have removed sensitive information from the dataset.
Submitted 25 June, 2024;
originally announced June 2024.
-
AudioBench: A Universal Benchmark for Audio Large Language Models
Authors:
Bin Wang,
Xunlong Zou,
Geyu Lin,
Shuo Sun,
Zhuohan Liu,
Wenyu Zhang,
Zhengyuan Liu,
AiTi Aw,
Nancy F. Chen
Abstract:
We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in comprehensive benchmarks for thoroughly evaluating their capabilities. AudioBench addresses this gap by providing relevant datasets and evaluation metrics. In our study, we evaluated the capabilities of four models across various aspects and found that no single model excels consistently across all tasks. We outline the research outlook for AudioLLMs and anticipate that our open-source code, data, and leaderboard will offer a robust testbed for future model developments.
Submitted 25 June, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device
Authors:
Fulan Li,
Yunfei Guo,
Wenda Xu,
Weide Zhang,
Fangyun Zhao,
Baiyu Wang,
Huaguang Du,
Chengkun Zhang
Abstract:
This paper presents the development of an upper limb end-effector based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms: First, the Implicit Euler velocity control algorithm (IEVC) highlighted for its state-of-the-art accuracy, stability, efficiency and generalizability in motion restriction control. Second, an Admittance Virtual Dynamics simulation algorithm that achieves a smooth and natural human interaction with the non-backdrivable end-effector. Third, a generalized impedance force calculation algorithm allowing efficient impedance control on any trajectory or area boundary. Experimental validation demonstrated the system's effectiveness in accurate end-effector position control across various trajectories and configurations. The proposed upper limb end-effector-based rehabilitation device, with its high performance and adaptability, holds significant promise for extensive clinical application, potentially improving rehabilitation outcomes for stroke patients.
Submitted 20 June, 2024;
originally announced June 2024.
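The abstract does not give the IEVC or Admittance Virtual Dynamics equations. As a hedged illustration of why implicit Euler suits this kind of control, the sketch below integrates a generic one-axis admittance model m dv/dt + b v = f with backward Euler; the model and the values of m, b, and dt are assumptions, not the authors' parameters.

```python
def admittance_step(v, x, force, m, b, dt):
    """One backward-Euler step of m * dv/dt + b * v = force; returns (v_next, x_next)."""
    v_next = (m * v + dt * force) / (m + b * dt)  # solve the implicit equation for v_next
    x_next = x + dt * v_next                      # integrate position with the new velocity
    return v_next, x_next


def simulate(force, steps, m=2.0, b=10.0, dt=0.01):
    """Apply a constant hand force and return the final (velocity, position)."""
    v = x = 0.0
    for _ in range(steps):
        v, x = admittance_step(v, x, force, m, b, dt)
    return v, x


print(simulate(5.0, 2000))                        # velocity settles near force / b = 0.5
print(simulate(5.0, 50, m=0.01, b=10.0, dt=1.0))  # very large step, still bounded
```

Backward Euler gives v_{k+1} = (m v_k + dt f) / (m + b dt), which stays bounded for any step size when m, b > 0. Explicit Euler with the same m = 0.01, b = 10, dt = 1.0 would multiply v by (1 - b dt / m) = -999 every step and diverge.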
-
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Authors:
Jizhong Liu,
Gang Li,
Junbo Zhang,
Heinrich Dinkel,
Yongqing Wang,
Zhiyong Yan,
Yujun Wang,
Bin Wang
Abstract:
Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened up possibilities for improving AAC. Thus, we explore enhancing AAC from three aspects: 1) a pre-trained audio encoder via consistent ensemble distillation (CED) is used to improve the effectiveness of acoustic tokens, with a querying transformer (Q-Former) bridging the modality gap to the LLM and compressing acoustic tokens; 2) we investigate the advantages of using Llama 2 with 7B parameters as the decoder; 3) another pre-trained LLM corrects text errors caused by insufficient training data and annotation ambiguities. Both the audio encoder and text decoder are optimized by low-rank adaptation (LoRA). Experiments show that each of these enhancements is effective. Our method obtains a 33.0 SPIDEr-FL score, outperforming the winner of DCASE 2023 Task 6A.
Submitted 25 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
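LoRA freezes a pretrained weight W and learns a low-rank update, giving y = W x + (alpha / r) B A x. The pure-Python sketch below shows that forward pass and the trainable-parameter count; the tiny matrices, the rank, and alpha are made-up values, and a real implementation would use a tensor library.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, w, a, b, alpha):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A (r x d_in) and B (d_out x r) train."""
    r = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y0 + (alpha / r) * dy for y0, dy in zip(base, delta)]


def lora_trainable(d_in, d_out, r):
    """Trainable parameters of the low-rank update: r * (d_in + d_out)."""
    return r * (d_in + d_out)


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen weight (identity, for readability)
a = [[1.0, 1.0]]              # rank r = 1
x = [1.0, 2.0]
print(lora_forward(x, w, a, [[0.0], [0.0]], alpha=2.0))  # B starts at zero: output equals W x
print(lora_forward(x, w, a, [[1.0], [0.0]], alpha=2.0))  # [7.0, 2.0]
print(lora_trainable(4096, 4096, 8), 4096 * 4096)        # 65536 vs 16777216 full weights
```

Initializing B to zero makes the adapted layer start out identical to the frozen one, and for an assumed 4096 x 4096 layer at rank 8 only about 0.4% as many parameters are trained.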
-
Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs
Authors:
Min Hua,
Dong Chen,
Kun Jiang,
Fanggang Zhang,
Jinhai Wang,
Bo Wang,
Quan Zhou,
Hongming Xu
Abstract:
Cooperative adaptive cruise control (CACC) has been recognized as a fundamental function of autonomous driving, in which platoon stability and energy efficiency are outstanding challenges that are difficult to accommodate in real-world operations. This paper studies the CACC of connected and autonomous vehicles (CAVs) based on multi-agent reinforcement learning (MARL) to optimize platoon stability and energy efficiency simultaneously. The optimal use of communication bandwidth is key to guaranteeing learning performance in real-world driving, and thus this paper proposes a communication-efficient MARL by incorporating quantized stochastic gradient descent (QSGD) and a binary differential consensus (BDC) method into a fully decentralized MARL framework. We benchmarked the performance of our proposed BDC-MARL algorithm against several non-communicative and communicative MARL algorithms, e.g., IA2C, FPrint, and DIAL, through the evaluation of platoon stability, fuel economy, and driving comfort. Our results show that BDC-MARL achieved the highest energy savings, improving by up to 5.8%, with an average velocity of 15.26 m/s and an inter-vehicle spacing of 20.76 m. In addition, we conducted different information-sharing analyses to assess communication efficacy, along with sensitivity analyses and scalability tests with varying platoon sizes. The practical effectiveness of our approach is further demonstrated using real-world scenarios sourced from the open-source OpenACC database.
Submitted 17 June, 2024;
originally announced June 2024.
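QSGD compresses a gradient before it is communicated by stochastically rounding each coordinate, scaled by the vector's L2 norm, to one of s levels, so that the quantized vector is unbiased. A minimal sketch of that quantizer follows; the level count, the example vector, and the seeded generator are assumptions, and neither the BDC consensus step nor the MARL training loop is shown.

```python
import math
import random


def qsgd_quantize(v, s, rng):
    """Stochastically quantize v to s levels per sign, scaled by ||v||_2 (unbiased in expectation)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        level = abs(x) / norm * s  # position on the [0, s] grid
        low = math.floor(level)
        # Round up with probability equal to the fractional part, so E[q] = level.
        q = low + (1 if rng.random() < level - low else 0)
        out.append(math.copysign(norm * q / s, x))
    return out


rng = random.Random(0)
grad = [0.3, -0.4, 0.0, 1.2]  # ||grad||_2 = 1.3
sample = qsgd_quantize(grad, 4, rng)
print(sample)  # each entry is a multiple of 1.3 / 4 carrying the sign of grad
avg = [0.0] * len(grad)
for _ in range(20000):
    avg = [acc + q / 20000 for acc, q in zip(avg, qsgd_quantize(grad, 4, rng))]
print([round(val, 2) for val in avg])  # close to grad, illustrating unbiasedness
```

Each transmitted coordinate then needs only a sign and a small integer level plus one shared norm, which is where the bandwidth saving comes from.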
-
Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations
Authors:
Shi Yan,
Yan Liang,
Huayu Zhang,
Le Zheng,
Difan Zou,
Binglu Wang
Abstract:
Through integrating the evolutionary correlations across global states in the bidirectional recursion, an explainable Bayesian recurrent neural smoother (EBRNS) is proposed for offline data-assisted fixed-interval state smoothing. First, the proposed model, containing global states in the evolutionary interval, is transformed into an equivalent model with bidirectional memory. This transformation incorporates crucial global state information with support for bidirectional recursive computation. For the transformed model, the joint state-memory-trend Bayesian filtering and smoothing frameworks are derived by introducing the bidirectional memory iteration mechanism and offline data into Bayesian estimation theory. The derived frameworks are implemented using the Gaussian approximation to ensure analytical properties and computational efficiency. Finally, the neural network modules within EBRNS and its two-stage training scheme are designed. Unlike most existing approaches that artificially combine deep learning and model-based estimation, the bidirectional recursion and internal gated structures of EBRNS are naturally derived from Bayesian estimation theory, explainably integrating prior model knowledge, online measurement, and offline data. Experiments on representative real-world datasets demonstrate that the high smoothing accuracy of EBRNS is accompanied by data efficiency and a lightweight parameter scale.
Submitted 16 June, 2024;
originally announced June 2024.
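For reference, the classic model-based baseline for fixed-interval smoothing is a Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward pass. The sketch below implements it for a scalar random-walk model; it is not EBRNS, and the model, the noise variances q and r, and the measurements are assumptions chosen for illustration.

```python
def kalman_rts(zs, q, r, x0=0.0, p0=1.0):
    """Scalar random walk: x_k = x_{k-1} + w_k (var q), z_k = x_k + v_k (var r)."""
    xf, pf, xp, pp = [], [], [], []
    x, p = x0, p0
    for z in zs:
        x_pred, p_pred = x, p + q         # predict (transition F = 1)
        gain = p_pred / (p_pred + r)      # Kalman gain
        x = x_pred + gain * (z - x_pred)  # measurement update
        p = (1.0 - gain) * p_pred
        xp.append(x_pred)
        pp.append(p_pred)
        xf.append(x)
        pf.append(p)
    xs, ps = xf[:], pf[:]
    for k in range(len(zs) - 2, -1, -1):  # backward (RTS) pass over the fixed interval
        g = pf[k] / pp[k + 1]             # smoother gain
        xs[k] = xf[k] + g * (xs[k + 1] - xp[k + 1])
        ps[k] = pf[k] + g * g * (ps[k + 1] - pp[k + 1])
    return xf, pf, xs, ps


xf, pf, xs, ps = kalman_rts([1.0, 1.2, 0.9, 1.1, 1.0], q=0.01, r=0.1)
print([round(val, 3) for val in xs])
```

The backward pass never increases the variance (ps[k] <= pf[k]) and leaves the last sample unchanged, because the final filtered estimate already uses every measurement in the interval.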
-
Bridging Language Gaps in Audio-Text Retrieval
Authors:
Zhiyong Yan,
Heinrich Dinkel,
Yongqing Wang,
Jizhong Liu,
Junbo Zhang,
Yujun Wang,
Bin Wang
Abstract:
Audio-text retrieval is a challenging task that requires searching for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions limits the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE) approach, using a multilingual text encoder (SONAR) to encode the text data with language-specific information. Additionally, we optimize the audio encoder through the application of consistent ensemble distillation (CED), enhancing support for variable-length audio-text retrieval. Our methodology excels in English audio-text retrieval, demonstrating state-of-the-art (SOTA) performance on commonly used datasets such as AudioCaps and Clotho. Simultaneously, the approach exhibits proficiency in retrieving content in seven other languages with only 10% of additional language-enhanced training data, yielding promising results. The source code is publicly available at https://github.com/zyyan4/ml-clap.
Submitted 16 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
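Retrieval results on datasets like AudioCaps and Clotho are commonly reported as Recall@K: rank every candidate by embedding similarity and check whether the correct one is in the top K. The sketch below computes text-to-audio Recall@K with cosine similarity; the 2-D embeddings are invented, and the encoders (SONAR for text, the CED-based audio encoder) are not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def recall_at_k(text_embs, audio_embs, k):
    """Text-to-audio Recall@K, assuming caption i's matching clip is audio i."""
    hits = 0
    for i, t in enumerate(text_embs):
        scores = [cosine(t, a) for a in audio_embs]
        ranked = sorted(range(len(audio_embs)), key=lambda j: scores[j], reverse=True)
        hits += i in ranked[:k]
    return hits / len(text_embs)


text = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]   # hypothetical caption embeddings
audio = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]  # hypothetical clip embeddings
print(recall_at_k(text, audio, 1), recall_at_k(text, audio, 2))
```

Here only the first caption ranks its own clip first, so Recall@1 is 1/3, while all three correct clips appear in the top two, so Recall@2 is 1.0.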
-
Scaling up masked audio encoder learning for general audio classification
Authors:
Heinrich Dinkel,
Zhiyong Yan,
Yongqing Wang,
Junbo Zhang,
Yujun Wang,
Bin Wang
Abstract:
Despite progress in audio classification, a generalization gap remains between speech and other sound domains, such as environmental sounds and music. Models trained for speech tasks often fail to perform well on environmental or musical audio tasks, and vice versa. While self-supervised learning (SSL) audio representations offer an alternative, there has been limited exploration of scaling both model and dataset sizes for SSL-based general audio classification. We introduce Dasheng, a simple SSL audio encoder based on the efficient masked autoencoder framework. With 1.2 billion parameters and trained on 272,356 hours of diverse audio, Dasheng obtains significant performance gains on the HEAR benchmark. It outperforms previous work on CREMA-D, LibriCount, Speech Commands, and VoxLingua, and competes well in music and environment classification. Dasheng features inherently contain rich speech, music, and environmental information, as shown in nearest-neighbor classification experiments. Code is available at https://github.com/richermans/dasheng/.
Submitted 13 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
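In the masked autoencoder framework, most input patches are hidden, the encoder sees only the visible ones, and the reconstruction loss is computed on the masked patches. A minimal sketch of random masking and a masked-only loss follows; the 75% mask ratio, the 100 patches, and the scalar "patches" are assumptions, not Dasheng's configuration.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return (visible, masked) patch indices after a random shuffle."""
    num_keep = int(num_patches * (1.0 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred, target, masked):
    """Mean squared error over masked patches only; visible patches are not scored."""
    return sum((pred[i] - target[i]) ** 2 for i in masked) / len(masked)


rng = random.Random(42)
visible, masked = random_mask(100, 0.75, rng)
print(len(visible), len(masked))  # 25 75

target = [float(i) for i in range(100)]
# Errors on visible patches do not affect the loss.
pred = [t + 10.0 if i in visible else t for i, t in enumerate(target)]
print(masked_mse(pred, target, masked))  # 0.0
```

Because the encoder processes only the visible quarter of the patches, this design is what makes masked pre-training cheap enough to scale to large models and datasets.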
-
Auto-Multilift: Distributed Learning and Control for Cooperative Load Transportation With Quadrotors
Authors:
Bingheng Wang,
Rui Huang,
Kuankuan Sima,
Lin Zhao
Abstract:
Designing motion control and planning algorithms for multilift systems remains challenging due to the complexities of dynamics, collision avoidance, actuator limits, and scalability. Existing methods that use optimization and distributed techniques effectively address these constraints and scalability issues. However, they often require substantial manual tuning, leading to suboptimal performance. This paper proposes Auto-Multilift, a novel framework that automates the tuning of model predictive controllers (MPCs) for multilift systems. We model the MPC cost functions with deep neural networks (DNNs), enabling fast online adaptation to various scenarios. We develop a distributed policy gradient algorithm to train these DNNs efficiently in a closed-loop manner. Central to our algorithm is distributed sensitivity propagation, which is built on fully exploiting the unique dynamic couplings within the multilift system. It parallelizes gradient computation across quadrotors and focuses on actual system state sensitivities relative to key MPC parameters. Extensive simulations demonstrate favorable scalability to a large number of quadrotors. Our method outperforms a state-of-the-art open-loop MPC tuning approach by effectively learning adaptive MPCs from trajectory tracking errors. It also excels in learning an adaptive reference for reconfiguring the system when traversing multiple narrow slots.
Submitted 15 July, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
A novel fault localization with data refinement for hydroelectric units
Authors:
Jialong Huang,
Junlin Song,
Penglong Lian,
Mengjie Gan,
Zhiheng Su,
Benhao Wang,
Wenji Zhu,
Xiaomin Pu,
Jianxiao Zou,
Shicai Fan
Abstract:
Due to the scarcity of fault samples and the non-linear, non-smooth characteristics of hydroelectric unit data, most traditional fault localization methods for hydroelectric units struggle to localize faults accurately. To address these problems, a fault localization method for hydroelectric units based on a sparse autoencoder (SAE), a generative adversarial network (GAN), wavelet noise reduction (WNR), and manifold-boosted deep learning (SG-WMBDL) is proposed. To overcome data scarcity, an SAE is embedded into the GAN to generate more high-quality samples in the data generation module. Because the signals have non-linear and non-smooth characteristics, an improved WNR that combines soft and hard thresholding, together with locally linear embedding (LLE), is used in the data preprocessing module to reduce noise and effectively capture local features. In addition, to achieve higher performance, a novel Adaptive Boosting (AdaBoost) scheme combined with multiple deep learning models is proposed for accurate fault localization. The experimental results show that SG-WMBDL can locate faults in hydroelectric units from a small number of fault samples with non-linear and non-smooth characteristics, with higher precision and accuracy than other frontier methods, which verifies the effectiveness and practicality of the proposed method.
Submitted 29 May, 2024;
originally announced May 2024.
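The improved WNR combines soft and hard thresholding of wavelet coefficients, but the abstract does not give its exact rule. The sketch below therefore uses one common compromise form as an assumption: coefficients at or below the threshold t are zeroed, and larger ones are shrunk by a * t, where a = 0 recovers hard and a = 1 soft thresholding. The wavelet decomposition and reconstruction are omitted.

```python
import math


def threshold(coeffs, t, a):
    """Compromise threshold: 0 if |c| <= t, else sign(c) * (|c| - a * t), with 0 <= a <= 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return [math.copysign(abs(c) - a * t, c) if abs(c) > t else 0.0 for c in coeffs]


coeffs = [-3.0, -0.5, 0.2, 2.0]
print(threshold(coeffs, 1.0, 0.0))  # hard: [-3.0, 0.0, 0.0, 2.0]
print(threshold(coeffs, 1.0, 1.0))  # soft: [-2.0, 0.0, 0.0, 1.0]
print(threshold(coeffs, 1.0, 0.5))  # compromise: [-2.5, 0.0, 0.0, 1.5]
```

Hard thresholding keeps large coefficients unbiased but is discontinuous at t, while soft thresholding is continuous but biases every surviving coefficient by t; an intermediate a trades these off.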
-
Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals
Authors:
Xiran Xu,
Bo Wang,
Boda Xiao,
Yadong Niu,
Yiwen Wang,
Xihong Wu,
Jing Chen
Abstract:
Researchers have reported high decoding accuracy (>95%) using non-invasive electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks such as image decoding, emotion recognition, and auditory spatial attention detection. Since these EEG data were usually collected with well-designed paradigms in labs, some researchers have questioned the reliability and robustness of the corresponding decoding methods, arguing that such decoding accuracy was overestimated due to the inherent temporal autocorrelation of EEG signals. However, the coupling between stimulus-driven neural responses and EEG temporal autocorrelations makes it difficult to confirm whether this overestimation actually exists. Furthermore, the pitfalls underlying overestimated decoding accuracy have not been fully explained due to the lack of an appropriate formulation. In this work, we formulate the pitfall in various EEG decoding tasks in a unified framework. EEG data were recorded from watermelons to remove stimulus-driven neural responses. Labels were assigned to continuous EEG according to the experimental designs of several typical datasets, and the decoding methods were then applied. The results showed that the label can be successfully decoded as long as continuous EEG data with the same label were split into both training and test sets. Further analysis indicated that high accuracy in various BCI decoding tasks could be achieved by associating labels with intrinsic temporal autocorrelation features of EEG. These results underscore the importance of choosing appropriate experimental designs and data splits in BCI decoding tasks to prevent accuracies inflated by EEG temporal autocorrelation.
Submitted 27 May, 2024;
originally announced May 2024.
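The split pitfall can be reproduced with a toy example containing no neural signal at all. In the sketch below, each "trial" has a slow offset that stands in for temporal autocorrelation and is, by construction, unrelated to the label; a 1-nearest-neighbour classifier looks perfect when segments of the same trial land in both training and test sets, and fails when whole trials are held out. The offsets, labels, and classifier are hypothetical choices, not the authors' data or methods.

```python
def make_segments():
    """Four 'trials' of 10 segments each; a per-trial offset stands in for slow autocorrelation."""
    offsets = {0: 0.0, 1: 10.0, 2: 9.0, 3: 1.0}  # unrelated to the label by construction
    labels = {0: "A", 1: "B", 2: "A", 3: "B"}
    return [(trial, offsets[trial] + 0.01 * k, labels[trial])
            for trial in offsets for k in range(10)]


def one_nn_accuracy(train, test):
    """1-nearest-neighbour classification on the scalar feature."""
    correct = 0
    for _, feature, label in test:
        _, _, predicted = min(train, key=lambda seg: abs(seg[1] - feature))
        correct += predicted == label
    return correct / len(test)


segments = make_segments()
# Leaky split: alternate segments of every trial go to train and test.
leaky = one_nn_accuracy(segments[0::2], segments[1::2])
# Trial-wise split: whole trials are held out.
held_out = one_nn_accuracy([s for s in segments if s[0] in (0, 1)],
                           [s for s in segments if s[0] in (2, 3)])
print(leaky, held_out)  # 1.0 0.0
```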
-
Massive MIMO-ISAC System With 1-Bit ADCs/DACs
Authors:
Bowen Wang,
Hongyu Li,
Bin Liao,
Ziyang Cheng
Abstract:
This paper investigates a hardware-efficient massive multiple-input multiple-output integrated sensing and communication (MIMO-ISAC) system with 1-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs). The proposed system, referred to as 1BitISAC, employs 1-bit DACs at the ISAC transmitter and 1-bit ADCs at the sensing receiver, achieving significant reductions in power consumption and hardware costs. For such systems, two 1BitISAC joint transceiver designs, i.e., i) a quality-of-service-constrained 1BitISAC design and ii) a quality-of-detection-constrained design, are considered and the corresponding problems are formulated. To address these problems, we thoroughly analyze the radar detection performance after 1-bit ADC quantization and the communication bit error rate. This analysis yields new design insights and leads to unique radar and communication metrics, which enable us to simplify the original problems and employ majorization-minimization and integer linear programming methods to solve them. Numerical results are provided to validate the performance analysis of the proposed 1BitISAC and to compare it with other ISAC configurations. The superiority of the proposed 1BitISAC system in terms of balancing ISAC performance and energy efficiency is also demonstrated.
Submitted 24 May, 2024;
originally announced May 2024.
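A 1-bit ADC or DAC keeps only the sign of the in-phase and quadrature components of each complex sample. A minimal sketch of that quantizer follows; mapping an exact zero to +1 and normalizing the output to unit power are assumed conventions.

```python
import math


def one_bit(z: complex) -> complex:
    """1-bit quantization of a complex sample: signs of real and imaginary parts, unit power."""
    re = 1.0 if z.real >= 0 else -1.0  # zero mapped to +1 (assumed convention)
    im = 1.0 if z.imag >= 0 else -1.0
    return complex(re, im) / math.sqrt(2)


samples = [3 + 4j, 0.03 + 0.04j, -2 + 0.1j, 0.5 - 7j]
quantized = [one_bit(z) for z in samples]
print(quantized)
```

The first two samples differ in amplitude by a factor of 100 yet quantize identically, which illustrates why radar detection and bit-error performance after 1-bit quantization need the dedicated analysis the abstract describes.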
-
Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm
Authors:
Boxiang Wang,
Junwei Ji,
Xiaoyi Shen,
Dongyuan Shi,
Woon-Seng Gan
Abstract:
Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones is often constrained by physical limitations. The virtual sensing technique, which effectively suppresses noise far from the physical error microphones, is one of the most promising solutions. Nevertheless, the conventional multichannel virtual sensing ANC (MVANC) system based on the multichannel filtered reference least mean square (MCFxLMS) algorithm often suffers from high computational complexity. This paper proposes a feedforward MVANC system that incorporates the multichannel adjoint least mean square (MCALMS) algorithm to overcome these limitations effectively. Computational analysis demonstrates improved computational efficiency, and numerical simulations exhibit noise reduction performance at virtual locations comparable to that of the conventional MCFxLMS algorithm. Additionally, the effects of varied tuning noises on system performance are investigated, providing insightful findings on optimizing MVANC systems.
Submitted 23 May, 2024;
originally announced May 2024.
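The adjoint LMS idea can be seen in a single channel. Instead of filtering the reference through the secondary-path model (as FxLMS does), the error is filtered through the time-reversed secondary path and paired with a reference delayed by M - 1 samples, which accumulates the same gradient over time up to that delay. This is a hedged single-channel sketch, not the paper's MCALMS or its virtual-sensing stage; the primary path p, the secondary path s (assumed perfectly known), the 8-tap filter, the step size, and the tonal noise are all assumptions.

```python
import math


def adjoint_lms(num_samples=3000, taps=8, mu=0.01):
    """Single-channel adjoint LMS on a tonal disturbance; returns the error signal."""
    p = [0.0, 0.0, 0.8, 0.3]        # hypothetical primary path
    s = [0.0, 0.9, 0.2]             # hypothetical secondary path, assumed known exactly
    m = len(s)
    w = [0.0] * taps
    x_buf = [0.0] * (taps + m - 1)  # reference history, newest first
    y_buf = [0.0] * m               # anti-noise history
    e_buf = [0.0] * m               # error history
    errors = []
    for n in range(num_samples):
        x = math.sin(2 * math.pi * 0.05 * n)
        x_buf = [x] + x_buf[:-1]
        d = sum(pk * xk for pk, xk in zip(p, x_buf))  # noise at the error microphone
        y = sum(wk * xk for wk, xk in zip(w, x_buf))  # control filter output
        y_buf = [y] + y_buf[:-1]
        e = d - sum(sk * yk for sk, yk in zip(s, y_buf))
        e_buf = [e] + e_buf[:-1]
        # Filter the error through the time-reversed (adjoint) secondary path ...
        eps = sum(s[i] * e_buf[m - 1 - i] for i in range(m))
        # ... and pair it with the reference delayed by m - 1 samples.
        w = [w[j] + mu * eps * x_buf[m - 1 + j] for j in range(taps)]
        errors.append(e)
    return errors


errs = adjoint_lms()
print(sum(e * e for e in errs[:200]) / 200, sum(e * e for e in errs[-200:]) / 200)
```

In a multichannel system, filtering the error signals through adjoint paths, rather than filtering every reference through every secondary path, is broadly where a computational saving like the one reported can come from; this single-channel sketch does not show that saving.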
-
Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning
Authors:
Zheyuan Zhang,
Elif Keles,
Gorkem Durak,
Yavuz Taktak,
Onkar Susladkar,
Vandan Gorade,
Debesh Jha,
Asli C. Ormeci,
Alpay Medetalibeyoglu,
Lanhong Yao,
Bin Wang,
Ilkin Sevgi Isler,
Linkai Peng,
Hongyi Pan,
Camila Lopes Vendrami,
Amir Bourhani,
Yury Velichko,
Boqing Gong,
Concetto Spampinato,
Ayis Pyrros,
Pallavi Tiwari,
Derk C. F. Klatte,
Megan Engels,
Sanne Hoogenboom,
Candice W. Bolan, et al. (13 additional authors not shown)
Abstract:
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra- and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction, with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data are made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
Submitted 25 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
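The Dice coefficient and the 95th-percentile Hausdorff distance (HD95) used above can be sketched in pure Python on small 2-D point sets. Assumptions: masks are sets of voxel coordinates, distances are computed over all given points rather than extracted mask surfaces, and the nearest-rank percentile is used. Established toolkits typically work on surfaces with physical voxel spacing and may interpolate percentiles, so their values can differ.

```python
import math


def dice(pred, gt):
    """Dice coefficient 2 * |intersection| / (|pred| + |gt|) for sets of voxel coordinates."""
    if not pred and not gt:
        return 1.0
    return 2 * len(pred & gt) / (len(pred) + len(gt))


def _p95(values):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def hd95(a, b):
    """Max of the two directed 95th-percentile nearest-point distances between point sets a and b."""
    d_ab = [min(math.dist(p, q) for q in b) for p in a]
    d_ba = [min(math.dist(q, p) for p in a) for q in b]
    return max(_p95(d_ab), _p95(d_ba))


gt = {(x, y) for x in range(4) for y in range(4)}
pred = {(x + 1, y) for x in range(4) for y in range(4)}  # the same square shifted by one voxel
print(dice(pred, gt), hd95(pred, gt))  # 0.75 1.0
```

Dice rewards volumetric overlap, while HD95 measures boundary disagreement but ignores the worst 5% of points, which makes it less sensitive to isolated outlier voxels than the plain Hausdorff distance.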
-
Implementation of the Feedforward Multichannel Virtual Sensing Active Noise Control (MVANC) by Using MATLAB
Authors:
Boxiang Wang
Abstract:
The multichannel virtual sensing active noise control (MVANC) methodology is an advanced approach that may provide a wide area of silence at specific virtual positions that are distant from the physical error microphones. Currently, there is a scarcity of open-source programs available for the MVANC algorithm. This work presents a MATLAB code for the MVANC approach, utilizing the multichannel filtered-x least mean square (MCFxLMS) algorithm. The code is designed to be applicable to systems with any number of channels. The code can be found on GitHub.
Submitted 16 May, 2024;
originally announced May 2024.
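The core FxLMS update behind the MCFxLMS algorithm can be sketched in a single channel in Python: the reference is filtered through the secondary-path model to form x', and the weights follow w <- w + mu e x'. This is a hedged illustration, not the released MATLAB code; it omits the multichannel structure and the virtual-sensing stage, and the paths, filter length, step size, and tonal noise are assumptions.

```python
import math


def fxlms(num_samples=3000, taps=8, mu=0.01):
    """Single-channel FxLMS on a tonal disturbance; returns the error signal."""
    p = [0.0, 0.0, 0.8, 0.3]  # hypothetical primary path
    s = [0.0, 0.9, 0.2]       # hypothetical secondary path, model assumed exact
    w = [0.0] * taps
    x_buf = [0.0] * max(taps, len(p), len(s))  # reference history, newest first
    y_buf = [0.0] * len(s)                     # anti-noise history
    xf_buf = [0.0] * taps                      # filtered-reference history
    errors = []
    for n in range(num_samples):
        x = math.sin(2 * math.pi * 0.05 * n)
        x_buf = [x] + x_buf[:-1]
        d = sum(pk * xk for pk, xk in zip(p, x_buf))  # noise at the error microphone
        y = sum(wk * xk for wk, xk in zip(w, x_buf))  # control filter output
        y_buf = [y] + y_buf[:-1]
        e = d - sum(sk * yk for sk, yk in zip(s, y_buf))
        xf = sum(sk * xk for sk, xk in zip(s, x_buf))  # reference filtered by the path model
        xf_buf = [xf] + xf_buf[:-1]
        w = [wk + mu * e * xfk for wk, xfk in zip(w, xf_buf)]
        errors.append(e)
    return errors


errs = fxlms()
print(sum(e * e for e in errs[:200]) / 200, sum(e * e for e in errs[-200:]) / 200)
```

The multichannel version repeats the filtered-reference step for every reference/secondary-path pair, which is the cost that grows with the number of channels.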
-
MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation
Authors:
Debesh Jha,
Nikhil Kumar Tomar,
Koushik Biswas,
Gorkem Durak,
Matthew Antalek,
Zheyuan Zhang,
Bin Wang,
Md Mostafijur Rahman,
Hongyi Pan,
Alpay Medetalibeyoglu,
Yury Velichko,
Daniela Ladner,
Amir Borhani,
Ulas Bagci
Abstract:
Accurate segmentation of organs from abdominal CT scans is essential for clinical applications such as diagnosis, treatment planning, and patient monitoring. To handle the heterogeneity of organ shapes and sizes and their complex anatomical relationships, we propose the Multi-Decoder Network (MDNet), an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder and multiple different decoder networks. Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block. With each decoder, we increase the depth of the network iteratively and refine segmentation masks, enriching feature maps by integrating previous decoders' feature maps. To refine the feature map further, we also pass the predicted masks from the previous decoder to the current decoder to provide spatial attention across foreground and background regions. MDNet effectively refines the segmentation mask, achieving Dice similarity coefficients (DSC) of 0.9013 and 0.9169 on the Liver Tumor Segmentation (LiTS) and MSD Spleen datasets, respectively. Additionally, it reduces the Hausdorff distance (HD) to 3.79 for the LiTS dataset and 2.26 for the spleen segmentation dataset, underscoring the precision of MDNet in capturing complex contours. Moreover, MDNet is more interpretable and robust than the other baseline models.
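The Hausdorff distance reported above measures the worst-case boundary disagreement between two contours. A minimal sketch on point sets follows, assuming boundary points have already been extracted from the masks (in practice in physical units such as mm). The percentile option reproduces one common HD95 convention (the maximum of the two directed 95th percentiles); other conventions exist, and the names here are illustrative.

```python
import numpy as np

def nearest_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """For each point in a (N x d), the Euclidean distance to its nearest point in b (M x d)."""
    diff = a[:, None, :] - b[None, :, :]           # (N, M, d) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def hausdorff(a: np.ndarray, b: np.ndarray, percentile: float = 100.0) -> float:
    """Symmetric Hausdorff distance between two point sets.

    percentile=100 gives the classic (maximum) Hausdorff distance; percentile=95
    gives one common HD95 convention (max of the two directed 95th percentiles).
    """
    d_ab = np.percentile(nearest_distances(a, b), percentile)
    d_ba = np.percentile(nearest_distances(b, a), percentile)
    return float(max(d_ab, d_ba))

# Toy 2D contours (in pixels): b has one outlying point 3 units away from a.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [4.0, 0.0]])
print(hausdorff(a, b))  # 3.0
```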
Submitted 9 May, 2024;
originally announced May 2024.
-
Preventive Audits for Data Applications Before Data Sharing in the Power IoT
Authors:
Bohong Wang,
Qinglai Guo,
Yanxi Lin,
Yang Yu
Abstract:
With the increase in data volume, more types of data are being used and shared, especially in the power Internet of Things (IoT). However, data sharing may lead to unexpected information leakage because of the ubiquitous correlations among different data; thus, data owners need to conduct preventive audits of data applications before data sharing to avoid the risk of key information leakage. Considering that the same data may play completely different roles in different application scenarios, data owners should know the data buyers' expected data applications in advance and provide modified data that are less relevant to the data owners' private information and more relevant to the nonprivate information that the data buyers need. In this paper, data sharing in the power IoT is taken as the background, and the mutual information between the data and their implicit information is selected as the data feature parameter to indicate the relevance between the data and their implicit information or, equivalently, the ability to infer the implicit information from the data. Therefore, preventive audits should be conducted based on changes in the data feature parameters before and after data sharing. The probability exchange adjustment method is proposed as the theoretical basis of preventive audits under simplified consumption, and the corresponding optimization models are constructed and extended to more practical scenarios with multivariate characteristics. Finally, case studies are used to validate the effectiveness of the proposed preventive audits.
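The data feature parameter above is the mutual information between shared data and the implicit information it may reveal. As an illustration only, the sketch below computes I(X;Y) in bits for a discrete joint distribution using two hypothetical probability tables. It does not implement the paper's probability exchange adjustment method.

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(X; Y) in bits from a joint probability table p(x, y) (rows: x, columns: y)."""
    joint = joint / joint.sum()                    # normalize defensively
    px = joint.sum(axis=1, keepdims=True)          # marginal p(x), shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)          # marginal p(y), shape (1, m)
    nz = joint > 0                                 # terms with p(x, y) = 0 contribute 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# Hypothetical tables: X = a released data feature, Y = a private attribute of the owner.
fully_revealing = np.array([[0.5, 0.0], [0.0, 0.5]])   # X determines Y
independent = np.array([[0.25, 0.25], [0.25, 0.25]])   # X says nothing about Y
print(mutual_information(fully_revealing), mutual_information(independent))  # 1.0 0.0
```

Lower mutual information between the released data and the private attribute means the attribute is harder to infer. This is the quantity whose change before and after sharing the audits above are based on.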
Submitted 5 May, 2024;
originally announced May 2024.
-
Robust Proximity Detection using On-Device Gait Monitoring
Authors:
Yuqian Hu,
Guozhen Zhu,
Beibei Wang,
K. J. Ray Liu
Abstract:
Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on dynamic signal reflections, so their extracted features depend on motion strength. To address this issue, we design a robust WiFi-based proximity detector that incorporates gait monitoring. Specifically, we propose a gait score that accurately evaluates gait presence by leveraging the speed estimated from the autocorrelation function (ACF) of channel state information (CSI). By combining this gait score with a proximity feature, our approach effectively distinguishes different transition patterns, enabling more reliable proximity detection. In addition, to enhance the stability of the detection process, we employ a state machine and extract temporal information, ensuring continuous proximity detection even during subtle movements. Extensive experiments conducted in different environments demonstrate an overall detection rate of 92.5% and a low false alarm rate of 1.12% with a delay of 0.825 s.
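The gait score above builds on the ACF of CSI. The abstract does not give the speed estimator, so the following assumption-labeled sketch only computes a normalized sample ACF. It shows how a quasi-periodic, gait-like component appears as an ACF peak at its period, using a synthetic cosine as a stand-in for a CSI-derived feature.

```python
import numpy as np

def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Biased, normalized sample ACF rho(0), ..., rho(max_lag) of a real 1D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - lag], x[lag:]) / energy for lag in range(max_lag + 1)])

# Synthetic stand-in for a CSI-derived feature with a gait-like period of 20 samples.
t = np.arange(200)
feature = np.cos(2 * np.pi * t / 20)
rho = autocorrelation(feature, max_lag=30)
peak_lag = 15 + int(np.argmax(rho[15:26]))   # look for the first repeat after the valley
print(peak_lag)  # 20
```

The search window that skips the small lags is a simplification. A real detector would locate the first ACF peak after the first valley adaptively.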
Submitted 29 April, 2024;
originally announced April 2024.
-
Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission
Authors:
Mingyu Yang,
Bowen Liu,
Boyang Wang,
Hun-Seok Kim
Abstract:
Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated to be an effective approach for wireless image transmission. Nevertheless, most existing work adopts an autoencoder framework to optimize conventional criteria such as Mean Squared Error (MSE) and Structural Similarity Index (SSIM), which are insufficient to maintain the perceptual quality of reconstructed images. This issue is more prominent under stringent bandwidth constraints or low signal-to-noise ratio (SNR) conditions. To tackle this challenge, we propose DiffJSCC, a novel framework that leverages the prior knowledge of the pre-trained Stable Diffusion model to produce high-realism images via the conditional diffusion denoising process. DiffJSCC first extracts multimodal spatial and textual features from the noisy channel symbols in the generation phase. Then, it produces an initial reconstructed image as an intermediate representation to aid robust feature extraction and a stable training process. In the following diffusion step, DiffJSCC uses the derived multimodal features, together with channel state information such as the SNR, as conditions to guide the denoising diffusion process, which converts the initial random noise to the final reconstruction. DiffJSCC employs a novel control module to fine-tune the Stable Diffusion model and adapt it to the multimodal conditions. Extensive experiments on diverse datasets reveal that our method significantly surpasses prior deep JSCC approaches on both perceptual metrics and downstream task performance, showcasing its ability to preserve the semantics of the original transmitted images. Notably, DiffJSCC can achieve highly realistic reconstructions for 768x512 pixel Kodak images with only 3072 symbols (<0.008 symbols per pixel) under 1 dB SNR channels.
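The symbols-per-pixel figure quoted for the Kodak images follows directly from the stated image size and symbol budget:

```latex
\frac{3072\ \text{symbols}}{768 \times 512\ \text{pixels}}
  = \frac{3072}{393\,216}
  = \frac{1}{128}
  \approx 0.0078\ \text{symbols per pixel} < 0.008
```

Equivalently, each transmitted channel symbol must carry the information of 128 pixels on average.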
Submitted 17 July, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Authors:
Bin Wang,
Fei Deng,
Peifan Jiang
Abstract:
Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise into recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advances in deep learning-based approaches show substantial potential for enhancing the signal-to-noise ratio of EEG data compared with traditional methods. In the realm of large language models (LLMs), the Retentive Network (Retnet) architecture, adopted by some models, demonstrates robust feature extraction and global modeling capabilities. Recognizing the temporal similarities between EEG signals and natural language, we bring Retnet from natural language processing to EEG denoising. This integration presents a novel approach to EEG denoising, opening avenues for a deeper understanding of brain activity and accurate diagnosis of neurological diseases. Nonetheless, Retnet cannot be applied directly to EEG denoising because EEG signals are one-dimensional, whereas natural language processing deals with two-dimensional data. To enable Retnet to be applied to EEG denoising, we propose a signal embedding method that transforms one-dimensional EEG signals into two dimensions for use as network inputs. Experimental results validate the substantial improvement in denoising effectiveness achieved by the proposed method.
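Retnet's retention can be computed either in parallel or recurrently with a decaying, fixed-size state. The recurrent form is what lets it store temporal information at constant cost per step. The sketch below is a simplified single-head version without the rotation, normalization, and gating of the full RetNet design, using random inputs. It is not EEGDiR's implementation; the rows of `q`, `k`, `v` merely stand in for a sequence of embedded EEG segments, and the decay `gamma = 0.9` is an arbitrary choice.

```python
import numpy as np

def retention_parallel(q, k, v, gamma):
    """Parallel retention: (Q K^T * D) V with causal decay D[n, m] = gamma**(n - m) for n >= m, else 0."""
    idx = np.arange(q.shape[0])
    diff = idx[:, None] - idx[None, :]
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return (q @ k.T * decay) @ v

def retention_recurrent(q, k, v, gamma):
    """Recurrent retention: S_n = gamma * S_{n-1} + k_n^T v_n and o_n = q_n S_n."""
    state = np.zeros((k.shape[1], v.shape[1]))   # fixed-size memory of the whole past
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = gamma * state + np.outer(k_n, v_n)
        outputs.append(q_n @ state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 16, 4, 3
q = rng.standard_normal((seq_len, d_k))
k = rng.standard_normal((seq_len, d_k))
v = rng.standard_normal((seq_len, d_v))
print(np.allclose(retention_parallel(q, k, v, 0.9), retention_recurrent(q, k, v, 0.9)))  # True
```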
Submitted 20 May, 2024; v1 submitted 20 March, 2024;
originally announced April 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low-resolution and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks: the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered, and the ranking is calculated as a weighted sum of the scores of the three sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered, and the score calculated from the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered, and the score calculated from the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions, which gauge the state of the art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained models of the validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
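The PSNR thresholds above refer to the standard definition PSNR = 10 log10(MAX²/MSE). A minimal sketch for 8-bit images follows. It computes PSNR over all pixels, whereas the challenge's exact protocol (color space, border cropping) is not stated here and may differ.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

hr = np.full((8, 8), 100, dtype=np.uint8)
sr = hr + np.uint8(1)   # off by one gray level everywhere, so MSE = 1
print(round(psnr(hr, sr), 2))  # 48.13
```

Because PSNR is logarithmic in MSE, small differences such as 26.90 dB versus 26.99 dB correspond to only a few percent change in mean squared error.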
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Task-Oriented Hybrid Beamforming for OFDM-DFRC Systems with Flexibly Controlled Space-Frequency Spectra
Authors:
Lingyun Xu,
Bowen Wang,
Ziyang Cheng
Abstract:
This paper investigates hybrid beamforming (HBF) design for the orthogonal frequency division multiplexing (OFDM) dual-function radar-communication (DFRC) system in multiple task scenarios involving a radar scanning and detection task and a target tracking task. To meet the different task requirements of the DFRC system, we introduce two novel radar beampattern metrics, the average integrated sidelobe to minimum mainlobe ratio (AISMMR) and the average peak sidelobe to integrated mainlobe ratio (APSIMR), to characterize the space-frequency spectra in different scenarios. Then, two HBF design problems are formulated for the two task scenarios by minimizing the AISMMR and APSIMR, respectively, subject to communication quality-of-service (QoS), power budget, and hardware constraints. Due to the non-linearity and close coupling between the analog and digital beamformers in both the objective functions and the QoS constraint, the resulting problems are challenging to solve. To this end, a unified optimization algorithm based on a consensus alternating direction method of multipliers (CADMM) is proposed to solve these two problems. Moreover, under the unified CADMM framework, closed-form solutions for the primal variables of the two original problems are obtained with low complexity. Numerical simulations are provided to demonstrate the feasibility and effectiveness of the proposed algorithm.
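The CADMM algorithm above is specific to the two beamforming problems. As a generic illustration only, here is the textbook scaled-form consensus ADMM pattern (closed-form local primal updates, a consensus averaging step, and dual updates) on a toy least-squares problem. The quadratic local objectives, the penalty `rho`, and all names are assumptions standing in for the paper's subproblems.

```python
import numpy as np

def consensus_admm(blocks, rho=2.0, iters=500):
    """Scaled-form consensus ADMM for min_x sum_i 0.5 * ||A_i x - b_i||^2.

    Each block i keeps a local copy x_i; the constraint x_i = z is enforced
    through the scaled dual variables u_i.
    """
    dim = blocks[0][0].shape[1]
    z = np.zeros(dim)
    us = [np.zeros(dim) for _ in blocks]
    xs = [np.zeros(dim) for _ in blocks]
    for _ in range(iters):
        # Local primal updates: each has a closed-form (regularized least-squares) solution.
        for i, (a, b) in enumerate(blocks):
            xs[i] = np.linalg.solve(a.T @ a + rho * np.eye(dim), a.T @ b + rho * (z - us[i]))
        # Consensus update: average of local copies plus scaled duals.
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual updates accumulate each block's disagreement with the consensus.
        us = [u + x - z for x, u in zip(xs, us)]
    return z

rng = np.random.default_rng(0)
blocks = [(rng.standard_normal((10, 4)), rng.standard_normal(10)) for _ in range(3)]
z = consensus_admm(blocks)
a_all = np.vstack([a for a, _ in blocks])
b_all = np.concatenate([b for _, b in blocks])
x_ls = np.linalg.lstsq(a_all, b_all, rcond=None)[0]
print(np.allclose(z, x_ls, atol=1e-6))  # True
```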
Submitted 18 March, 2024;
originally announced March 2024.
-
Beamforming Design for Double-Active-RIS-aided Communication Systems with Inter-Excitation
Authors:
Boshi Wang,
Cunhua Pan,
Hong Ren,
Zhiyuan Yu,
Yang Zhang,
Mengyu Liu,
Gui Zhou
Abstract:
In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Owing to their signal amplification capability, active RISs can effectively mitigate the multiplicative fading effect. However, this also induces signal bouncing between the two active RISs that cannot be ignored. This phenomenon is termed the "inter-excitation" effect and is characterized in the received signal by a proposed feedback-type model. Based on the signal model, we formulate a weighted sum rate (WSR) maximization problem by jointly optimizing the beamforming matrix at the BS and the reflecting coefficient matrices at the two active RISs, subject to power constraints at the BS and active RISs, as well as the maximum amplification gain constraints of the active RISs. To solve this non-convex problem, we first transform the problem into a more tractable form using the fractional programming (FP) method. Then, by introducing auxiliary variables, the problem can be converted into an equivalent form that can be solved by using a penalty dual decomposition (PDD) algorithm. Finally, simulation results indicate that the proposed scheme outperforms benchmark schemes with a single active RIS and double passive RISs in terms of achievable rate. Furthermore, the results demonstrate that the proposed scheme can enhance the WSR by 30% compared to schemes that do not take this effect into account when the maximum amplification gain is 40 dB. Additionally, the proposed scheme is capable of achieving high WSR performance at most locations where double active RISs are deployed between the BS and the users, thereby providing greater flexibility in their positioning.
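The abstract does not spell out the feedback-type model. The following is a hypothetical illustration of why repeated bouncing between two amplifying surfaces leads to a feedback, matrix-inverse form. All channels, gains, and names are assumptions: random channels between the RISs, a reciprocal return channel, and a gain of 3 per element. The explicit sum over bounces matches the closed form only when the round-trip matrix has spectral radius below 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # elements per RIS (hypothetical)

def complex_gaussian(shape, scale):
    """i.i.d. circularly symmetric complex Gaussian entries with standard deviation `scale`."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

g12 = complex_gaussian((n, n), 0.05)   # channel from RIS 1 to RIS 2 (hypothetical)
g21 = g12.T                            # reciprocal channel from RIS 2 to RIS 1 (assumption)
phi1 = np.diag(3.0 * np.exp(1j * rng.uniform(0, 2 * np.pi, n)))  # active RIS 1: gain 3, random phases
phi2 = np.diag(3.0 * np.exp(1j * rng.uniform(0, 2 * np.pi, n)))  # active RIS 2

# One round trip of the signal reflected by RIS 1: RIS 1 -> RIS 2 -> back to RIS 1.
round_trip = phi1 @ g21 @ phi2 @ g12
x = complex_gaussian(n, 1.0)           # signal incident on RIS 1 from the BS

# Feedback form: r = phi1 x + round_trip r, so r = (I - round_trip)^{-1} phi1 x.
closed_form = np.linalg.solve(np.eye(n) - round_trip, phi1 @ x)

# Explicit bounce-by-bounce sum, which converges when the spectral radius is below 1.
bounces = np.zeros(n, dtype=complex)
term = phi1 @ x
for _ in range(60):
    bounces += term
    term = round_trip @ term

print(np.max(np.abs(np.linalg.eigvals(round_trip))) < 1, np.allclose(bounces, closed_form))
```

Higher amplification gains enlarge the round-trip matrix. This is consistent with the abstract's point that the effect matters most at large maximum amplification gains.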
Submitted 23 August, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Enhancing Physical Layer Security in Dual-Function Radar-Communication Systems with Hybrid Beamforming Architecture
Authors:
Lingyun Xu,
Bowen Wang,
Huiyong Li,
Ziyang Cheng
Abstract:
In this letter, we investigate enhancing the physical layer security (PLS) of the dual-function radar-communication (DFRC) system with a hybrid beamforming (HBF) architecture, where the base station (BS) achieves downlink communication and radar target detection simultaneously. We consider an eavesdropper intercepting the information transmitted from the BS to the downlink communication users with imperfectly known channel state information. Additionally, the location of the radar target is also imperfectly known by the BS. To enhance PLS in the considered DFRC system, we propose a novel HBF architecture, which introduces a new integrated sensing and security (I2S) symbol. The secure HBF design problem for DFRC is formulated by maximizing the minimum legitimate user communication rate subject to radar signal-to-interference-plus-noise ratio, eavesdropping rate, hardware, and power constraints. To solve this non-convex problem, we propose an alternating optimization-based method to jointly optimize the transmit and receive beamformers. Numerical simulation results validate the effectiveness of the proposed algorithm and show the superiority of the proposed I2S-aided HBF architecture for achieving DFRC and enhancing PLS.
Submitted 4 April, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Efficient dual-scale generalized Radon-Fourier transform detector family for long time coherent integration
Authors:
Suqi Li,
Yihan Wang,
Bailu Wang,
Giorgio Battistelli,
Luigi Chisci,
Guolong Cui
Abstract:
Long Time Coherent Integration (LTCI) aims to accumulate target energy through long time integration, which is an effective method for detecting weak targets. However, for a moving target, defocusing can occur due to range migration (RM) and Doppler frequency migration (DFM). To address this issue, RM and DFM corrections are required in order to achieve a well-focused image for subsequent detection. Since RM and DFM are induced by the same motion parameters, existing approaches such as the generalized Radon-Fourier transform (GRFT) or the keystone transform (KT)-matching filter process (MFP) adopt the same search space for the motion parameters in order to eliminate both effects, thus leading to large computational redundancy. To this end, this paper first proposes a dual-scale decomposition of the target motion parameters, consisting of well-designed coarse and fine motion parameters. Then, utilizing this decomposition, the joint correction of the RM and DFM effects is decoupled into a cascade procedure: first RM correction on the coarse search space and then DFM correction on the fine search spaces. As such, the step sizes of the search spaces can be tailored to the RM and DFM corrections, respectively, thus effectively avoiding redundant computation. The resulting algorithms are called dual-scale GRFT (DS-GRFT) and dual-scale KT-MFP (DS-KTMFP); they provide comparable performance while achieving significant improvement in computational efficiency compared to the standard GRFT (KT-MFP). Simulation experiments verify their effectiveness and efficiency.
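To make range-walk correction and Doppler phase compensation concrete, here is a minimal first-order (constant-velocity) Radon-Fourier transform sketch on simulated, noise-free, pulse-compressed data. All radar parameters and names are hypothetical. The GRFT additionally searches acceleration and higher-order motion terms, which are the source of DFM. The dual-scale decomposition proposed in the paper is not reproduced here.

```python
import numpy as np

# Hypothetical radar parameters (not from the paper).
lam, prt, dr = 0.03, 1e-3, 1.0        # wavelength [m], pulse repetition time [s], range-cell size [m]
n_pulses, n_cells = 64, 100
t = np.arange(n_pulses) * prt         # slow-time instants t_m

# Pulse-compressed echoes of one constant-velocity target: its peak walks across
# range cells (range migration) and its phase rotates with slow time (Doppler).
r0, v0 = 30.0, 150.0
echo = np.zeros((n_pulses, n_cells), dtype=complex)
cells = np.round((r0 + v0 * t) / dr).astype(int)
echo[np.arange(n_pulses), cells] = np.exp(-1j * 4 * np.pi * (r0 + v0 * t) / lam)

def rft(echo, ranges, velocities):
    """|sum_m s(t_m, r + v t_m) exp(j 4 pi v t_m / lam)| over a (range, velocity) grid."""
    out = np.zeros((len(ranges), len(velocities)))
    pulse_idx = np.arange(n_pulses)[None, :]
    for j, v in enumerate(velocities):
        idx = np.round((ranges[:, None] + v * t[None, :]) / dr).astype(int)  # range-walk correction
        inside = (idx >= 0) & (idx < n_cells)
        samples = np.where(inside, echo[pulse_idx, np.clip(idx, 0, n_cells - 1)], 0.0)
        doppler = np.exp(1j * 4 * np.pi * v * t / lam)                    # Doppler phase compensation
        out[:, j] = np.abs((samples * doppler[None, :]).sum(axis=1))
    return out

ranges = np.arange(n_cells) * dr
velocities = 100.0 + 0.25 * np.arange(401)   # 100 to 200 m/s in 0.25 m/s steps
surface = rft(echo, ranges, velocities)
i, j = np.unravel_index(np.argmax(surface), surface.shape)
print(ranges[i], velocities[j])  # 30.0 150.0 (coherent peak of height n_pulses)
```

Each candidate velocity shifts both the range walk and the Doppler phase. Velocities that alias to the same Doppler phase (blind speeds, every 15 m/s here) therefore still fail to line up the range walk, which is why the peak is unique. The joint search over one shared grid is also the source of the computational redundancy described above.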
Submitted 11 March, 2024;
originally announced March 2024.
-
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
Authors:
Junxiong Lin,
Yan Wang,
Zeng Tao,
Boyang Wang,
Qing Zhao,
Haorang Wang,
Xuan Tong,
Xinji Mai,
Yuxuan Lin,
Wei Song,
Jiawen Yu,
Shaoqi Yan,
Wenqiang Zhang
Abstract:
Pre-trained diffusion models used for image generation encapsulate a substantial reservoir of a priori knowledge about intricate textures. Leveraging this a priori knowledge for image super-resolution is a compelling avenue. Nonetheless, prevailing diffusion-based methods overlook the constraints that degradation information imposes on the diffusion process. Furthermore, these methods fail to consider the spatial variability of the estimated blur kernel, which stems from factors such as motion jitter and out-of-focus elements in open-environment scenarios. This oversight causes the super-resolution results to deviate notably from reality. To address these concerns, we introduce a framework known as Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution (SSR). Within the SSR framework, we propose a Spatially Variant Kernel Refinement (SVKR) module. SVKR estimates a Depth-Informed Kernel, which takes the depth information into account and is spatially variant. Additionally, SVKR enhances the accuracy of depth information acquired from low-resolution (LR) images, allowing for mutual enhancement between the depth map and blur kernel estimates. Finally, we introduce the Adaptive Multi-Modal Fusion (AMF) module to align the information from three modalities: low-resolution images, depth maps, and blur kernels. This alignment can constrain the diffusion model to generate more authentic SR results.
Submitted 9 July, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Authors:
Boyang Wang,
Fengyu Yang,
Xihang Yu,
Chao Zhang,
Hanbin Zhao
Abstract:
While real-world anime super-resolution (SR) has gained increasing attention in the SR community, existing methods still adopt techniques from the photorealistic domain. In this paper, we analyze the anime production workflow and rethink how its characteristics can be exploited for real-world anime SR. First, we argue that video networks and datasets are not necessary for anime SR due to the repeated use of hand-drawn frames. Instead, we propose an anime image collection pipeline that chooses the least compressed and most informative frames from the video sources. Based on this pipeline, we introduce the Anime Production-oriented Image (API) dataset. In addition, we identify two anime-specific challenges: distorted and faint hand-drawn lines, and unwanted color artifacts. We address the first issue by introducing a prediction-oriented compression module in the image degradation model and a pseudo-ground truth preparation with enhanced hand-drawn lines. Furthermore, we introduce the balanced twin perceptual loss, which combines both anime and photorealistic high-level features, to mitigate unwanted color artifacts and increase visual clarity. We evaluate our method through extensive experiments on the public benchmark, showing that our method outperforms state-of-the-art anime dataset-trained approaches.
Submitted 4 April, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach
Authors:
Bin Wang,
Jun Fang,
Hongbin Li,
Yonina C. Eldar
Abstract:
Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data dispersed over various data sources. Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability. In this work, we consider a multi-server FL framework, referred to as Confederated Learning (CFL), in order to accommodate a larger number of users. A CFL system is composed of multiple networked edge servers, each connected to an individual set of users. Decentralized collaboration among servers is leveraged to harness all users' data for model training. Because of the potentially massive number of users involved, it is crucial to reduce the communication overhead of the CFL system. We propose a stochastic gradient method for distributed learning in the CFL framework. The proposed method incorporates a conditionally-triggered user selection (CTUS) mechanism as the central component to effectively reduce communication overhead. Relying on a carefully designed triggering condition, the CTUS mechanism allows each server to select only a small number of users to upload their gradients, without significantly jeopardizing the convergence performance of the algorithm. Our theoretical analysis reveals that the proposed algorithm enjoys a linear convergence rate. Simulation results show that it achieves substantial improvement over state-of-the-art algorithms in terms of communication efficiency.
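The title names SAGA, a variance-reduced stochastic gradient method. The abstract does not spell out the triggering condition, so the sketch below covers only the plain SAGA update on a single machine, for a toy least-squares problem. It shows the stored-gradient table and the resulting linear convergence. The CTUS mechanism and the multi-server setting are not modeled, and all names are illustrative.

```python
import numpy as np

def saga_least_squares(a, b, step, epochs=200, seed=0):
    """SAGA for min_x (1/N) * sum_i 0.5 * (a_i^T x - b_i)^2 on a single machine."""
    rng = np.random.default_rng(seed)
    n, dim = a.shape
    x = np.zeros(dim)
    table = np.zeros((n, dim))      # most recent gradient stored for each sample
    table_mean = np.zeros(dim)      # running mean of the stored gradients
    for _ in range(epochs * n):
        j = rng.integers(n)
        grad_j = (a[j] @ x - b[j]) * a[j]
        # Variance-reduced step: fresh gradient minus its stale copy, plus the table mean.
        x = x - step * (grad_j - table[j] + table_mean)
        table_mean = table_mean + (grad_j - table[j]) / n
        table[j] = grad_j
    return x

rng = np.random.default_rng(1)
a = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = a @ x_true                                        # consistent system, so x_true is the minimizer
step = 1.0 / (3.0 * np.max(np.sum(a ** 2, axis=1)))  # conservative 1/(3L) step size
x_hat = saga_least_squares(a, b, step)
print(np.allclose(x_hat, x_true, atol=1e-6))  # True
```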
Submitted 27 February, 2024;
originally announced February 2024.
-
Rate Splitting Multiple Access-Enabled Adaptive Panoramic Video Semantic Transmission
Authors:
Haixiao Gao,
Mengying Sun,
Xiaodong Xu,
Shujun Han,
Bizhu Wang,
Jingxuan Zhang,
Ping Zhang
Abstract:
In this paper, we propose an adaptive panoramic video semantic transmission (APVST) framework enabled by rate splitting multiple access (RSMA). The APVST framework consists of a semantic transmitter and receiver, utilizing a deep joint source-channel coding structure to adaptively extract and encode semantic features from panoramic frames. To achieve higher spectral efficiency and conserve bandwidth, APVST employs an entropy model and a dimension-adaptive module to control the transmission rate. Additionally, we adopt weighted-to-spherically-uniform peak signal-to-noise ratio (WS-PSNR) and weighted-to-spherically-uniform structural similarity (WS-SSIM) as distortion evaluation metrics for panoramic videos and design a weighted self-attention module for APVST. This module integrates weights and feature maps to enhance the quality of the immersive experience. Considering the overlap in the field of view when users watch panoramic videos, we further utilize RSMA to split the required panoramic video semantic streams into common and private messages for transmission. We propose an RSMA-enabled semantic stream transmission scheme and formulate a joint latency and immersive-experience-quality problem by optimizing the allocation ratios of power, common rate, and channel bandwidth, aiming to maximize the quality of service (QoS) scores for users. To address this problem, we propose an efficient deep reinforcement learning algorithm based on proximal policy optimization (PPO) to handle dynamically changing environments. Simulation results demonstrate that our proposed APVST framework saves up to 20% and 50% of channel bandwidth compared to other semantic and traditional video transmission schemes, respectively. Moreover, our study confirms the efficiency of RSMA in panoramic video transmission, achieving performance gains of 13% and 20% compared to NOMA and OFDMA, respectively.
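WS-PSNR is commonly defined for equirectangular-projection (ERP) frames by weighting each row's squared error by the cosine of its latitude before computing PSNR. Rows near the poles are heavily oversampled in ERP and therefore count less. The sketch below assumes ERP frames and an 8-bit range. Whether APVST uses exactly this variant is not stated in the abstract.

```python
import numpy as np

def ws_psnr_erp(reference: np.ndarray, estimate: np.ndarray, max_value: float = 255.0) -> float:
    """WS-PSNR for equirectangular (ERP) frames: each row is weighted by the cosine of its latitude."""
    height, width = reference.shape[:2]
    rows = np.arange(height)
    row_weights = np.cos((rows + 0.5 - height / 2) * np.pi / height)
    weights = np.broadcast_to(row_weights[:, None], (height, width))
    sq_err = (reference.astype(np.float64) - estimate.astype(np.float64)) ** 2
    if sq_err.ndim == 3:                 # average over color channels, if present
        sq_err = sq_err.mean(axis=2)
    wmse = np.sum(weights * sq_err) / np.sum(weights)
    if wmse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / wmse))

ref = np.zeros((64, 128))
near_pole = ref.copy()
near_pole[0, :] = 10.0       # error on the top row (near a pole)
at_equator = ref.copy()
at_equator[32, :] = 10.0     # same error on a row near the equator
print(ws_psnr_erp(ref, near_pole) > ws_psnr_erp(ref, at_equator))  # True
```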
Submitted 23 June, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Implementation of the Multichannel Filtered Reference Least Mean Square (McFxLMS) Algorithm with an Arbitrary Number of Channels by Using MATLAB
Authors:
Boxiang Wang
Abstract:
Multichannel filtered reference least mean square (McFxLMS) algorithms are widely utilized in adaptive multichannel active noise control (MCANC) applications. As a critical and computationally efficient adaptive algorithm, McFxLMS also typically serves as a benchmark for comparative studies of new algorithms proposed by peers and researchers. However, up to now, there are few open-source implementations of the FxLMS algorithm, especially for systems with a large number of channels. Therefore, this work provides MATLAB code for the McFxLMS algorithm, which can be used for systems with an arbitrary number of channels. The code is available on GitHub and MathWorks.
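A minimal Python sketch of the McFxLMS recursion follows. It is not a port of the MATLAB code above, whose interface is not described here. It assumes perfectly known FIR secondary paths and a synthetic disturbance constructed so that an exact FIR controller exists. The names `mcfxlms`, `sec`, and `w_ideal`, the path values, and the step size are hypothetical. The weight update is the usual filtered-reference gradient step on the sum of squared errors.

```python
import numpy as np

def mcfxlms(x, d, sec_paths, n_taps=16, mu=1e-3):
    """Multichannel FxLMS with J references, K loudspeakers and M error microphones.

    x: (J, N) reference signals; d: (M, N) disturbance at the error microphones;
    sec_paths: (M, K, Ls) FIR secondary paths, assumed perfectly known here.
    Returns the (M, N) residual error signals.
    """
    n_ref, n_samples = x.shape
    n_err, n_spk, sec_len = sec_paths.shape
    w = np.zeros((n_spk, n_ref, n_taps))                 # control filters W_kj
    x_buf = np.zeros((n_ref, max(n_taps, sec_len)))      # x_j(n - l), newest first
    y_buf = np.zeros((n_spk, sec_len))                   # y_k(n - l), newest first
    fx_buf = np.zeros((n_err, n_spk, n_ref, n_taps))     # filtered references x'_mkj(n - l)
    e = np.zeros((n_err, n_samples))
    for n in range(n_samples):
        x_buf = np.roll(x_buf, 1, axis=1)
        x_buf[:, 0] = x[:, n]
        y = np.einsum("kjl,jl->k", w, x_buf[:, :n_taps])              # y_k = sum_j w_kj * x_j
        y_buf = np.roll(y_buf, 1, axis=1)
        y_buf[:, 0] = y
        e[:, n] = d[:, n] + np.einsum("mkl,kl->m", sec_paths, y_buf)  # e_m = d_m + sum_k s_mk * y_k
        fx_buf = np.roll(fx_buf, 1, axis=3)
        fx_buf[..., 0] = np.einsum("mkl,jl->mkj", sec_paths, x_buf[:, :sec_len])  # s_mk * x_j
        w -= mu * np.einsum("m,mkjl->kjl", e[:, n], fx_buf)           # gradient step on sum_m e_m^2 / 2
    return e

rng = np.random.default_rng(0)
n_samples, n_taps = 12000, 16
x = rng.standard_normal((1, n_samples))                  # one reference signal (J = 1)
sec = np.array([[[1.0, 0.3], [0.0, 0.2]],
                [[0.0, 0.2], [1.0, 0.3]]])               # hypothetical 2x2 secondary paths
w_ideal = 0.3 * rng.standard_normal((2, 1, n_taps))      # hypothetical controller that cancels d exactly
d = np.zeros((2, n_samples))
for m in range(2):
    for k in range(2):
        d[m] -= np.convolve(sec[m, k], np.convolve(w_ideal[k, 0], x[0]))[:n_samples]
e = mcfxlms(x, d, sec, n_taps=n_taps, mu=0.005)
before, after = np.mean(e[:, :1000] ** 2), np.mean(e[:, -1000:] ** 2)
print(f"noise reduction: {10 * np.log10(before / after):.1f} dB")
```

Array shapes scale with the numbers of references, loudspeakers, and microphones, so the same function handles any channel count. This mirrors the arbitrary-channel goal stated above.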
Submitted 31 January, 2024;
originally announced February 2024.
-
Reconfigurable Intelligent Surface-Aided Dual-Function Radar and Communication Systems With MU-MIMO Communication
Authors:
Yasheng Jin,
Hong Ren,
Cunhua Pan,
Zhiyuan Yu,
Ruisong Weng,
Boshi Wang,
Gui Zhou,
Yongchao He,
Maged Elkashlan
Abstract:
In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through joint active and passive beamforming. Specifically, the weighted minimum mean-square error (WMMSE) method is first used to reformulate the original problem into an equivalent one. Then, we use an alternating optimization (AO) algorithm to decouple the optimization variables and decompose this challenging problem into two subproblems. Given the reflecting coefficients, a penalty-based algorithm is used to handle the non-convex radar signal-to-noise ratio (SNR) constraints. For a given beamforming matrix at the base station (BS), we apply majorization-minimization (MM) to transform the problem into a quadratically constrained quadratic programming (QCQP) problem, which is ultimately solved using a semidefinite relaxation (SDR)-based algorithm. Simulation results illustrate the advantage of deploying an RIS in the considered multi-user MIMO (MU-MIMO) ISAC systems.
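The WMMSE reformulation rests on the identity R_k = -log2 det(E_k), where E_k is user k's mean-square-error matrix under the MMSE receiver. The NumPy sketch below checks that identity numerically for a RIS-aided MU-MIMO downlink with cascaded channel H_k = H_d,k + H_r,k diag(θ) G. The channel model, the unit-modulus θ, and the function names are assumptions for illustration, and the radar SNR constraints are omitted entirely.

```python
import numpy as np

def effective_channels(H_d, H_r, G, theta):
    """Per-user channel H_k = H_d[k] + H_r[k] diag(theta) G (direct + RIS-cascaded)."""
    return [Hd + Hr @ np.diag(theta) @ G for Hd, Hr in zip(H_d, H_r)]

def sum_rate(H, W, sigma2):
    """Sum over k of log2 det(I + H_k W_k W_k^H H_k^H J_k^{-1}), J_k = interference + noise."""
    total = 0.0
    for k, Hk in enumerate(H):
        Nr = Hk.shape[0]
        J = sigma2 * np.eye(Nr, dtype=complex)
        for j, Wj in enumerate(W):
            if j != k:
                J += Hk @ Wj @ Wj.conj().T @ Hk.conj().T
        S = Hk @ W[k] @ W[k].conj().T @ Hk.conj().T
        _, logdet = np.linalg.slogdet(np.eye(Nr) + S @ np.linalg.inv(J))
        total += logdet / np.log(2)
    return total

def sum_rate_via_mmse(H, W, sigma2):
    """Same quantity from MMSE matrices: R_k = -log2 det(E_k), the identity WMMSE uses."""
    total = 0.0
    for k, Hk in enumerate(H):
        Nr = Hk.shape[0]
        C = sigma2 * np.eye(Nr, dtype=complex)   # total received covariance
        for Wj in W:
            C += Hk @ Wj @ Wj.conj().T @ Hk.conj().T
        U = np.linalg.solve(C, Hk @ W[k])        # MMSE receive filter
        E = np.eye(W[k].shape[1]) - U.conj().T @ Hk @ W[k]
        _, logdet = np.linalg.slogdet(E)
        total -= logdet / np.log(2)
    return total
```

Because the two functions agree for any beamformers and RIS phases, minimizing a weighted sum of MSEs over receivers, weights, and beamformers can stand in for the original sum-rate objective, which is what makes the subsequent AO steps tractable.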
Submitted 8 February, 2024;
originally announced February 2024.
-
Joint Beamforming Design for Double Active RIS-assisted Radar-Communication Coexistence Systems
Authors:
Mengyu Liu,
Hong Ren,
Cunhua Pan,
Boshi Wang,
Zhiyuan Yu,
Ruisong Weng,
Kangda Zhi,
Yongchao He
Abstract:
Integrated sensing and communication (ISAC) is considered one of the key candidate technologies for next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e., radar-communication coexistence (RCC), the interference from the communication system to the radar can be significant and cannot be ignored. Recently, reconfigurable intelligent surfaces (RISs) have been introduced into RCC systems to reduce this interference. However, the "multiplicative fading" effect of a passive RIS limits its performance. To tackle this issue, we consider a double active RIS-assisted RCC system and design the radar's beamforming vector and the active RISs' reflecting coefficient matrices to maximize the achievable data rate of the communication system. The design must satisfy the radar detection constraint and the power budgets at the radar and the RISs. Since the resulting problem is non-convex, we propose an algorithm based on the penalty dual decomposition (PDD) framework. Specifically, we first introduce auxiliary variables to reformulate the coupled variables into equality constraints and incorporate these constraints into the objective function through the PDD framework. Then, we decouple the equivalent problem into several subproblems by invoking the block coordinate descent (BCD) method, and we employ the Lagrange dual method to alternately optimize these subproblems. Simulation results verify the effectiveness of the proposed algorithm. They also show that, under the same power budget, deploying double active RISs in RCC systems achieves a higher data rate than deploying a single active RIS or double passive RISs.
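The PDD pattern described above (auxiliary copies tied by equality constraints, an augmented Lagrangian, BCD in the inner loop, and either a multiplier update or a stronger penalty in the outer loop) can be seen on a toy problem. The sketch below is hypothetical and far simpler than the RCC design: it splits one quadratic objective into blocks x and z coupled by x = z, so every block update has a closed form. The 0.9 shrinking-tolerance rule and all parameter names are assumed choices for illustration.

```python
import numpy as np

def pdd_toy(a, b, rho=1.0, c=0.5, eta=10.0, outer=50, inner=50):
    """Toy penalty dual decomposition (PDD) with a BCD inner loop.

    Solves   min_{x,z} ||x - a||^2 + ||z - b||^2   s.t.  x - z = 0,
    where z acts as an auxiliary copy of x. Exact answer: x = z = (a + b) / 2.
    """
    x = np.zeros_like(a, dtype=float)
    z = np.zeros_like(b, dtype=float)
    lam = np.zeros_like(a, dtype=float)
    for _ in range(outer):
        # Inner loop: BCD on the augmented Lagrangian
        #   ||x-a||^2 + ||z-b||^2 + lam^T (x - z) + ||x - z||^2 / (2 rho)
        for _ in range(inner):
            x = (2 * a - lam + z / rho) / (2 + 1 / rho)   # exact minimizer over x
            z = (2 * b + lam + x / rho) / (2 + 1 / rho)   # exact minimizer over z
        h = x - z                      # equality-constraint residual
        viol = np.max(np.abs(h))
        if viol <= eta:
            lam = lam + h / rho        # dual (multiplier) update
        else:
            rho = c * rho              # shrink rho, i.e. strengthen the penalty
        eta = 0.9 * viol               # tighten the tolerance for the next round
    return x, z
```

In the actual RCC problem the blocks are the radar beamformer, the RIS coefficient matrices, and the auxiliary variables, and each block subproblem is solved by the Lagrange dual method rather than in closed form.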
Submitted 6 February, 2024;
originally announced February 2024.
-
Secure Wireless Communication in Active RIS-Assisted DFRC System
Authors:
Yang Zhang,
Hong Ren,
Cunhua Pan,
Boshi Wang,
Zhiyuan Yu,
Ruisong Weng,
Tuo Wu,
Yongchao He
Abstract:
This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our aim is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise ratio (SINR) constraint of the radar echo and the power consumption constraints at the DFRC-BS and the active RIS. An alternating optimization (AO) algorithm based on semidefinite relaxation (SDR) and majorization-minimization (MM) is applied to solve the SR-maximization problem by alternately optimizing the beamforming matrix and the reflecting coefficients. Specifically, we first apply the SDR and successive convex approximation (SCA) methods to transform the two subproblems into more tractable forms; the MM method is then applied to derive a concave surrogate function and iteratively solve the subproblems. Finally, simulation results indicate that the active RIS better counteracts the "multiplicative fading" effect and outperforms a traditional passive RIS in terms of both secure data rate and radar sensing performance.
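The secrecy rate being maximized is the positive part of the user's achievable rate minus the eavesdropper's, and an active RIS injects its own amplified thermal noise into both links. The NumPy sketch below evaluates that quantity for one fixed design. It assumes a single-antenna user and eavesdropper, a single communication beamformer `w` (radar streams and the radar SINR constraint are ignored), and RIS noise power `sigma2_v` per element; the function and argument names are illustrative, not the paper's.

```python
import numpy as np

def secrecy_rate(w, G, h_d_u, h_r_u, h_d_e, h_r_e, phi, sigma2, sigma2_v):
    """Secrecy rate [R_user - R_eve]^+ for an active-RIS-aided link (sketch).

    w        : (Nt,) BS beamformer
    G        : (N, Nt) BS -> RIS channel
    h_d_*    : (Nt,) direct BS -> node channel (u = user, e = eavesdropper)
    h_r_*    : (N,) RIS -> node channel
    phi      : (N,) active RIS reflection coefficients (|phi_n| may exceed 1)
    sigma2   : receiver noise power
    sigma2_v : per-element RIS thermal noise power (0 gives a passive-RIS model)
    """
    def sinr(h_d, h_r):
        # Effective channel h_d^H + h_r^H diag(phi) G (direct + cascaded link)
        h_eff = h_d.conj() + (h_r.conj() * phi) @ G
        signal = np.abs(h_eff @ w) ** 2
        # The active RIS also amplifies its own noise: sigma2_v * ||diag(phi)^H h_r||^2
        ris_noise = sigma2_v * np.sum(np.abs(h_r.conj() * phi) ** 2)
        return signal / (ris_noise + sigma2)

    r_u = np.log2(1 + sinr(h_d_u, h_r_u))
    r_e = np.log2(1 + sinr(h_d_e, h_r_e))
    return float(max(r_u - r_e, 0.0))
```

Setting `sigma2_v = 0` recovers a noiseless-reflection model, which makes the noise penalty of amplification easy to isolate when comparing designs.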
Submitted 3 February, 2024;
originally announced February 2024.