Search | arXiv e-print repository

Addressing the Mutual Interference in Uplink ISAC Receivers: A Projection Method

Authors: Zhiyuan Yu, Hong Ren, Cunhua Pan, Gui Zhou, Ruizhe Wang, Mengyu Liu, Jiangzhou Wang

Abstract: Dual function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC), improving hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) is tasked with both uplink communication signal detection and simultaneously target-related parameter estim… ▽ More Dual function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC), improving hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) is tasked with both uplink communication signal detection and simultaneously target-related parameter estimation from the echoes, leading to issues with mutual interference. In this paper, a projection-based scheme is proposed to equivalently transform the joint signal detection and target estimation problem into a joint signal detection process across multiple snapshots. Compared with conventional successive interference cancellation (SIC) schemes, our proposed approach achieves a higher signal-to-noise ratio (SNR), and a higher ergodic rate when the radar signal is non-negligible. Nonetheless, it introduces an ill-conditioned signal detection problem, which is addressed using a non-linear detector. By jointly processing an increased number of snapshots, the proposed scheme can achieve high S&C performance simultaneously. △ Less

Submitted 29 August, 2024; originally announced August 2024.

Comments: 5 pages, 3 figures, accepted by IEEE WCL

arXiv:2408.16277 [pdf]

Fine-grained Classification of Port Wine Stains Using Optical Coherence Tomography Angiography

Authors: Xiaofeng Deng, Defu Chen, Bowen Liu, Xiwan Zhang, Haixia Qiu, Wu Yuan, Hongliang Ren

Abstract: Accurate classification of port wine stains (PWS, vascular malformations present at birth), is critical for subsequent treatment planning. However, the current method of classifying PWS based on the external skin appearance rarely reflects the underlying angiopathological heterogeneity of PWS lesions, resulting in inconsistent outcomes with the common vascular-targeted photodynamic therapy (V-PDT)… ▽ More Accurate classification of port wine stains (PWS, vascular malformations present at birth), is critical for subsequent treatment planning. However, the current method of classifying PWS based on the external skin appearance rarely reflects the underlying angiopathological heterogeneity of PWS lesions, resulting in inconsistent outcomes with the common vascular-targeted photodynamic therapy (V-PDT) treatments. Conversely, optical coherence tomography angiography (OCTA) is an ideal tool for visualizing the vascular malformations of PWS. Previous studies have shown no significant correlation between OCTA quantitative metrics and the PWS subtypes determined by the current classification approach. This study proposes a new classification approach for PWS using both OCT and OCTA. By examining the hypodermic histopathology and vascular structure of PWS, we have devised a fine-grained classification method that subdivides PWS into five distinct types. To assess the angiopathological differences of various PWS subtypes, we have analyzed six metrics related to vascular morphology and depth information of PWS lesions. The five PWS types present significant differences across all metrics compared to the conventional subtypes. Our findings suggest that an angiopathology-based classification accurately reflects the heterogeneity in PWS lesions. This research marks the first attempt to classify PWS based on angiopathology, potentially guiding more effective subtyping and treatment strategies for PWS. △ Less

Submitted 29 August, 2024; originally announced August 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2408.04593 [pdf, other]

SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation

Authors: Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren

Abstract: The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-sh… ▽ More The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-shot segmentation performance of SAM 2 in robot-assisted surgery based on prompts, alongside its robustness against real-world corruption. For static images, we employ two forms of prompts: 1-point and bounding box, while for video sequences, the 1-point prompt is applied to the initial frame. Through extensive experimentation on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 2, when utilizing bounding box prompts, outperforms state-of-the-art (SOTA) methods in comparative evaluations. The results with point prompts also exhibit a substantial enhancement over SAM's capabilities, nearing or even surpassing existing unprompted SOTA methodologies. Besides, SAM 2 demonstrates improved inference speed and less performance degradation against various image corruption. Although slightly unsatisfactory results remain in specific edges or regions, SAM 2's robust adaptability to 1-point prompts underscores its potential for downstream surgical tasks with limited prompt requirements. △ Less

Submitted 8 August, 2024; originally announced August 2024.

Comments: Empirical study. Previous work "SAM Meets Robotic Surgery" is accessible at: arXiv:2308.07156

arXiv:2407.03228 [pdf, other]

Movable Antenna-enabled RIS-aided Integrated Sensing and Communication

Authors: Haisu Wu, Hong Ren, Cunhua Pan

Abstract: In this paper, we investigate a movable antenna (MA)-aided integrated sensing and communication (ISAC) system, where a reconfigurable intelligent surface (RIS) is employed to enhance wireless communication and sensing performance in dead zones. Specifically, this paper aims to maximize the minimum beampattern gain at the RIS by jointly optimizing beamforming matrix at the base station (BS), the re… ▽ More In this paper, we investigate a movable antenna (MA)-aided integrated sensing and communication (ISAC) system, where a reconfigurable intelligent surface (RIS) is employed to enhance wireless communication and sensing performance in dead zones. Specifically, this paper aims to maximize the minimum beampattern gain at the RIS by jointly optimizing beamforming matrix at the base station (BS), the reflecting coefficients at the RIS and the positions of the MAs, subject to signal-to-interference-plus-noise ratio (SINR) constraint for the users and maximum transmit power at the BS. To tackle this non-convex optimization problem, we propose an alternating optimization (AO) algorithm and employ semidefinite relaxation (SDR), sequential rank-one constraint relaxation (SRCR) and successive convex approximation (SCA) techniques. Numerical results indicate that the MA and RIS-aided ISAC system outperforms conventional fixed position antenna (FPA) and RIS-aided systems. In addition, the application of MAs can reduce the similarity of user channels and enhance channel gain in the ISAC system. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 14 pages, 11 figures

arXiv:2406.16876 [pdf, other]

Near-Field Mobile Tracking: A Framework of Using XL-RIS Information

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Junteng Yao, Hong Ren, Maged Elkashlan, Chau Yuen

Abstract: This paper introduces a novel mobile tracking framework leveraging the high-dimensional signal received from extremely large-scale (XL) reconfigurable intelligent surfaces (RIS). This received signal, named XL-RIS information, has a much larger data dimension and therefore offers a richer feature set compared to the traditional base station (BS) received signal, i.e., BS information, enabling more… ▽ More This paper introduces a novel mobile tracking framework leveraging the high-dimensional signal received from extremely large-scale (XL) reconfigurable intelligent surfaces (RIS). This received signal, named XL-RIS information, has a much larger data dimension and therefore offers a richer feature set compared to the traditional base station (BS) received signal, i.e., BS information, enabling more accurate tracking of mobile users (MUs). As the first step, we present an XL-RIS information reconstruction (XL-RIS-IR) algorithm to reconstruct the high-dimensional XL-RIS information from the low-dimensional BS information. Building on this, this paper proposes a comprehensive framework for mobile tracking, consisting of a Feature Extraction Module and a Mobile Tracking Module. The Feature Extraction Module incorporates a convolutional neural network (CNN) extractor for spatial features, a time and frequency (T$\&$F) extractor for domain features, and a near-field angles of arrival (AoAs) extractor for capturing AoA features within the XL-RIS. These features are combined into a comprehensive feature vector, forming a time-varying sequence fed into the Mobile Tracking Module, which employs an Auto-encoder (AE) with a stacked bidirectional long short-term memory (Bi-LSTM) encoder and a standard LSTM decoder to predict MUs' positions in the upcoming time slot. Simulation results confirm that the tracking accuracy of our proposed framework is significantly enhanced by using reconstructed XL-RIS information and exhibits substantial robustness to signal-to-noise ratio (SNR) variations. △ Less

Submitted 5 August, 2024; v1 submitted 3 April, 2024; originally announced June 2024.

arXiv:2406.13705 [pdf, other]

EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DiT) model. In our work, the illumination prompt module shall navigate the model to adapt to different exposure levels and perform targeted image enhancement, in which the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules shall further boost the concurrent representation learning between the prompt parameters and features. Besides, the U-shaped restoration DiT model shall capture the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance. △ Less

Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

arXiv:2405.18775 [pdf, other]

Synchronization Scheme based on Pilot Sharing in Cell-Free Massive MIMO Systems

Authors: Qihao Peng, Hong Ren, Zhendong Peng, Cunhua Pan, Maged Elkashlan, Dongming Wang, Jiangzhou Wang, Xiaohu You

Abstract: This paper analyzes the impact of pilot-sharing scheme on synchronization performance in a scenario where several slave access points (APs) with uncertain carrier frequency offsets (CFOs) and timing offsets (TOs) share a common pilot sequence. First, the Cramer-Rao bound (CRB) with pilot contamination is derived for pilot-pairing estimation. Furthermore, a maximum likelihood algorithm is presented… ▽ More This paper analyzes the impact of pilot-sharing scheme on synchronization performance in a scenario where several slave access points (APs) with uncertain carrier frequency offsets (CFOs) and timing offsets (TOs) share a common pilot sequence. First, the Cramer-Rao bound (CRB) with pilot contamination is derived for pilot-pairing estimation. Furthermore, a maximum likelihood algorithm is presented to estimate the CFO and TO among the pairing APs. Then, to minimize the sum of CRBs, we devise a synchronization strategy based on a pilot-sharing scheme by jointly optimizing the cluster classification, synchronization overhead, and pilot-sharing scheme, while simultaneously considering the overhead and each AP's synchronization requirements. To solve this NP-hard problem, we simplify it into two sub-problems, namely cluster classification problem and the pilot sharing problem. To strike a balance between synchronization performance and overhead, we first classify the clusters by using the K-means algorithm, and propose a criteria to find a good set of master APs. Then, the pilot-sharing scheme is obtained by using the swap-matching operations. Simulation results validate the accuracy of our derivations and demonstrate the effectiveness of the proposed scheme over the benchmark schemes. △ Less

Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE Journal for pos

arXiv:2405.10948 [pdf, other]

Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni… ▽ More Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recognizing long-range dependencies and aligning multimodal information. In this paper, we introduce Surgical-LVLM, a novel personalized large vision-language model tailored for complex surgical scenarios. Leveraging the pre-trained large vision-language model and specialized Visual Perception LoRA (VP-LoRA) blocks, our model excels in understanding complex visual-language tasks within surgical contexts. In addressing the visual grounding task, we propose the Token-Interaction (TIT) module, which strengthens the interaction between the grounding module and the language responses of the Large Visual Language Model (LVLM) after projecting them into the latent space. We demonstrate the effectiveness of Surgical-LVLM on several benchmarks, including EndoVis-17-VQLA, EndoVis-18-VQLA, and a newly introduced EndoVis Conversations dataset, which sets new performance standards. Our work contributes to advancing the field of automated surgical mentorship by providing a context-aware solution. △ Less

Submitted 22 March, 2024; originally announced May 2024.

arXiv:2405.10550 [pdf, other]

LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad… ▽ More Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shape model architecture to capture global structural information using low-resolution images and gradually recover the details in subsequent denoising steps. We further prone the model to significantly reduce the model size while retaining performance. While discarding certain downsampling operations to save parameters leads to instability and low efficiency in convergence during the training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies of the denoising steps and improving denoising outcomes. Moreover, while recovering images using the diffusion model, potential spectral shifts were noted. We further introduce a Chroma Balancer (CB) to mitigate this issue. Our LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.08672 [pdf, other]

EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adapt… ▽ More Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adaptation methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC) which is an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundational model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and unaware of the ground truth camera intrinsics. Code is available at https://github.com/BeileiCui/EndoDAC. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: early accepted by MICCAI 2024

arXiv:2405.07218 [pdf, other]

Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal Transitivity

Authors: Sishen Yuan, Guang Li, Baijia Liang, Lailu Li, Qingzhuo Zheng, Shuang Song, Zhen Li, Hongliang Ren

Abstract: Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduce… ▽ More Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduces a chained flexible capsule endoscope (FCE) design concept, specifically conceived to navigate the inherent volume constraints of capsule endoscopes whilst augmenting their therapeutic functionalities. The FCE's distinctive flexibility originates from a conventional rotating joint design and the incision pattern in the flexible material. In vitro experiments validated the passive navigation ability of the FCE in rugged intestinal tracts. Further, the FCE demonstrates consistent reptile-like peristalsis under the influence of an external magnetic field, and possesses the capability for film expansion and disintegration under high-frequency electromagnetic stimulation. These findings illuminate a promising path toward amplifying the therapeutic capacities of capsule endoscopes without necessitating a size compromise. △ Less

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.07216 [pdf, other]

Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

Authors: Sishen Yuan, Baijia Liang, Po Wa Wong, Mingjing Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

Abstract: Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (P… ▽ More Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (PDT) emerges as a promising prospect in this context. This study presents the development and implementation of a magnetically-guided origami robot, incorporating flexible printed circuit units for sustained and stable phototherapy of Helicobacter pylori. Each integrated unit is equipped with wireless charging capabilities, producing an optimal power output that can concurrently illuminate up to 15 LEDs at their maximum intensity. Crucially, these units can be remotely manipulated via a magnetic field, facilitating both translational and rotational movements. We propose an open-loop manual control sequence that allows the formation of a stable, compliant triangular structure through the interaction of internal magnets. This adaptable configuration is uniquely designed to withstand the dynamic squeezing environment prevalent in real-world gastric applications. The research herein represents a significant stride in leveraging technology for innovative medical solutions, particularly in the management of antibiotic-resistant Helicobacter pylori infections. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: IEEE ICRA 2024

arXiv:2405.06946 [pdf, other]

Two-Timescale Design for Reconfigurable Intelligent Surface-Aided URLLC

Authors: Qihao Peng, Hong Ren, Cunhua Pan, Maged Elkashlan, Ana Garcia Armada, Petar Popovski

Abstract: In this paper, to tackle the blockage issue in massive multiple-input-multiple-output (mMIMO) systems, a reconfigurable intelligent surface (RIS) is seamlessly deployed to support devices with ultra-reliable and low-latency communications (URLLC). The transmission power of the base station and the phase shifts of the RIS are jointly devised to maximize the weighted sum rate while considering the s… ▽ More In this paper, to tackle the blockage issue in massive multiple-input-multiple-output (mMIMO) systems, a reconfigurable intelligent surface (RIS) is seamlessly deployed to support devices with ultra-reliable and low-latency communications (URLLC). The transmission power of the base station and the phase shifts of the RIS are jointly devised to maximize the weighted sum rate while considering the spatially correlation and channel estimation errors. Firstly, \textcolor{black}{the relationship between the channel estimation error and spatially correlated RIS's elements is revealed by using the linear minimum mean square error}. Secondly, based on the maximum-ratio transmission precoding, a tight lower bound of the rate under short packet transmission is derived. Finally, the NP-hard problem is decomposed into two optimization problems, where the transmission power is obtained by geometric programming and phase shifts are designed by using gradient ascent method. Besides, we have rigorously proved that the proposed algorithm can rapidly converge to a sub-optimal solution with low complexity. Simulation results confirm the tightness between the analytic results and Monte Carlo simulations. Furthermore, the two-timescale scheme provides a practical solution for the short packet transmission. △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: This paper has already been accepted by IEEE Transactions on Wireless Communications

arXiv:2405.00734 [pdf, other]

EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Mani… ▽ More Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Manifold Attention and Confidence Stratification (MACS) to diagnose neurodegenerative disorders based on EEG signals sourced from four centers with unreliable annotations. The MACS framework's effectiveness stems from these features: 1) The Augmentor generates various EEG-represented brain variants to enrich the data space; 2) The Switcher enhances the feature space for trusted samples and reduces overfitting on incorrectly labeled samples; 3) The Encoder uses the Riemannian manifold and Euclidean metrics to capture spatiotemporal variations and dynamic synchronization in EEG; 4) The Projector, equipped with dual heads, monitors consistency across multiple brain variants and ensures diagnostic accuracy; 5) The Stratifier adaptively stratifies learned samples by confidence levels throughout the training process; 6) Forward and backpropagation in MACS are constrained by confidence stratification to stabilize the learning system amid unreliable annotations. Our subject-independent experiments, conducted on both neurocognitive and movement disorders using cross-center corpora, have demonstrated superior performance compared to existing related algorithms. This work not only improves EEG-based diagnostics for cross-center and small-setting brain diseases but also offers insights into extending MACS techniques to other data analyses, tackling data heterogeneity and annotation unreliability in multimedia and multimodal content understanding. △ Less

Submitted 13 August, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

arXiv:2404.15854 [pdf, other]

CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation… ▽ More The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD). △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE TDSC

arXiv:2404.15469 [pdf, other]

NMBEnet: Efficient Near-field mmWave Beam Training for Multiuser OFDM Systems Using Sub-6 GHz Pilots

Authors: Wang Liu, Cunhua Pan, Hong Ren, Cheng-Xiang Wang, Jiangzhou Wang, Xiaohu You

Abstract: Combining millimetre-wave (mmWave) communications with an extremely large-scale antenna array (ELAA) presents a promising avenue for meeting the spectral efficiency demands of the future sixth generation (6G) mobile communications. However, beam training for mmWave ELAA systems is challenged by excessive pilot overheads as well as insufficient accuracy, as the huge near-field codebook has to be ac… ▽ More Combining millimetre-wave (mmWave) communications with an extremely large-scale antenna array (ELAA) presents a promising avenue for meeting the spectral efficiency demands of the future sixth generation (6G) mobile communications. However, beam training for mmWave ELAA systems is challenged by excessive pilot overheads as well as insufficient accuracy, as the huge near-field codebook has to be accounted for. In this paper, inspired by the similarity between far-field sub-6 GHz channels and near-field mmWave channels, we propose to leverage sub-6 GHz uplink pilot signals to directly estimate the optimal near-field mmWave codeword, which aims to reduce pilot overhead and bypass the channel estimation. Moreover, we adopt deep learning to perform this dual mapping function, i.e., sub-6 GHz to mmWave, far-field to near-field, and a novel neural network structure called NMBEnet is designed to enhance the precision of beam training. Specifically, when considering the orthogonal frequency division multiplexing (OFDM) communication scenarios with high user density, correlations arise both between signals from different users and between signals from different subcarriers. Accordingly, the convolutional neural network (CNN) module and graph neural network (GNN) module included in the proposed NMBEnet can leverage these two correlations to further enhance the precision of beam training. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.11889 [pdf, other]

doi 10.1145/3664647.3681154

Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

Authors: Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

Abstract: X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d… ▽ More X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize the X-ray images in an end-to-end manner using the content and style disentanglement from three different image domains. Our method decouples the anatomical structure information from CT scans and style information from unpaired real X-ray images/ digital reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. Meanwhile, we also impose a supervised process by computing the similarity of computed real DRR and synthesized DRR images. We further develop a pose attention module to fully strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset and achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods ($π$-GAN, EG3D), CT2X-GAN is superior in improving the synthesis quality and realistic to the real X-ray images. △ Less

Submitted 30 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, 10 figures, ACM MM2024

arXiv:2404.10640 [pdf, other]

Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos

Authors: Jieming Yu, Long Bai, Guankun Wang, An Wang, Xiaoxiao Yang, Huxin Gao, Hongliang Ren

Abstract: The precise tracking and segmentation of surgical instruments have led to a remarkable enhancement in the efficiency of surgical procedures. However, the challenge lies in achieving accurate segmentation of surgical instruments while minimizing the need for manual annotation and reducing the time required for the segmentation process. To tackle this, we propose a novel framework for surgical instr… ▽ More The precise tracking and segmentation of surgical instruments have led to a remarkable enhancement in the efficiency of surgical procedures. However, the challenge lies in achieving accurate segmentation of surgical instruments while minimizing the need for manual annotation and reducing the time required for the segmentation process. To tackle this, we propose a novel framework for surgical instrument segmentation and tracking. Specifically, with a tiny subset of frames for segmentation, we ensure accurate segmentation across the entire surgical video. Our method adopts a two-stage approach to efficiently segment videos. Initially, we utilize the Segment-Anything (SAM) model, which has been fine-tuned using the Low-Rank Adaptation (LoRA) on the EndoVis17 Dataset. The fine-tuned SAM model is applied to segment the initial frames of the video accurately. Subsequently, we deploy the XMem++ tracking algorithm to follow the annotated frames, thereby facilitating the segmentation of the entire video sequence. This workflow enables us to precisely segment and track objects within the video. Through extensive evaluation of the in-distribution dataset (EndoVis17) and the out-of-distribution datasets (EndoVis18 & the endoscopic submucosal dissection surgery (ESD) dataset), our framework demonstrates exceptional accuracy and robustness, thus showcasing its potential to advance the automated robotic-assisted surgery. △ Less

Submitted 8 August, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: IEEE ICRA 2024 C4SR+ Workshop

arXiv:2403.16529 [pdf, other]

Exploit High-Dimensional RIS Information to Localization: What Is the Impact of Faulty Element?

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Cheng-Xiang Wang, Robert Schober, Xiaohu You

Abstract: This paper proposes a novel localization algorithm using the reconfigurable intelligent surface (RIS) received signal, i.e., RIS information. Compared with BS received signal, i.e., BS information, RIS information offers higher dimension and richer feature set, thereby providing an enhanced capacity to distinguish positions of the mobile users (MUs). Additionally, we address a practical scenario w… ▽ More This paper proposes a novel localization algorithm using the reconfigurable intelligent surface (RIS) received signal, i.e., RIS information. Compared with BS received signal, i.e., BS information, RIS information offers higher dimension and richer feature set, thereby providing an enhanced capacity to distinguish positions of the mobile users (MUs). Additionally, we address a practical scenario where RIS contains some unknown (number and places) faulty elements that cannot receive signals. Initially, we employ transfer learning to design a two-phase transfer learning (TPTL) algorithm, designed for accurate detection of faulty elements. Then our objective is to regain the information lost from the faulty elements and reconstruct the complete high-dimensional RIS information for localization. To this end, we propose a transfer-enhanced dual-stage (TEDS) algorithm. In \emph{Stage I}, we integrate the CNN and variational autoencoder (VAE) to obtain the RIS information, which in \emph{Stage II}, is input to the transferred DenseNet 121 to estimate the location of the MU. To gain more insight, we propose an alternative algorithm named transfer-enhanced direct fingerprint (TEDF) algorithm which only requires the BS information. The comparison between TEDS and TEDF reveals the effectiveness of faulty element detection and the benefits of utilizing the high-dimensional RIS information for localization. Besides, our empirical results demonstrate that the performance of the localization algorithm is dominated by the high-dimensional RIS information and is robust to unoptimized phase shifts and signal-to-noise ratio (SNR). △ Less

Submitted 28 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 17 pages, Accepted by IEEE JSAC

arXiv:2403.16521 [pdf, other]

Employing High-Dimensional RIS Information for RIS-aided Localization Systems

Authors: Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Jiangzhou Wang, Chau Yuen

Abstract: Reconfigurable intelligent surface (RIS)-aided localization systems have attracted extensive research attention due to their accuracy enhancement capabilities. However, most studies primarily utilized the base stations (BS) received signal, i.e., BS information, for localization algorithm design, neglecting the potential of RIS received signal, i.e., RIS information. Compared with BS information,… ▽ More Reconfigurable intelligent surface (RIS)-aided localization systems have attracted extensive research attention due to their accuracy enhancement capabilities. However, most studies primarily utilized the base stations (BS) received signal, i.e., BS information, for localization algorithm design, neglecting the potential of RIS received signal, i.e., RIS information. Compared with BS information, RIS information offers higher dimension and richer feature set, thereby significantly improving the ability to extract positions of the mobile users (MUs). Addressing this oversight, this paper explores the algorithm design based on the high-dimensional RIS information. Specifically, we first propose a RIS information reconstruction (RIS-IR) algorithm to reconstruct the high-dimensional RIS information from the low-dimensional BS information. The proposed RIS-IR algorithm comprises a data processing module for preprocessing BS information, a convolution neural network (CNN) module for feature extraction, and an output module for outputting the reconstructed RIS information. Then, we propose a transfer learning based fingerprint (TFBF) algorithm that employs the reconstructed high-dimensional RIS information for MU localization. This involves adapting a pre-trained DenseNet-121 model to map the reconstructed RIS signal to the MU's three-dimensional (3D) position. Empirical results affirm that the localization performance is significantly influenced by the high-dimensional RIS information and maintains robustness against unoptimized phase shifts. △ Less

Submitted 16 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.11061 [pdf, other]

Beamforming Design for Double-Active-RIS-aided Communication Systems with Inter-Excitation

Authors: Boshi Wang, Cunhua Pan, Hong Ren, Zhiyuan Yu, Yang Zhang, Mengyu Liu, Gui Zhou

Abstract: In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, they can effectively mitigate the multiplicative fading effect. However, t… ▽ More In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, they can effectively mitigate the multiplicative fading effect. However, this also induces signal bouncing between the two active RISs that cannot be ignored. This phenomenon is termed as the "inter-excitation" effect and is characterized in the received signal by proposing a feedback-type model. Based on the signal model, we formulate a weighted sum rate (WSR) maximization problem by jointly optimizing the beamforming matrix at the BS and the reflecting coefficient matrices at the two active RISs, subject to power constraints at the BS and active RISs, as well as the maximum amplification gain constraints of the active RISs. To solve this non-convex problem, we first transform the problem into a more tractable form using the fractional programming (FP) method. Then, by introducing auxiliary variables, the problem can be converted into an equivalent form that can be solved by using a penalty dual decomposition (PDD) algorithm. Finally, simulation results indicate that it proposed scheme outperforms benchmark schemes with single active RIS and double passive RISs in terms of achievable rate. Furthermore, the results demonstrate that the proposed scheme can enhance the WSR by 30\% compared to scenarios that do not take this effect into account when the maximum amplification gain is 40 dB. Additionally, the proposed scheme is capable of achieving high WSR performance at most locations where double active RISs are deployed between the BS and the users, thereby providing greater flexibility in their positioning. △ Less

Submitted 23 August, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10009 [pdf]

Temporal-spatial Adaptation of Promptable SAM Enhance Accuracy and Generalizability of cine CMR Segmentation

Authors: Zhennong Chen, Sekeun Kim, Hui Ren, Quanzheng Li, Xiang Li

Abstract: Accurate myocardium segmentation across all phases in one cardiac cycle in cine cardiac magnetic resonance (CMR) scans is crucial for comprehensively cardiac function analysis. Despite advancements in deep learning (DL) for automatic cine CMR segmentation, generalizability on unseen data remains a significant challenge. Recently, the segment-anything-model (SAM) has been invented as a segmentation… ▽ More Accurate myocardium segmentation across all phases in one cardiac cycle in cine cardiac magnetic resonance (CMR) scans is crucial for comprehensively cardiac function analysis. Despite advancements in deep learning (DL) for automatic cine CMR segmentation, generalizability on unseen data remains a significant challenge. Recently, the segment-anything-model (SAM) has been invented as a segmentation foundation model, known for its accurate segmentation and more importantly, zero-shot generalization. SAM was trained on two-dimensional (2D) natural images; to adapt it for comprehensive cine CMR segmentation, we propose cineCMR-SAM which incorporates both temporal and spatial information through a modified model architecture. Compared to other state-of-the-art (SOTA) methods, our model achieved superior data-specific model segmentation accuracy on the STACOM2011 when fine-tuned on this dataset and demonstrated superior zero-shot generalization on two other large public datasets (ACDC and M&Ms) unseen during fine-tuning. Additionally, we introduced a text prompt feature in cineCMR-SAM to specify the view type of input slices (short-axis or long-axis), enhancing performance across all view types. △ Less

Submitted 15 July, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 10 pages, 3 figures

arXiv:2403.09058 [pdf, ps, other]

Performance Analysis on RIS-Aided Wideband Massive MIMO OFDM Systems with Low-Resolution ADCs

Authors: Xianzhe Chen, Hong Ren, Cunhua Pan, Zhangjie Peng, Kangda Zhi, Yong Liu, Xiaojun Xi, Ana Garcia Armada, Cheng-Xiang Wang

Abstract: This paper investigates a reconfigurable intelligent surface (RIS)-aided wideband massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) system with low-resolution analog-to-digital converters (ADCs). Frequency-selective Rician fading channels are considered, and the OFDM data transmission process is presented in time domain. This paper derives the closed-f… ▽ More This paper investigates a reconfigurable intelligent surface (RIS)-aided wideband massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) system with low-resolution analog-to-digital converters (ADCs). Frequency-selective Rician fading channels are considered, and the OFDM data transmission process is presented in time domain. This paper derives the closed-form approximate expression of the uplink achievable rate, based on which the asymptotic system performance is analyzed when the number of the antennas at the base station and the number of reflecting elements at the RIS grow to infinity. Besides, the power scaling laws of the considered system are revealed to provide energy-saving insights. Furthermore, this paper proposes a gradient ascent-based algorithm to design the phase shifts of the RIS for maximizing the minimum user rate. Finally, numerical results are presented to verify the correctness of analytical conclusions and draw insights. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.06940 [pdf, other]

Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction

Authors: Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li

Abstract: Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffe… ▽ More Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffer from temporal sparsity and incompleteness, presenting substantial challenges in modeling the disease's progression accurately. Existing methods are limited, focusing primarily on datasets without missing entries or requiring predefined assumptions about CTh progression. To overcome these obstacles, we propose a conditional score-based diffusion model specifically designed to generate CTh trajectories with the given baseline information, such as age, sex, and initial diagnosis. Our conditional diffusion model utilizes all available data during the training phase to make predictions based solely on baseline information during inference without needing prior history about CTh progression. The prediction accuracy of the proposed CTh prediction pipeline using a conditional score-based model was compared for sub-groups consisting of cognitively normal, mild cognitive impairment, and AD subjects. The Bland-Altman analysis shows our diffusion-based prediction model has a near-zero bias with narrow 95% confidential interval compared to the ground-truth CTh in 6-36 months. In addition, our conditional diffusion model has a stochastic generative nature, therefore, we demonstrated an uncertainty analysis of patient-specific CTh prediction through multiple realizations. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.04269 [pdf, other]

Secure MIMO Communication Relying on Movable Antennas

Authors: Jun Tang, Cunhua Pan, Yang Zhang, Hong Ren, Kezhi Wang

Abstract: This paper considers a movable antenna (MA)-aided secure multiple-input multiple-output (MIMO) communication system consisting of a base station (BS), a legitimate information receiver (IR) and an eavesdropper (Eve), where the BS is equipped with MAs to enhance the system's physical layer security (PLS). Specifically, we aim to maximize the secrecy rate (SR) by jointly optimizing the transmit prec… ▽ More This paper considers a movable antenna (MA)-aided secure multiple-input multiple-output (MIMO) communication system consisting of a base station (BS), a legitimate information receiver (IR) and an eavesdropper (Eve), where the BS is equipped with MAs to enhance the system's physical layer security (PLS). Specifically, we aim to maximize the secrecy rate (SR) by jointly optimizing the transmit precoding (TPC) matrix, the artificial noise (AN) covariance matrix and the MAs' positions under the constraints of the maximum transmit power and the minimum distance between MAs. To solve this non-convex problem with highly coupled optimization variables, the block coordinate descent (BCD) method is applied to alternately update the variables. Specifically, we first reformulate the SR into a tractable form by utilizing the minimum mean square error (MMSE) method, and derive the optimal TPC matrix and the AN covariance matrix with fixed MAs' positions by applying the Lagrangian multiplier method in semi-closed forms. Then, the majorization-minimization (MM) algorithm is employed to iteratively optimize each MA's position while keeping others fixed. Finally, simulation results are provided to demonstrate the effectiveness of the proposed algorithms and the significant advantages of the MA-aided system over conventional fixed position antenna (FPA)-based system in enhancing system's security. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02942 [pdf, other]

Channel Estimation for mmWave MIMO-OFDM Systems in High-Mobility Scenarios: Instantaneous Model or Statistical Model?

Authors: Ruizhe Wang, Hong Ren, Cunhua Pan, Gui Zhou, Ruisong Weng, Jiangzhou Wang

Abstract: Classical linear statistical models, like the first-order auto-regressive (AR) model, are commonly used as channel model in high-mobility scenarios. However, compared to sub-6G, the effect of Doppler frequency shifts is more significant at millimeter wave (mmWave) frequencies, and the effectiveness of the statistical channel model in high-mobility mmWave scenarios should be reconsidered. In this p… ▽ More Classical linear statistical models, like the first-order auto-regressive (AR) model, are commonly used as channel model in high-mobility scenarios. However, compared to sub-6G, the effect of Doppler frequency shifts is more significant at millimeter wave (mmWave) frequencies, and the effectiveness of the statistical channel model in high-mobility mmWave scenarios should be reconsidered. In this paper, we investigate the channel estimation for mmWave multiple-input multiple-output-(MIMO) orthogonal frequency division multiplexing (OFDM) systems in high-mobility scenarios, with the focus on the comparison between the instantaneous channel model and the statistical channel model. For the instantaneous model, by leveraging the low-rank nature of mmWave channels and the multidimensional characteristics of MIMO-OFDM signals across space, time, and frequency, the received signals are structured as a fourth-order tensor fitting a low-rank CANDECOMP/PARAFAC (CP) model. Then, to solve the CP decomposition problem, an estimation of signal parameters via rotational invariance techniques (ESPRIT)-type decomposition based channel estimation method is proposed by exploring the Vandermonde structure of factor matrix, and the channel parameters are then estimated from the factor matrices. We analyze the uniqueness condition of the CP decomposition and develop a concise derivation of the Cramer-Rao bound (CRB) for channel parameters. Simulations show that our method outperforms the existing benchmarks. Furthermore, the results based on the wireless environment generated by Wireless InSite verify that the channel estimation based on the instantaneous channel model performs better than that based on the statistical channel model. Therefore, the instantaneous channel model is recommended for designing channel estimation algorithm for mmWave systems in high-mobility scenarios. △ Less

Submitted 27 August, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02028 [pdf, other]

Target Localization in Cooperative ISAC Systems: A Scheme Based on 5G NR OFDM Signals

Authors: Zhenkun Zhang, Hong Ren, Cunhua Pan, Sheng Hong, Dongming Wang, Jiangzhou Wang, Xiaohu You

Abstract: The integration of sensing capabilities into communication systems, by sharing physical resources, has a significant potential for reducing spectrum, hardware, and energy costs while inspiring innovative applications. Cooperative networks, in particular, are expected to enhance sensing services by enlarging the coverage area and enriching sensing measurements, thus improving the service availabili… ▽ More The integration of sensing capabilities into communication systems, by sharing physical resources, has a significant potential for reducing spectrum, hardware, and energy costs while inspiring innovative applications. Cooperative networks, in particular, are expected to enhance sensing services by enlarging the coverage area and enriching sensing measurements, thus improving the service availability and accuracy. This paper proposes a cooperative integrated sensing and communication (ISAC) framework by leveraging information-bearing orthogonal frequency division multiplexing (OFDM) signals transmitted by access points (APs). Specifically, we propose a two-stage scheme for target localization, where communication signals are reused as sensing reference signals based on the system information shared at the central processing unit (CPU). In Stage I, we propose a twodimensional fast Fourier transform (2D-FFT)-based algorithm to measure the ranges of scattered paths induced by targets, through the extraction of delay and Doppler information from the sensing channels between APs. Then, the target locations are estimated in Stage II based on these range measurements. Considering the potential occurrence of ill-conditioned measurements with large error during the extraction of time-frequency information, we propose an efficient algorithm to match the range measurements with the targets while eliminating ill-onditioned measurements, achieving high-accuracy target localization. In addition, based on the transmission configurations defined in the fifth generation (5G) standards, we elucidate the performance trade-offs in both communication and sensing, and extend the proposed sensing scheme for general scenarios. Finally, numerical results confirm the effectiveness of our sensing scheme and the cooperative gain of the ISAC framework. △ Less

Submitted 22 August, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.13798 [pdf, other]

AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC

Authors: Haobo Liu, Zhengyang Qian, Wei Wu, Hongwei Ren, Zhiwei Liu, Leibin Ni

Abstract: Power consumption has become the major concern in neural network accelerators for edge devices. The novel non-volatile-memory (NVM) based computing-in-memory (CIM) architecture has shown great potential for better energy efficiency. However, most of the recent NVM-CIM solutions mainly focus on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propo… ▽ More Power consumption has become the major concern in neural network accelerators for edge devices. The novel non-volatile-memory (NVM) based computing-in-memory (CIM) architecture has shown great potential for better energy efficiency. However, most of the recent NVM-CIM solutions mainly focus on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propose an analog-domain floating-point CIM architecture (AFPR-CIM) based on resistive random-access memory (RRAM). A novel adaptive dynamic-range FP-ADC is designed to convert the analog computation results into FP codes. Output current with high dynamic range is converted to a normalized voltage range for readout, to prevent precision loss at low power consumption. Moreover, a novel FP-DAC is also implemented which reconstructs FP digital codes into analog values to perform analog computation. The proposed AFPR-CIM architecture enables neural network acceleration with FP8 (E2M5) activation for better accuracy and energy efficiency. Evaluation results show that AFPR-CIM can achieve 19.89 TFLOPS/W energy efficiency and 1474.56 GOPS throughput. Compared to traditional FP8 accelerator, digital FP-CIM, and analog INT8-CIM, this work achieves 4.135x, 5.376x, and 2.841x energy efficiency enhancement, respectively. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted by DATE 2024

arXiv:2402.13692 [pdf, other]

Reconfigurable Intelligent Surface assisted Integrated Communication, Sensing, and Computation Systems

Authors: Jiahua Wan, Hong Ren, Zhiyuan Yu, Zhenkun Zhang, Yang Zhang, Cunhua Pan, Jiangzhou Wang

Abstract: This paper studies a mobile edge computing (MEC) assisted integrated sensing and communication (ISAC), where reconfigurable intelligent surface (RIS) is used to alleviate the attenuation of communication links during computational offloading. In this paradigm, the dual function radar and communication (DFRC)-enabled user equipments (UEs) simultaneously perform radar sensing and communication tasks… ▽ More This paper studies a mobile edge computing (MEC) assisted integrated sensing and communication (ISAC), where reconfigurable intelligent surface (RIS) is used to alleviate the attenuation of communication links during computational offloading. In this paradigm, the dual function radar and communication (DFRC)-enabled user equipments (UEs) simultaneously perform radar sensing and communication tasks, facilitating the offloading of partial computational tasks to an edge computing server (ECS) through an RIS. To satisfy the critical need for time-efficient sensing, we investigate the latency minimization problem for both single-UE and multi-UE scenarios, subject to constraints on UEs' transmit power, radar signal-to-interference-plus-noise-ratio (SINR), RIS phase shift, and computation capability. To address the formulated non-convex problem, we propose an algorithm based on the block coordinate descent (BCD) method to decouple the original problem into two subproblems, where the computational and beamforming settings are optimized alternately. Specifically, we employ the Lagrangian method for the optimization of computational settings. In addition, two equivalent transformations are applied to transform the objective function (OF) of the subproblem for active and passive beamforming into a form of weighted sum-rate. Then, a combination of minimum variance distortionless response (MVDR), successive convex approximation (SCA), majorization-minimization (MM), and weighted minimum mean-square error (WMMSE) techniques are applied to tackle this subproblem. Moreover, a closed-form solution is obtained in the simplified single UE scenario. Finally, simulation results substantiate the effectiveness of the proposed system. △ Less

Submitted 14 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13597 [pdf, other]

Near-Field Multiuser Beam-Training for Extremely Large-Scale MIMO Systems

Authors: Wang Liu, Cunhua Pan, Hong Ren, Jiangzhou Wang, Robert Schober, Lajos Hanzo

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) systems are capable of improving spectral efficiency by employing far more antennas than conventional massive MIMO at the base station (BS). However, beam training in multiuser XL-MIMO systems is challenging. To tackle these issues, we conceive a three-phase graph neural network (GNN)-based beam training scheme for multiuser XL-MIMO sy… ▽ More Extremely large-scale multiple-input multiple-output (XL-MIMO) systems are capable of improving spectral efficiency by employing far more antennas than conventional massive MIMO at the base station (BS). However, beam training in multiuser XL-MIMO systems is challenging. To tackle these issues, we conceive a three-phase graph neural network (GNN)-based beam training scheme for multiuser XL-MIMO systems. In the first phase, only far-field wide beams have to be tested for each user and the GNN is utilized to map the beamforming gain information of the far-field wide beams to the optimal near-field beam for each user. In addition, the proposed GNN-based scheme can exploit the position-correlation between adjacent users for further improvement of the accuracy of beam training. In the second phase, a beam allocation scheme based on the probability vectors produced at the outputs of GNNs is proposed to address the above beam-direction conflicts between users. In the third phase, the hybrid TBF is designed for further reducing the inter-user interference. Our simulation results show that the proposed scheme improves the beam training performance of the benchmarks. Moreover, the performance of the proposed beam training scheme approaches that of an exhaustive search, despite requiring only about 7% of the pilot overhead. △ Less

Submitted 25 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: submitted to IEEE

arXiv:2402.05847 [pdf, other]

Reconfigurable Intelligent Surface-Aided Dual-Function Radar and Communication Systems With MU-MIMO Communication

Authors: Yasheng Jin, Hong Ren, Cunhua Pan, Zhiyuan Yu, Ruisong Weng, Boshi Wang, Gui Zhou, Yongchao He, Maged Elkashlan

Abstract: In this paper, we investigate an reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through the joint active and passive beamforming. {Specifically}, the weighted minimum mean-square error (WMMSE) method is { first} used to reformulate the original problem i… ▽ More In this paper, we investigate an reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through the joint active and passive beamforming. {Specifically}, the weighted minimum mean-square error (WMMSE) method is { first} used to reformulate the original problem into an equivalent one. Then, we utilize an alternating optimization (AO) { algorithm} to decouple the optimization variables and decompose this challenging problem into two subproblems. Given reflecting coefficients, a penalty-based algorithm is utilized to deal with the non-convex radar signal-to-noise ratio (SNR) constraints. For the given beamforming matrix of the BS, we apply majorization-minimization (MM) to transform the problem into a quadratic constraint quadratic programming (QCQP) problem, which is ultimately solved using a semidefinite relaxation (SDR)-based algorithm. Simulation results illustrate the advantage of deploying RIS in the considered multi-user MIMO (MU-MIMO) ISAC systems. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.04532 [pdf, other]

Joint Beamforming Design for Double Active RIS-assisted Radar-Communication Coexistence Systems

Authors: Mengyu Liu, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Kangda Zhi, Yongchao He

Abstract: Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigur… ▽ More Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigurable intelligent surface (RIS) has been introduced into RCC systems to reduce the interference. However, the "multiplicative fading" effect introduced by passive RIS limits its performance. To tackle this issue, we consider a double active RIS-assisted RCC system, which focuses on the design of the radar's beamforming vector and the active RISs' reflecting coefficient matrices, to maximize the achievable data rate of the communication system. The considered system needs to meet the radar detection constraint and the power budgets at the radar and the RISs. Since the problem is non-convex, we propose an algorithm based on the penalty dual decomposition (PDD) framework. Specifically, we initially introduce auxiliary variables to reformulate the coupled variables into equation constraints and incorporate these constraints into the objective function through the PDD framework. Then, we decouple the equivalent problem into several subproblems by invoking the block coordinate descent (BCD) method. Furthermore, we employ the Lagrange dual method to alternately optimize these subproblems. Simulation results verify the effectiveness of the proposed algorithm. Furthermore, the results also show that under the same power budget, deploying double active RISs in RCC systems can achieve higher data rate than those with single active RIS and double passive RISs. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02122 [pdf, other]

Secure Wireless Communication in Active RIS-Assisted DFRC System

Authors: Yang Zhang, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Tuo Wu, Yongchao He

Abstract: This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our purpose is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise-r… ▽ More This work considers a dual-functional radar and communication (DFRC) system with an active reconfigurable intelligent surface (RIS) and a potential eavesdropper. Our purpose is to maximize the secrecy rate (SR) of the system by jointly designing the beamforming matrix at the DFRC base station (BS) and the reflecting coefficients at the active RIS, subject to the signal-to-interference-plus-noise-ratio (SINR) constraint of the radar echo and the power consumption constraints at the DFRC-BS and active RIS. An alternating optimization (AO) algorithm based on semi-definite relaxation (SDR) and majorizationminimization (MM) is applied to solve the SR-maximization problem by alternately optimizing the beamforming matrix and the reflecting coefficients. Specifically, we first apply the SDR and successive convex approximation (SCA) methods to transform the two subproblems into more tractable forms, then the MM method is applied to derive a concave surrogate function and iteratively solve the subproblems. Finally, simulation results indicate that the active RIS can better confront the impact of "multiplicative fading" and outperforms traditional passive RIS in terms of both secure data rate and radar sensing performance. △ Less

Submitted 3 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures

arXiv:2401.07446 [pdf, other]

Quantized RIS-aided mmWave Massive MIMO Channel Estimation with Uniform Planar Arrays

Authors: Ruizhe Wang, Hong Ren, Cunhua Pan, Shi Jin, Petar Popovski, Jiangzhou Wang

Abstract: In this paper, we investigate a cascaded channel estimation method for a millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) system aided by a reconfigurable intelligent surface (RIS) with the BS equipped with low-resolution analog-to-digital converters (ADCs), where the BS and the RIS are both equipped with a uniform planar array (UPA). Due to the sparse property of mmWave chan… ▽ More In this paper, we investigate a cascaded channel estimation method for a millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) system aided by a reconfigurable intelligent surface (RIS) with the BS equipped with low-resolution analog-to-digital converters (ADCs), where the BS and the RIS are both equipped with a uniform planar array (UPA). Due to the sparse property of mmWave channel, the channel estimation can be solved as a compressed sensing (CS) problem. However, the low-resolution quantization cause severe information loss of signals, and traditional CS algorithms are unable to work well. To recovery the signal and the sparse angular domain channel from quantization, we introduce Bayesian inference and efficient vector approximate message passing (VAMP) algorithm to solve the quantize output CS problem. To further improve the efficiency of the VAMP algorithm, a Fast Fourier Transform (FFT) based fast computation method is derived. Simulation results demonstrate the effectiveness and the accuracy of the proposed cascaded channel estimation method for the RIS-aided mmWave massive MIMO system with few-bit ADCs. Furthermore, the proposed channel estimation method can reach an acceptable performance gap between the low-resolution ADCs and the infinite ADCs for the low signal-to-noise ratio (SNR), which implies the applicability of few-bit ADCs in practice. △ Less

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.05137 [pdf, other]

doi 10.1016/j.artmed.2024.102803

DISCOVER: 2-D Multiview Summarization of Optical Coherence Tomography Angiography for Automatic Diabetic Retinopathy Diagnosis

Authors: Mostafa El Habib Daho, Yihao Li, Rachid Zeghlache, Hugo Le Boité, Pierre Deman, Laurent Borderie, Hugang Ren, Niranchana Mannivanan, Capucine Lepicard, Béatrice Cochener, Aude Couturier, Ramin Tadayoni, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec

Abstract: Diabetic Retinopathy (DR), an ocular complication of diabetes, is a leading cause of blindness worldwide. Traditionally, DR is monitored using Color Fundus Photography (CFP), a widespread 2-D imaging modality. However, DR classifications based on CFP have poor predictive power, resulting in suboptimal DR management. Optical Coherence Tomography Angiography (OCTA) is a recent 3-D imaging modality o… ▽ More Diabetic Retinopathy (DR), an ocular complication of diabetes, is a leading cause of blindness worldwide. Traditionally, DR is monitored using Color Fundus Photography (CFP), a widespread 2-D imaging modality. However, DR classifications based on CFP have poor predictive power, resulting in suboptimal DR management. Optical Coherence Tomography Angiography (OCTA) is a recent 3-D imaging modality offering enhanced structural and functional information (blood flow) with a wider field of view. This paper investigates automatic DR severity assessment using 3-D OCTA. A straightforward solution to this task is a 3-D neural network classifier. However, 3-D architectures have numerous parameters and typically require many training samples. A lighter solution consists in using 2-D neural network classifiers processing 2-D en-face (or frontal) projections and/or 2-D cross-sectional slices. Such an approach mimics the way ophthalmologists analyze OCTA acquisitions: 1) en-face flow maps are often used to detect avascular zones and neovascularization, and 2) cross-sectional slices are commonly analyzed to detect macular edemas, for instance. However, arbitrary data reduction or selection might result in information loss. Two complementary strategies are thus proposed to optimally summarize OCTA volumes with 2-D images: 1) a parametric en-face projection optimized through deep learning and 2) a cross-sectional slice selection process controlled through gradient-based attribution. The full summarization and DR classification pipeline is trained from end to end. The automatic 2-D summary can be displayed in a viewer or printed in a report to support the decision. We show that the proposed 2-D summarization and classification pipeline outperforms direct 3-D classification with the advantage of improved interpretability. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Journal ref: Artificial Intelligence in Medicine 2024, 102803

arXiv:2312.05832 [pdf, other]

Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains

Authors: Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren

Abstract: Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. Existing modeling shortcomings of spatial invariance and pooling layers in conventional CNNs often ignore the neglect of crucial global information, resulting i… ▽ More Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. Existing modeling shortcomings of spatial invariance and pooling layers in conventional CNNs often ignore the neglect of crucial global information, resulting in error localization for fault objection tasks of freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptron (MLP) for visual fault detection of freight trains. We initially present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We propose a dynamic distillation method without a pre-training teacher, including a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. Such an approach mines more abundant details from lower-level feature appearances and higher-level label semantics as the extra supervision signal, which utilizes efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can jointly train with students to further enhance the distillation efficiency. Extensive experiments executed on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at \url{https://github.com/MVME-HBUT/SDD-FTI-FDet}. △ Less

Submitted 10 December, 2023; originally announced December 2023.

Comments: 10 pages, 6 figures

arXiv:2310.09937 [pdf, other]

Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

Authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q. -H. Meng, Hongliang Ren

Abstract: Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture… ▽ More Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the pre-processed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask via calculating the reconstruction errors, and therefore generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects, and excellent quantitative performance in terms of spectral distortion, correlation coefficient, MSE, NIQE, BRISQUE, and PIQE. △ Less

Submitted 15 October, 2023; originally announced October 2023.

Comments: To appear in IEEE Sensors Journal

arXiv:2308.07156 [pdf, other]

SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation

Authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation and demonstrates remarkable generalization capabilities across a wide range of downstream scenarios. In this empirical study, we examine SAM's robustness and zero-shot generalizability in the field of robotic surgery. We comprehensively explore different scenarios, including prompted and unprompted situations,… ▽ More The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation and demonstrates remarkable generalization capabilities across a wide range of downstream scenarios. In this empirical study, we examine SAM's robustness and zero-shot generalizability in the field of robotic surgery. We comprehensively explore different scenarios, including prompted and unprompted situations, bounding box and points-based prompt approaches, as well as the ability to generalize under corruptions and perturbations at five severity levels. Additionally, we compare the performance of SAM with state-of-the-art supervised models. We conduct all the experiments with two well-known robotic instrument segmentation datasets from MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict certain parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as wrong classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, SAM struggles to identify instruments in complex surgical scenarios characterized by the presence of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. We also attempt to fine-tune SAM using Low-rank Adaptation (LoRA) and propose SurgicalSAM, which shows the capability in class-wise mask prediction without prompt. Therefore, we can argue that, without further domain-specific fine-tuning, SAM is not ready for downstream surgical tasks. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted as Oral Presentation at MedAGI Workshop - MICCAI 2023 1st International Workshop on Foundation Models for General Medical AI. arXiv admin note: substantial text overlap with arXiv:2304.14674

arXiv:2308.02845 [pdf, other]

Landmark Detection using Transformer Toward Robot-assisted Nasal Airway Intubation

Authors: Tianhang Liu, Hechen Li, Long Bai, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Robot-assisted airway intubation application needs high accuracy in locating targets and organs. Two vital landmarks, nostrils and glottis, can be detected during the intubation to accommodate the stages of nasal intubation. Automated landmark detection can provide accurate localization and quantitative evaluation. The Detection Transformer (DeTR) leads object detectors to a new paradigm with long… ▽ More Robot-assisted airway intubation application needs high accuracy in locating targets and organs. Two vital landmarks, nostrils and glottis, can be detected during the intubation to accommodate the stages of nasal intubation. Automated landmark detection can provide accurate localization and quantitative evaluation. The Detection Transformer (DeTR) leads object detectors to a new paradigm with long-range dependence. However, current DeTR requires long iterations to converge, and does not perform well in detecting small objects. This paper proposes a transformer-based landmark detection solution with deformable DeTR and the semantic-aligned-matching module for detecting landmarks in robot-assisted intubation. The semantics aligner can effectively align the semantics of object queries and image features in the same embedding space using the most discriminative features. To evaluate the performance of our solution, we utilize a publicly accessible glottis dataset and automatically annotate a nostril detection dataset. The experimental results demonstrate our competitive performance in detection accuracy. Our code is publicly accessible. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Comments: ICBIR 2023 (Best Student Paper Award). Code availability: https://github.com/ConorLTH/airway_intubation_landmarks_detection

arXiv:2307.02514 [pdf, other]

Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data

Authors: Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai, Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, Xiang Li

Abstract: Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database. The proposed approach invo… ▽ More Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and Graph Neural Network (GNN) that constructs a graph from the speech transcript, and extracts features using GNN for AD detection. Data augmentation techniques, including synonym replacement, GPT-based augmenter, and so on, were used to address the small dataset size. Audio data was also introduced, and WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using it for contrastive learning with the original audio. We conducted intensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2307.02452 [pdf, other]

LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion

Authors: Long Bai, Tong Chen, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

Abstract: Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gr… ▽ More Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gradually attracts researchers. Given the exuberant development of the denoising diffusion probabilistic model (DDPM) in computer vision, we introduce a WCE LLIE framework based on the multi-scale convolutional neural network (CNN) and reverse diffusion process. The multi-scale design allows models to preserve high-resolution representation and context information from low-resolution, while the curved wavelet attention (CWA) block is proposed for high-frequency and local feature learning. Furthermore, we combine the reverse diffusion procedure to further optimize the shallow output and generate the most realistic image. The proposed method is compared with ten state-of-the-art (SOTA) LLIE methods and significantly outperforms quantitatively and qualitatively. The superior performance on GI disease segmentation further demonstrates the clinical potential of our proposed model. Our code is publicly accessible. △ Less

Submitted 22 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/LLCaps

arXiv:2306.16285 [pdf, other]

Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis

Authors: An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract: Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging… ▽ More Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and employed to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to diversify the training data further. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instruments datasets generation and segmentation framework can achieve encouraging performance compared with training with real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows its superiority of generalization by a considerable margin. We expect that our inspiring results will attract research attention to improving model generalization with data synthesizing. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: First two authors contributed equally. Accepted by IROS2023

arXiv:2306.03581 [pdf]

Optimal sizing of solar photovoltaic and lithium battery storage to reduce grid electricity reliance in buildings

Authors: Han Kun Ren, Malcolm McCulloch, David Wallom

Abstract: In alignment with the Paris Agreement, the city of Oxford in the UK aims to become carbon neutral by 2040. Renewable energy help achieve this target by reducing the reliance on carbon-intensive grid electricity. This research seeks to optimally size solar photovoltaic and lithium battery storage systems, reducing Oxford's grid electricity reliance in buildings. The analysis starts with modeling th… ▽ More In alignment with the Paris Agreement, the city of Oxford in the UK aims to become carbon neutral by 2040. Renewable energy help achieve this target by reducing the reliance on carbon-intensive grid electricity. This research seeks to optimally size solar photovoltaic and lithium battery storage systems, reducing Oxford's grid electricity reliance in buildings. The analysis starts with modeling the electricity demand. The model uses Elexon electricity settlement profiles, and assembles them into the demand profile according to the quantity and types of buildings in Oxford. Then, solar generation is modeled using Pfenninger and Staffell's method. Solar photovoltaic and lithium storage systems are sized using a hybridized analytical and iterative method. First, the method calculates the solar system size search range, then iterates through the range. At each solar size, the method calculates and iterates through the storage system size search range. Within each iteration, the renewable system is simulated using demand and generation data with a simplified system set-up and the conventional operation strategy. The method outputs combinations of solar system capacity, storage system capacity, and grid electricity import. Each combination's levelized cost of electricity is calculated, and the lowest cost combination is the optimal sizing. Solar and storage system costs are projected from 2019 to 2100, and the optimal sizing is calculated for each year. The result shows that solar photovoltaic is economically competitive, but lithium storage cost is still too high. As solar and storage prices continue to drop, they will take up greater portions of the energy system. However, there will always be a need for the grid, as it provides flexibility and can meet demands that are too costly for solar and storage △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: 10 pages, 8 figures, published in the conference of ECEEE 2022 Summer Study on energy efficiency: agents of change

Report number: 8-096-22

Journal ref: ECEEE 2022 Summer Study on energy efficiency: agents of change, (2022), 1199-1208, ECEEE

arXiv:2306.03511 [pdf, other]

Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical Image Segmentation

Authors: An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract: Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field to alleviate model degradation in the deployment site. To preserve the model performance ac… ▽ More Accurate and robust medical image segmentation is fundamental and crucial for enhancing the autonomy of computer-aided diagnosis and intervention systems. Medical data collection normally involves different scanners, protocols, and populations, making domain adaptation (DA) a highly demanding research field to alleviate model degradation in the deployment site. To preserve the model performance across multiple testing domains, this work proposes the Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation. In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift in the deployment phase, where the higher the shift is, the harder to recognize the variance. Considering this, we progressively introduce more amplitude information from the target domain to the source domain in the frequency space during the curriculum-style training to smoothly schedule the semantic knowledge transfer in an easier-to-harder manner. Besides, we incorporate the training-time chained augmentation mixing to help expand the data distributions while preserving the domain-invariant semantics, which is beneficial for the acquired model to be more robust and generalize better to unseen domains. Extensive experiments on two segmentation tasks of Retina and Nuclei collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance. Meanwhile, our approach proves to be more robust under various corruption types and increasing severity levels. In addition, we show our method is also beneficial in the domain-adaptive classification task with skin lesion datasets. The code is available at https://github.com/lofrienger/Curri-AFDA. △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: Work under review. First three authors contributed equally

arXiv:2306.00451 [pdf, other]

S$^2$ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation

Authors: An Wang, Mengya Xu, Yang Zhang, Mobarakol Islam, Hongliang Ren

Abstract: Fully-supervised polyp segmentation has accomplished significant triumphs over the years in advancing the early diagnosis of colorectal cancer. However, label-efficient solutions from weak supervision like scribbles are rarely explored yet primarily meaningful and demanding in medical practice due to the expensiveness and scarcity of densely-annotated polyp data. Besides, various deployment issues… ▽ More Fully-supervised polyp segmentation has accomplished significant triumphs over the years in advancing the early diagnosis of colorectal cancer. However, label-efficient solutions from weak supervision like scribbles are rarely explored yet primarily meaningful and demanding in medical practice due to the expensiveness and scarcity of densely-annotated polyp data. Besides, various deployment issues, including data shifts and corruption, put forward further requests for model generalization and robustness. To address these concerns, we design a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning (S$^2$ME). Concretely, for the first time in weakly-supervised medical image segmentation, we promote the dual-branch co-teaching framework by leveraging the intrinsic complementarity of features extracted from the spatial and spectral domains and encouraging cross-space consistency through collaborative optimization. Furthermore, to produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning, we introduce a novel adaptive pixel-wise fusion technique based on the entropy guidance from the spatial and spectral branches. Our strategy efficiently mitigates the deleterious effects of uncertainty and noise present in pseudo labels and surpasses previous alternatives in terms of efficacy. Ultimately, we formulate a holistic optimization objective to learn from the hybrid supervision of scribbles and pseudo labels. Extensive experiments and evaluation on four public datasets demonstrate the superiority of our method regarding in-distribution accuracy, out-of-distribution generalization, and robustness, highlighting its promising clinical significance. Our code is available at https://github.com/lofrienger/S2ME. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: MICCAI 2023 Early Acceptance

arXiv:2305.11686 [pdf, other]

Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual envi… ▽ More Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual environment to assist the training process. Specifically, this work introduces a virtual dataset generated by the Simulation Open Framework Architecture (SOFA) framework to overcome the limited availability of actual endoscopic images. We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer techniques to address discrepancies between datasets. Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models, improving segmentation accuracy and training stability. In the practical application, the trained segmentation model holds great promise for robot-assisted intubation surgery and intelligent surgical navigation. △ Less

Submitted 27 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms). arXiv admin note: text overlap with arXiv:2305.10883

arXiv:2305.10883 [pdf, other]

Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real d… ▽ More Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real datasets of oropharyngeal organs are often inaccessible due to limited open-source data and patient privacy. In this work, we propose a domain adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework includes an image blending strategy called IoU-Ranking Blend (IRB) and style-transfer method ArtFlow. Here, IRB alleviates the problem of poor segmentation performance caused by significant datasets domain differences; while ArtFlow is introduced to reduce the discrepancies between datasets further. A virtual oropharynx image dataset generated by the SOFA framework is used as the learning subject for semantic segmentation to deal with the limited availability of actual endoscopic images. We adapted IRB-AF with the state-of-the-art domain adaptive segmentation models. The results demonstrate the superior performance of our approach in further improving the segmentation accuracy and training stability. △ Less

Submitted 27 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release

arXiv:2304.14674 [pdf, other]

SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective

Authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren

Abstract: Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with the prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery in various settings of (i) prompted vs. unprompted; (ii) bounding box vs. points-based prompt; (iii) generalization under corrupt… ▽ More Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with the prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery in various settings of (i) prompted vs. unprompted; (ii) bounding box vs. points-based prompt; (iii) generalization under corruptions and perturbations with five severity levels; and (iv) state-of-the-art supervised model vs. SAM. We conduct all the observations with two well-known robotic instrument segmentation datasets of MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict the parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as different classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, it is unable to identify instruments in some complex surgical scenarios of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. Therefore, we can argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning. △ Less

Submitted 28 April, 2023; originally announced April 2023.

Comments: Work under active progress

arXiv:2304.09974 [pdf, other]

SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery

Authors: Lalithkumar Seenivasan, Mobarakol Islam, Gokul Kannan, Hongliang Ren

Abstract: Advances in GPT-based large language models (LLMs) are revolutionizing natural language processing, exponentially increasing its use across various domains. Incorporating uni-directional attention, these autoregressive LLMs can generate long and coherent paragraphs. However, for visual question answering (VQA) tasks that require both vision and language processing, models with bi-directional atten… ▽ More Advances in GPT-based large language models (LLMs) are revolutionizing natural language processing, exponentially increasing its use across various domains. Incorporating uni-directional attention, these autoregressive LLMs can generate long and coherent paragraphs. However, for visual question answering (VQA) tasks that require both vision and language processing, models with bi-directional attention or models employing fusion techniques are often employed to capture the context of multiple modalities all at once. As GPT does not natively process vision tokens, to exploit the advancements in GPT models for VQA in robotic surgery, we design an end-to-end trainable Language-Vision GPT (LV-GPT) model that expands the GPT2 model to include vision input (image). The proposed LV-GPT incorporates a feature extractor (vision tokenizer) and vision token embedding (token type and pose). Given the limitations of unidirectional attention in GPT models and their ability to generate coherent long paragraphs, we carefully sequence the word tokens before vision tokens, mimicking the human thought process of understanding the question to infer an answer from an image. Quantitatively, we prove that the LV-GPT model outperforms other state-of-the-art VQA models on two publically available surgical-VQA datasets (based on endoscopic vision challenge robotic scene segmentation 2018 and CholecTriplet2021) and on our newly annotated dataset (based on the holistic surgical scene dataset). We further annotate all three datasets to include question-type annotations to allow sub-type analysis. Furthermore, we extensively study and present the effects of token sequencing, token type and pose embedding for vision tokens in the LV-GPT model. △ Less

Submitted 22 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: The manuscript is accepted in MICCAI 2023. Code are available at: https://github.com/lalithjets/SurgicalGPT

arXiv:2304.00172 [pdf, other]

Performance Analysis and Low-Complexity Design for XL-MIMO with Near-Field Spatial Non-Stationarities

Authors: Kangda Zhi, Cunhua Pan, Hong Ren, Kok Keong Chai, Cheng-Xiang Wang, Robert Schober, Xiaohu You

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is capable of supporting extremely high system capacities with large numbers of users. In this work, we build a framework for the analysis and low-complexity design of XL-MIMO in the near field with spatial non-stationarities. Specifically, we first analyze the theoretical performance of discrete-aperture XL-MIMO using an electromagnet… ▽ More Extremely large-scale multiple-input multiple-output (XL-MIMO) is capable of supporting extremely high system capacities with large numbers of users. In this work, we build a framework for the analysis and low-complexity design of XL-MIMO in the near field with spatial non-stationarities. Specifically, we first analyze the theoretical performance of discrete-aperture XL-MIMO using an electromagnetic (EM) channel model based on the near-field spherical wavefront. We analytically reveal the impact of the discrete aperture and polarization mismatch on the received power. We also complement the classical Fraunhofer distance based on the considered EM channel model. Our analytical results indicate that a limited part of the XL-array receives the majority of the signal power in the near field, which leads to a notion of visibility region (VR) of a user. Thus, we propose a VR detection algorithm and leverage the acquired VR information to devise a low-complexity symbol detection scheme. Furthermore, we propose a graph theory-based user partition algorithm, relying on the VR overlap ratio between different users. Partial zero-forcing (PZF) is utilized to eliminate only the interference from users allocated to the same group, which further reduces computational complexity in matrix inversion. Numerical results confirm the correctness of the analytical results and the effectiveness of the proposed algorithms. It reveals that our algorithms approach the performance of conventional whole array (WA)-based designs but with much lower complexity. △ Less

Submitted 12 October, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

Comments: 18 pages. Accepted by IEEE JSAC

Showing 1–50 of 139 results for author: Ren, H