
Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates innovative knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced patchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.

Submitted 27 August, 2024; originally announced August 2024.

Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024
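
The abstract above reports a PSNR figure. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE). The function name `psnr`, the flat-list pixel representation, and the default `max_value` of 255 are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized flat pixel lists.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the generated frame is closer to the reference; two identical frames give an infinite PSNR.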

arXiv:2408.12354 [pdf, other]

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

Authors: Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Lirong Dai

Abstract: Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However, many diffusion model based any-to-any SVC methods, which have achieved impressive results, usually suffer from low efficiency caused by a mass of inference steps. In this paper, we propose LCM-SVC, a latent consistency distillation (LCD) based latent diffusion model (LDM) to accelerate inference speed. We achieved one-step or few-step inference while maintaining the high performance by distilling a pre-trained LDM based SVC model, which had the advantages of timbre decoupling and sound quality. Experimental results show that our proposed method can significantly reduce the inference time and largely preserve the sound quality and timbre similarity compared with other state-of-the-art SVC models. Audio samples are available at https://sounddemos.github.io/lcm-svc.

Submitted 22 August, 2024; originally announced August 2024.

Comments: Accepted to ISCSLP 2024. arXiv admin note: text overlap with arXiv:2406.05325

arXiv:2408.10636 [pdf]

UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection with potential risks hampers its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its effectiveness in DR screening. A total of 18,321 UWF-FA images of different phases were registered with corresponding UWF-RI images and fed into a generative adversarial network (GAN)-based model for training. The quality of generated UWF-FA images was evaluated through quantitative metrics and human evaluation. The DeepDRiD dataset was used to externally assess the contribution of generated UWF-FA images to DR classification, using area under the receiver operating characteristic curve (AUROC) as outcome metrics. The generated early, mid, and late phase UWF-FA images achieved high authenticity, with multi-scale similarity scores ranging from 0.70 to 0.91 and qualitative visual scores ranging from 1.64 to 1.98 (1=real UWF-FA quality). In fifty randomly selected images, 56% to 76% of the generated images were difficult to distinguish from real images in the Turing test. Moreover, adding these generated UWF-FA images for DR classification significantly increased the AUROC from 0.869 to 0.904 compared to the baseline model using UWF-RI images (P < .001). The model successfully generates realistic multi-frame UWF-FA images for enhancing DR stratification without intravenous dye injection.

Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: 22 pages, 2 figures

arXiv:2408.06254 [pdf, other]

Data-Efficient Prediction of Minimum Operating Voltage via Inter- and Intra-Wafer Variation Alignment

Authors: Yuxuan Yin, Rebecca Chen, Chen He, Peng Li

Abstract: Predicting the minimum operating voltage ($V_{min}$) of chips stands as a crucial technique in enhancing the speed and reliability of manufacturing testing flow. However, existing $V_{min}$ prediction methods often overlook various sources of variations in both training and deployment phases. Notably, the neglect of wafer zone-to-zone (intra-wafer) variations and wafer-to-wafer (inter-wafer) variations, compounded by process variations, diminishes the accuracy, data efficiency, and reliability of $V_{min}$ predictors. To address this gap, we introduce a novel data-efficient $V_{min}$ prediction flow, termed restricted bias alignment (RBA), which incorporates a novel variation alignment technique. Our approach concurrently estimates inter- and intra-wafer variations. Furthermore, we propose utilizing class probe data to model inter-wafer variations for the first time. We empirically demonstrate RBA's effectiveness and data efficiency on an industrial 16nm automotive chip dataset.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.02859 [pdf, other]

Multistain Pretraining for Slide Representation Learning in Pathology

Authors: Guillaume Jaume, Anurag Vaidya, Andrew Zhang, Andrew H. Song, Richard J. Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long Phi Le, Faisal Mahmood

Abstract: Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learning extend the principles of SSL from small images (e.g., 224 x 224 patches) to entire slides, usually by aligning two different augmentations (or views) of the slide. Yet the resulting representation remains constrained by the limited clinical and biological diversity of the views. Instead, we postulate that slides stained with multiple markers, such as immunohistochemistry, can be used as different views to form a rich task-agnostic training signal. To this end, we introduce Madeleine, a multimodal pretraining strategy for slide representation learning. Madeleine is trained with a dual global-local cross-stain alignment objective on large cohorts of breast cancer samples (N=4,211 WSIs across five stains) and kidney transplant samples (N=12,070 WSIs across four stains). We demonstrate the quality of slide representations learned by Madeleine on various downstream evaluations, ranging from morphological and molecular classification to prognostic prediction, comprising 21 tasks using 7,299 WSIs from multiple medical centers. Code is available at https://github.com/mahmoodlab/MADELEINE.

Submitted 5 August, 2024; originally announced August 2024.

Comments: ECCV'24

arXiv:2407.21490 [pdf, other]

Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

Authors: Junxuan Yu, Rusi Chen, Yongsong Zhou, Yanlin Chen, Yaofei Duan, Yuhao Huang, Han Zhou, Tan Tao, Xin Yang, Dong Ni

Abstract: Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering the flexible movement control over specific cardiac structures. In this context, we propose an explainable and controllable method for echocardiography video generation, taking an initial frame and a motion curve as guidance. Our contributions are three-fold. First, we extract motion information from each heart substructure to construct motion curves, enabling the diffusion model to synthesize customized echocardiography videos by modifying these curves. Second, we propose the structure-to-motion alignment module, which can map semantic features onto motion curves across cardiac structures. Third, the position-aware attention mechanism is designed to enhance video consistency utilizing Gaussian masks with structural position information. Extensive experiments on three echocardiography datasets show that our method outperforms others regarding fidelity and consistency. The full code will be released at https://github.com/mlmi-2024-72/ECM.

Submitted 31 July, 2024; originally announced July 2024.

Comments: Accepted by MICCAI MLMI 2024

arXiv:2407.21479 [pdf, ps, other]

doi 10.1109/LWC.2024.3360053

Air-to-Ground Cooperative OAM Communications

Authors: Ruirui Chen, Yu Ding, Beibei Zhang, Song Li, Liping Liang

Abstract: For users in hotspot region, orbital angular momentum (OAM) can realize multifold increase of spectrum efficiency (SE), and the flying base station (FBS) can rapidly support the real-time communication demand. However, the hollow divergence and alignment requirement impose crucial challenges for users to achieve air-to-ground OAM communications, where there exists the line-of-sight path. Therefore, we propose the air-to-ground cooperative OAM communication (ACOC) scheme, which can realize OAM communications for users with size-limited devices. The waist radius is adjusted to guarantee the maximum intensity at the cooperative users (CUs). We derive the closed-form expression of the optimal FBS position, which satisfies the antenna alignment for two cooperative user groups (CUGs). Furthermore, the selection constraint is given to choose two CUGs composed of four CUs. Simulation results are provided to validate the optimal FBS position and the SE superiority of the proposed ACOC scheme.

Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Journal ref: IEEE WIRELESS COMMUNICATIONS LETTERS, VOL. 13, NO. 4, APRIL 2024

arXiv:2407.21478 [pdf, ps, other]

doi 10.1109/TBC.2023.3275363

Precoding Based Downlink OAM-MIMO Communications with Rate Splitting

Authors: Ruirui Chen, Jinyang Lin, Beibei Zhang, Yu Ding, Keyue Xu

Abstract: Orbital angular momentum (OAM) and rate splitting (RS) are the potential key techniques for the future wireless communications. As a new orthogonal resource, OAM can achieve the multifold increase of spectrum efficiency to relieve the scarcity of the spectrum resource, but how to enhance the privacy performance imposes crucial challenge for OAM communications. RS technique divides the information into private and common parts, which can guarantee the privacies for all users. In this paper, we integrate the RS technique into downlink OAM-MIMO communications, and study the precoding optimization to maximize the sum capacity. First, the concentric uniform circular arrays (UCAs) are utilized to construct the downlink transmission framework of OAM-MIMO communications with RS. Particularly, users in the same user pair utilize RS technique to obtain the information and different user pairs use different OAM modes. Then, we derive the OAM-MIMO channel model, and formulate the sum capacity maximization problem. Finally, based on the fractional programming, the optimal precoding matrix is obtained to maximize the sum capacity by using quadratic transformation. Extensive simulation results show that by using the proposed precoding optimization algorithm, OAM-MIMO communications with RS can achieve higher sum capacity than the traditional communication schemes.

Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Journal ref: IEEE TRANSACTIONS ON BROADCASTING, VOL. 69, NO. 4, DECEMBER 2023

arXiv:2407.21444 [pdf, ps, other]

doi 10.1109/TVT.2023.3309034

Cooperative Orbital Angular Momentum Wireless Communications

Authors: Ruirui Chen, Wenchi Cheng, Jinyang Lin, Liping Liang

Abstract: Orbital angular momentum (OAM) mode multiplexing has the potential to achieve high spectrum-efficiency communications at the same time and frequency by using orthogonal mode resource. However, the vortex wave hollow divergence characteristic results in the requirement of the large-scale receive antenna, which makes users hardly receive the OAM signal by size-limited equipment. To promote the OAM application in the next 6G communications, this paper proposes the cooperative OAM wireless (COW) communication scheme, which can select the cooperative users (CUs) to form the aligned antennas by size-limited user equipment. First, we derive the feasible radial radius and selective waist radius to choose the CUs in the same circle with the origin at the base station. Then, based on the locations of CUs, the waist radius is adjusted to form the receive antennas and ensure the maximum intensity for the CUs. Finally, the cooperative formation probability is derived in the closed-form solution, which can depict the feasibility of the proposed COW communication scheme. Furthermore, OAM beam steering is used to expand the feasible CU region, thus achieving higher cooperative formation probability. Simulation results demonstrate that the derived cooperative formation probability in mathematical analysis is very close to the statistical probability of cooperative formation, and the proposed COW communication scheme can obtain higher spectrum efficiency than the traditional scheme due to the effective reception of the OAM signal.

Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 73, NO. 1, JANUARY 2024

arXiv:2407.14121 [pdf, other]

Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection

Authors: Ran Chen, Zeren Zhang, Jinwen Ma

Abstract: Seismic fault detection holds significant geographical and practical application value, aiding experts in subsurface structure interpretation and resource exploration. Despite some progress made by automated methods based on deep learning, research in the seismic domain faces significant challenges, particularly because it is difficult to obtain high-quality, large-scale, open-source, and diverse datasets, which hinders the development of general foundation models. Therefore, this paper proposes Seismic Fault SAM, which, for the first time, applies a general pre-trained foundation model, the Segment Anything Model (SAM), to seismic fault interpretation. This method aligns the universal knowledge learned from a vast amount of images with the seismic domain tasks through an Adapter design. Specifically, our innovative points include designing lightweight Adapter modules, freezing most of the pre-training weights, and only updating a small number of parameters to allow the model to converge quickly and effectively learn fault features; combining 2.5D input strategy to capture 3D spatial patterns with 2D models; integrating geological constraints into the model through prior-based data augmentation techniques to enhance the model's generalization capability. Experimental results on the largest publicly available seismic dataset, Thebe, show that our method surpasses existing 3D models on both OIS and ODS metrics, achieving state-of-the-art performance and providing an effective extension scheme for other seismic domain downstream tasks that lack labeled data.

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.11322 [pdf, ps, other]

Reconfigurable-Intelligent-Surface Assisted Orbital-Angular-Momentum Secure Communications

Authors: Minmin Wang, Liping Liang, Wenchi Cheng, Wei Zhang, Ruirui Chen, Hailin Zhang

Abstract: As a kind of wavefront with helical phase, orbital angular momentum (OAM) shows the great potential to enhance the security results of wireless communications due to its unique orthogonality and central hollow electromagnetic wave structure. Therefore, in this paper we propose the reconfigurable-intelligent-surface (RIS) assisted OAM scheme, where RIS is deployed to weaken the information acquisition at eavesdroppers by adjusting the OAM beams pointed to the eavesdropper and artificial noise (AN) is applied to interfere with the eavesdropper, thus significantly increasing the secrecy rates of short-range secure communications. Aiming at obtaining the maximum secrecy rate, we develop the Riemannian manifold conjugate gradient (RMCG) based alternative optimization (AO) algorithm to assign much power to low-order OAM-modes and optimize the OAM beams direction with the programmable RIS, thus respectively enhancing and weakening the received signal strength at the legitimate receiver and the eavesdropper. Numerical results show that our proposed scheme outperforms the existing works in terms of the secrecy rate and the eavesdropper's bit error rate.

Submitted 15 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2406.05799

arXiv:2407.07464 [pdf, other]

Video-to-Audio Generation with Hidden Alignment

Authors: Manjie Xu, Chenxing Li, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu

Abstract: Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techniques. Beginning with a foundational model VTA-LDM built on a simple yet surprisingly effective intuition, we explore various vision encoders and auxiliary embeddings through ablation studies. Employing a comprehensive evaluation pipeline that emphasizes generation quality and video-audio synchronization alignment, we demonstrate that our model exhibits state-of-the-art video-to-audio generation capabilities. Furthermore, we provide critical insights into the impact of different data augmentation methods on enhancing the generation framework's overall capacity. We showcase possibilities to advance the challenge of generating synchronized audio from semantic and temporal perspectives. We hope these insights will serve as a stepping stone toward developing more realistic and accurate audio-visual generation models.

Submitted 10 July, 2024; originally announced July 2024.

Comments: https://sites.google.com/view/vta-ldm

arXiv:2406.18536 [pdf, other]

Reliable Interval Prediction of Minimum Operating Voltage Based on On-chip Monitors via Conformalized Quantile Regression

Authors: Yuxuan Yin, Xiaoxiao Wang, Rebecca Chen, Chen He, Peng Li

Abstract: Predicting the minimum operating voltage ($V_{min}$) of chips is one of the important techniques for improving the manufacturing testing flow, as well as ensuring the long-term reliability and safety of in-field systems. Current $V_{min}$ prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertainties caused by different sources of variations. While some existing techniques offer region predictions, they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free $V_{min}$ interval estimation methodology possessing a theoretical guarantee of coverage. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors can reduce the interval length significantly for $V_{min}$ prediction.

Submitted 3 May, 2024; originally announced June 2024.

Comments: Accepted by DATE 2024. Camera-ready version
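
The abstract above names conformalized quantile regression as the source of its coverage guarantee. The calibration step of that general technique can be sketched as follows, assuming lower and upper quantile predictions already exist for a held-out calibration set. The function names, argument names, and toy inputs are hypothetical; this is the generic split-conformal recipe, not the authors' implementation.

```python
import math


def cqr_margin(y_cal, lo_cal, hi_cal, alpha=0.1):
    """Conformal margin for target coverage 1 - alpha (hypothetical helper).

    y_cal:  true values on the calibration set
    lo_cal: predicted lower quantiles for the same points
    hi_cal: predicted upper quantiles for the same points
    """
    n = len(y_cal)
    # Conformity score: how far y falls outside [lo, hi] (negative if inside).
    scores = sorted(
        max(lo - y, y - hi) for y, lo, hi in zip(y_cal, lo_cal, hi_cal)
    )
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def conformalize(lo, hi, margin):
    """Widen (or shrink, if margin is negative) a predicted interval."""
    return lo - margin, hi + margin
```

A new prediction interval `[lo, hi]` from the quantile model is then reported as `conformalize(lo, hi, margin)`. The margin depends only on the calibration scores, which is why no distributional assumption is needed.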

arXiv:2406.15232 [pdf, ps, other]

Damping Wind Farm Resonances with Current Based Model Predictive Pulse Pattern Control

Authors: Orcun Karaca, Ioannis Tsoumas, Tinus Dorfling, Ran Chen, Lennart Harnefors

Abstract: It is well-established that a proportional current control gain emulates a resistor in the converter output impedance. Even though this resistance can provide additional damping to grid resonances, its effect for traditional linear current controllers is known to be rather limited. Moreover, for medium-voltage systems, high switching frequencies are not an option due to the high switching losses. To meet the harmonic standards, it is expedient to use optimized pulse patterns. This further exacerbates the problems with the resistance of classical controllers, since an additional filtering would be required so that the current controller acts only on the fundamental component (and not on the ripple component). Such a design limits the damping effect not only in its amplitude but also in the frequency range where it is active. This paper shows that a high-bandwidth current-based model predictive pulse pattern controller can alleviate these limitations. The pulse pattern control approach can achieve a high gain even at low switching frequencies, while controlling directly the instantaneous currents (i.e., the fundamental component and the ripple together). With a fast implementation cycle, the frequency range where this damping effect is active can be further extended. Numerical studies showcase these benefits for a multi-phase medium-voltage wind power conversion system.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Submitted 16 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.11175 [pdf, other]

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

Abstract: The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the deployment generality in different processing scenarios, such as edge devices and cloud processing. To this end, this paper proposes a general model, termed SMRU, to cover different application scenarios. The novelty is two-fold. First, a multi-scale band split layer and band merge layer are proposed to effectively fuse local frequency bands for lower complexity modeling. Besides, by simulating the multi-resolution feature modeling characteristic of the classical UNet structure, a novel recurrent-dominated UNet is devised. It consists of multiple variable frame rate blocks, each of which involves the causal time down-/up-sampling layer with varying compression ratios and the dual-path structure for inter- and intra-band modeling. The model is configured from 50 M/s to 6.8 G/s in terms of MACs, and the experimental results show that the proposed approach yields competitive or even better performance over existing baselines, and has the full potential to adapt to more general scenarios with varying complexity requirements.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.05325 [pdf, other]

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

Authors: Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai

Abstract: Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusion model for SVC (LDM-SVC) in this work, which attempts to perform SVC in the latent space using an LDM. We pretrain a variational autoencoder structure using the noted open-source So-VITS-SVC project based on the VITS framework, which is then used for the LDM training. Besides, we propose a singer guidance training method based on classifier-free guidance to further suppress the timbre of the original singer. Experimental results show the superiority of the proposed method over previous works in both subjective and objective evaluations of timbre similarity.

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.03882 [pdf, other]

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning approaches are used to adapt the pre-trained models for suicide risk detection, and multiple audio-text fusion approaches are evaluated to combine the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on the test set with 119 subjects, indicating promising potential for real suicide risk detection applications.

Submitted 9 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.01993 [pdf]

Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via Human-in-the-Loop Labeling

Authors: Ruoyu Chen, Ziwei Zhao, Mayinuer Yusufu, Xianwen Shang, Danli Shi, Mingguang He

Abstract: Human-in-the-loop (HITL) strategy has been recently introduced into the field of medical image processing. Indocyanine green angiography (ICGA) stands as a well-established examination for visualizing choroidal vasculature and detecting chorioretinal diseases. However, the intricate nature of choroidal vascular networks makes large-scale manual segmentation of ICGA images challenging. Thus, the study aims to develop a high-precision choroidal vessel segmentation model with limited labor using HITL framework. We utilized a multi-source ICGA dataset, including 55 degree view and ultra-widefield ICGA (UWF-ICGA) images for model development. The choroidal vessel network was pre-segmented by a pre-trained vessel segmentation model, and then manually modified by two ophthalmologists. Choroidal vascular diameter, density, complexity, tortuosity, and branching angle were automatically quantified based on the segmentation. We finally conducted four cycles of HITL. One hundred and fifty 55 degree view ICGA images were used for the first three cycles (50 images per cycle), and twenty UWF-ICGA images for the last cycle. The average time needed to manually correct a pre-segmented ICGA image per cycle reduced from 20 minutes to 1 minute. High segmentation accuracy has been achieved on both 55 degree view ICGA and UWF-ICGA images. Additionally, the multi-dimensional choroidal vascular parameters were significantly associated with various chorioretinal diseases. Our study not only demonstrated the feasibility of the HITL strategy in improving segmentation performance with reduced manual labeling, but also innovatively introduced several risk predictors for choroidal abnormalities.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 25 pages,4 figures

arXiv:2405.11380 [pdf, other]

Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills

Authors: Tianhao Wei, Liqian Ma, Rui Chen, Weiye Zhao, Changliu Liu

Abstract: The requirements for real-world manipulation tasks are diverse and often conflicting; some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions, while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development of a universal robotic foundation model. In this work, we propose Meta-Control, the first LLM-enabled automatic control synthesis approach that creates customized state representations and control strategies tailored to specific tasks. Our core insight is that a meta-control system can be built to automate the thought process that human experts use to design control systems. Specifically, human experts heavily use a model-based, hierarchical (from abstract to concrete) thought model, then compose various dynamic models and controllers together to form a control system. Meta-Control mimics the thought model and harnesses LLM's extensive control knowledge with Socrates' "art of midwifery" to automate the thought process. Meta-Control stands out for its fully model-based nature, allowing rigorous analysis, generalizability, robustness, efficient parameter tuning, and reliable real-time execution.

Submitted 7 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2403.14968 [pdf, other]

Real-time Safety Index Adaptation for Parameter-varying Systems via Determinant Gradient Ascend

Authors: Rui Chen, Weiye Zhao, Ruixuan Liu, Weiyang Zhang, Changliu Liu

Abstract: Safety Index Synthesis (SIS) is critical for deriving safe control laws. Recent works propose to synthesize a safety index (SI) via nonlinear programming and derive a safe control law such that the system 1) achieves forward invariant (FI) with some safe set and 2) guarantees finite time convergence (FTC) to that safe set. However, real-world system dynamics can vary during run-time, making the control law infeasible and invalidating the initial SI. Since the full SIS nonlinear programming is computationally expensive, it is infeasible to re-synthesize the SI each time the dynamics are perturbed. To address that, this paper proposes an efficient approach to adapting the SI to varying system dynamics and maintaining the feasibility of the safe control law. The proposed method leverages determinant gradient ascend and derives a closed-form update to safety index parameters, enabling real-time adaptation performance. A numerical study validates the effectiveness of our approach.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted to American Control Conference (ACC) 2024

arXiv:2403.06066 [pdf]

CausalCellSegmenter: Causal Inference inspired Diversified Aggregation Convolution for Pathology Image Segmentation

Authors: Dawei Fan, Yifan Gao, Jiaming Yu, Yanping Chen, Wencheng Li, Chuancong Lin, Kaibin Li, Changcai Yang, Riqing Chen, Lifang Wei

Abstract: Deep learning models have shown promising performance for cell nucleus segmentation in the field of pathology image analysis. However, training a robust model from multiple domains remains a great challenge for cell nucleus segmentation. Additionally, background noise, heavy overlap between cell nuclei, and blurred edges often lead to poor performance. To address these challenges, we propose a novel framework termed CausalCellSegmenter, which combines Causal Inference Module (CIM) with Diversified Aggregation Convolution (DAC) techniques. The DAC module incorporates diverse downsampling features through a simple, parameter-free attention module (SimAM), aiming to overcome the problems of false-positive identification and edge blurring. Furthermore, we introduce CIM to leverage sample weighting by directly removing the spurious correlations between features for every input sample and concentrating more on the correlation between features and labels. Extensive experiments on the MoNuSeg-2018 dataset achieve promising results, outperforming other state-of-the-art methods, with the mIoU and DSC scores growing by 3.6% and 2.65%.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 10 pages, 5 figures, 2 tables, MICCAI

arXiv:2402.09735 [pdf, other]

DFORM: Diffeomorphic vector field alignment for assessing dynamics across learned models

Authors: Ruiqi Chen, Giacomo Vedovati, Todd Braver, ShiNung Ching

Abstract: Dynamical system models such as Recurrent Neural Networks (RNNs) have become increasingly popular as hypothesis-generating tools in scientific research. Evaluating the dynamics in such networks is key to understanding their learned generative mechanisms. However, comparison of learned dynamics across models is challenging due to their inherent nonlinearity and because a priori there is no enforced equivalence of their coordinate systems. Here, we propose the DFORM (Diffeomorphic vector field alignment for comparing dynamics across learned models) framework. DFORM learns a nonlinear coordinate transformation which provides a continuous, maximally one-to-one mapping between the trajectories of learned models, thus approximating a diffeomorphism between them. The mismatch between DFORM-transformed vector fields defines the orbital similarity between two models, thus providing a generalization of the concepts of smooth orbital and topological equivalence. As an example, we apply DFORM to models trained on a canonical neuroscience task, showing that learned dynamics may be functionally similar, despite overt differences in attractor landscapes.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 12 pages, 8 figures

arXiv:2312.09439 [pdf, other]

Smart Roads: Roadside Perception, Vehicle-Road Cooperation and Business Model

Authors: Rui Chen, Lu Gao, Yutian Liu, Yong Liang Guan, Yan Zhang

Abstract: Smart roads have become an essential component of intelligent transportation systems (ITS). The roadside perception technology, a critical aspect of smart roads, utilizes various sensors, roadside units (RSUs), and edge computing devices to gather real-time traffic data for vehicle-road cooperation. However, the full potential of smart roads in improving the safety and efficiency of autonomous vehicles can only be realized through the mass deployment of roadside perception and communication devices. On the one hand, roadside devices require significant investment but can currently only achieve a monitoring function, resulting in no profitability for investors. On the other hand, drivers lack trust in the safety of autonomous driving technology, making it difficult to promote large-scale commercial applications. To deal with the dilemma of mass deployment, we propose a novel smart-road vehicle-guiding architecture for vehicle-road cooperative autonomous driving, based on which we then propose the corresponding business model and analyze its benefits from both operator and driver perspectives. The numerical simulations validate that our proposed smart road solution can enhance driving safety and traffic efficiency. Moreover, we utilize the cost-benefit analysis (CBA) model to assess the economic advantages of the proposed business model, which indicates that a smart highway that can provide vehicle-guided-driving services for autonomous vehicles yields more profit than a regular highway.

Submitted 19 October, 2023; originally announced December 2023.

arXiv:2312.07864 [pdf, other]

MMSE Design of RIS-aided Communications

Authors: Wen-Xuan Long, Marco Moretti, Andrea Abrardo, Luca Sanguinetti, Rui Chen

Abstract: Consider a communication system in which a single antenna user equipment exchanges information with a multi-antenna base station via a reconfigurable intelligent surface (RIS) in the presence of spatially correlated channels and electromagnetic interference (EMI). To exploit the attractive advantages of RIS technology, accurate configuration of its reflecting elements is crucial. In this paper, we use statistical knowledge of channels and EMI to optimize the RIS elements for i) accurate channel estimation and ii) reliable data transmission. In both cases, our goal is to determine the RIS coefficients that minimize the mean square error, resulting in the formulation of two non-convex problems that share the same structure. To solve these two problems, we present an alternating optimization approach that reliably converges to a locally optimal solution. The incorporation of the diagonally scaled steepest descent algorithm, derived from Newton's method, ensures fast convergence with manageable complexity. Numerical results demonstrate the effectiveness of the proposed method under various propagation conditions. Notably, it shows significant advantages over existing alternatives that depend on a sub-optimal configuration of the RIS and are derived on the basis of different criteria.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 13 pages, 10 figures

arXiv:2311.14515 [pdf, other]

On RIS-Aided SIMO Gaussian Channels: Towards A Single-RF MIMO Transceiver Architecture

Authors: Ru-Han Chen, Jing Zhou, Yonggang Zhu, Kai Zhang

Abstract: In this paper, for a single-input multiple-output (SIMO) system aided by a passive reconfigurable intelligent surface (RIS), the joint transmission accomplished by the single transmit antenna and the RIS with multiple controllable reflective elements is considered. Relying on a general capacity upper bound derived by a maximum-trace argument, we respectively characterize the capacity of such a channel in the low-SNR or the rank-one regimes, in which the optimal configuration of the RIS is proved to be beamforming with carefully-chosen phase shifts. To exploit the potential of modulating extra information on the RIS, based on the QR decomposition, successive interference cancellation, and a strategy named "partially beamforming and partially information-carrying", we propose a novel transceiver architecture with only a single RF front end at the transmitter, by which the considered channel can be regarded as a concatenation of a vector Gaussian channel and several phase-modulated channels. Especially, we investigate a class of vector Gaussian channels with a hypersphere input support constraint, and not only generalize the existing result to arbitrary-dimensional real spaces but also present its high-order capacity asymptotics, by which both capacities of hypersphere-constrained channels and achievable rates of the proposed transceiver with two different signaling schemes can be well-approximated. Information-theoretic analyses show that the transceiver architecture designed for the SIMO channel has a boosted multiplexing gain, rather than one for the conventionally-used optimized beamforming scheme. Numerical results verify our derived asymptotics and show notable superiority of the proposed transceiver.

Submitted 24 November, 2023; originally announced November 2023.

Comments: A Shortened version is submitted to IEEE journal

arXiv:2311.09019 [pdf, ps, other]

Closed-Loop Identification of Stabilized Models Using Dual Input-Output Parameterization

Authors: Ran Chen, Amber Srivastava, Mingzhou Yin, Roy S. Smith

Abstract: This paper introduces a dual input-output parameterization (dual IOP) for the identification of linear time-invariant systems from closed-loop data. It draws inspiration from the recent input-output parameterization developed to synthesize a stabilizing controller. The controller is parameterized in terms of closed-loop transfer functions, from the external disturbances to the input and output of the system, constrained to lie in a given subspace. Analogously, the dual IOP method parameterizes the unknown plant with analogous closed-loop transfer functions, also referred to as dual parameters. In this case, these closed-loop transfer functions are constrained to lie in an affine subspace guaranteeing that the identified plant is stabilized by the known controller. Compared with existing closed-loop identification techniques guaranteeing closed-loop stability, such as the dual Youla parameterization, the dual IOP neither requires a doubly-coprime factorization of the controller nor a nominal plant that is stabilized by the controller. The dual IOP does not depend on the order and the state-space realization of the controller either, as in the dual system-level parameterization. Simulation shows that the dual IOP outperforms the existing benchmark methods.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.08415 [pdf]

Scanning phase imaging without accurate positioning system

Authors: Tao Liu, Bingyang Wang, JiangTao Zhao, Fu rong Chen, Fucai Zhang

Abstract: Ptychography, a high-resolution phase imaging technique using precise in-plane translation information, has been widely applied in modern synchrotron radiation sources across the globe. A key requirement for successful ptychographic reconstruction is the precise knowledge of the scanning positions, which are typically obtained by a physical interferometric positioning system. However, high-throughput positioning poses an engineering challenge, especially at the nanoscale or smaller. In this work, we propose a novel scanning imaging framework that does not require any prior position information from the positioning system. Specifically, our scheme utilizes the wavefront modulation mechanism to reconstruct the object functions at each scan position and the shared illumination function, simultaneously. The scanning trajectory information is extracted by our subpixel image registration algorithm from the overlap region of reconstructed object functions. Then, a complete object function can be obtained by assembling each part of the reconstructed sample functions. High-quality imaging of a biological sample and position recovery with sub-pixel accuracy are demonstrated in a proof-of-concept experiment. Based on current results, we find it may have great potential applications in high-resolution and high-throughput phase imaging.

Submitted 31 October, 2023; originally announced November 2023.

Comments: 9 pages,4 figures

arXiv:2311.02865 [pdf, other]

Geometrically-Shaped Constellation for Visible Light Communications at Short Blocklength

Authors: Jia-Ning Guo, Ru-Han Chen, Jian Zhang, Longguang Li, Xu Yang, Jing Zhou

Abstract: In this paper, we present a general framework of designing geometrically shaped constellations for short-packet visible light communications with peak- and average-intensity constraints. By leveraging tools from large deviation theory, we first characterize the second-order asymptotics of the optimal constellation shaping region under the aforementioned intensity constraints, which serves as a good performance measure for the best geometric shaping in finite blocklength. To further incorporate a sufficiently large coding gain and a nearly-maximum shaping gain, we construct multidimensional constellations by the nested structure of Construction B lattices, where the constellation shaping is implemented by controlling the boundary of the embedded sublattice, i.e., a strategy called coarsely shaping and finely coding. Fast algorithms for constellation mapping and demodulation are presented as well. As an illustrative example, we present an energy-efficient 24-dimensional constellation design based on the Leech lattice, whose superiority over existing constellation designs is verified by numerical results.

Submitted 28 April, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.01781 [pdf, other]

Passive Handwriting Tracking via Weak mmWave Communication Signals

Authors: Chao Yu, Yan Luo, Renqi Chen, Rui Wang

Abstract: In this letter, a cooperative sensing framework based on millimeter wave (mmWave) communication systems is proposed to detect tiny motions with a millimeter-level resolution. Particularly, the cooperative sensing framework is facilitated with one transmitter and two receivers. There are two radio frequency (RF) chains at each receiver. Hence, the Doppler effect due to the tiny motions can be detected via passive sensing respectively at the receivers, and the velocities of the motions can be estimated by integrating the Doppler frequencies. It is demonstrated that the proposed cooperative sensing system is able to track the handwriting with 90% error below 6 mm. Moreover, the proposed cooperative sensing is robust to the strength of received signal. For example, it works even without the line-of-sight paths from the transmitter to the receivers or the sensing target, where the received signal strength is not sufficient for timing synchronization or demodulation.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.01003 [pdf, other]

Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle

Authors: Chen Qian, Rui Chen, Peiyao Shen, Yongchun Fang, Jifu Yan, Tiefeng Li

Abstract: This paper presents both the trajectory generation and tracking control strategies for an underactuated flapping wing aerial vehicle (FWAV). First, the FWAV dynamics is analyzed from a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system, and develop a general-purpose trajectory generation strategy. Subsequently, the trajectory tracking controller is developed with the help of robust control and switch control techniques. After that, the overall system asymptotic stability is guaranteed by Lyapunov stability analysis. To make the controller applicable in real flight, we also provide several instructions. Finally, a series of experimental results demonstrate the successful implementation of the proposed trajectory generation strategy and tracking control strategy. This work is the first to achieve the closed-loop integration of trajectory generation and control for real 3-dimensional flight of an underactuated FWAV at a practical level.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.18498 [pdf, ps, other]

GPT-4 Vision on Medical Image Classification -- A Case Study on COVID-19 Dataset

Authors: Ruibo Chen, Tianyi Xiong, Yihan Wu, Guodong Liu, Zhengmian Hu, Lichang Chen, Yanshuo Chen, Chenxi Liu, Heng Huang

Abstract: This technical report delves into the application of GPT-4 Vision (GPT-4V) in the nuanced realm of COVID-19 image classification, leveraging the transformative potential of in-context learning to enhance diagnostic processes.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.17974

FaultSeg Swin-UNETR: Transformer-Based Self-Supervised Pretraining Model for Fault Recognition

Authors: Zeren Zhang, Ran Chen, Jinwen Ma

Abstract: This paper introduces an approach to enhance seismic fault recognition through self-supervised pretraining. Seismic fault interpretation holds great significance in the fields of geophysics and geology. However, conventional methods for seismic fault recognition encounter various issues, including dependence on data quality and quantity, as well as susceptibility to interpreter subjectivity. Currently, automated fault recognition methods proposed based on small synthetic datasets experience performance degradation when applied to actual seismic data. To address these challenges, we have introduced the concept of self-supervised learning, utilizing a substantial amount of relatively easily obtainable unlabeled seismic data for pretraining. Specifically, we have employed the Swin Transformer model as the core network and employed the SimMIM pretraining task to capture unique features related to discontinuities in seismic data. During the fine-tuning phase, inspired by edge detection techniques, we have also refined the structure of the Swin-UNETR model, enabling multiscale decoding and fusion for more effective fault detection. Experimental results demonstrate that our proposed method attains state-of-the-art performance on the Thebe dataset, as measured by the OIS and ODS metrics.

Submitted 8 January, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: The logical flow and background of the article need significant revisions

arXiv:2310.03297 [pdf, other]

Passive Respiration Detection via mmWave Communication Signal Under Interference

Authors: Kehan Wu, Renqi Chen, Haiyu Wang, Chenqing Ji, Jiayuan Zhu, Guang Wu

Abstract: Recent research has highlighted the detection of human respiration rate using commodity WiFi devices. Nevertheless, these devices encounter challenges in accurately discerning human respiration amidst the prevailing human motion interference encountered in daily life. To tackle this predicament, this paper introduces a passive sensing and communication system designed specifically for respiration detection in the presence of robust human motion interference. Operating within the 60.48 GHz band, the proposed system aims to detect human respiration even when confronted with substantial human motion interference within close proximity. Subsequently, a neural network is trained on the data we collected to enable human respiration detection. The experimental results demonstrate a consistently high accuracy of over 90% for human respiration detection under interference, given an adequate sensing duration. Finally, an empirical model is derived analytically to achieve respiratory rate counting in 10 seconds.

Submitted 4 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Submitted to WCNC2024 Workshop

arXiv:2309.15415 [pdf]

Formation Wing-Beat Modulation (FWM): A Tool for Quantifying Bird Flocks Using Radar Micro-Doppler Signals

Authors: Jiangkun Gong, Jun Yan, Deyong Kong, Ruizhi Chen, Deren Li

Abstract: Radar echoes from bird flocks contain modulation signals, which we find are produced by the flapping gaits of birds in the flock, resulting in a group of spectral peaks with similar amplitudes spaced at a specific interval. We call this the formation wing-beat modulation (FWM) effect. FWM signals are micro-Doppler modulated by flapping wings and are related to the bird number, wing-beat frequency, and flight phasing strategy. Our X-band radar data show that FWM signals exist in radar signals of a seagull flock, providing tools for quantifying the bird number and estimating the mean wingbeat rate of birds. This new finding could aid in research on the quantification of bird migration numbers and estimation of bird flight behavior in radar ornithology and aero-ecology.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13456 [pdf, other]

An Optimal Control Framework for Influencing Human Driving Behavior in Mixed-Autonomy Traffic

Authors: Anirudh Chari, Rui Chen, Jaskaran Grover, Changliu Liu

Abstract: As autonomous vehicles (AVs) become increasingly prevalent, their interaction with human drivers presents a critical challenge. Current AVs lack social awareness, causing behavior that is often awkward or unsafe. To combat this, social AVs, which are proactive rather than reactive in their behavior, have been explored in recent years. With knowledge of robot-human interaction dynamics, a social AV can influence a human driver to exhibit desired behaviors by strategically altering its own behaviors. In this paper, we present a novel framework for achieving human influence. The foundation of our framework lies in an innovative use of control barrier functions to formulate the desired objectives of influence as constraints in an optimal control problem. The computed controls gradually push the system state toward satisfaction of the objectives, e.g. slowing the human down to some desired speed. We demonstrate the proposed framework's feasibility in a variety of scenarios related to car-following and lane changes, including multi-robot and multi-human configurations. In two case studies, we validate the framework's effectiveness when applied to the problems of traffic flow optimization and aggressive behavior mitigation. Given these results, the main contribution of our framework is its versatility in a wide spectrum of influence objectives and mixed-autonomy configurations.

Submitted 22 March, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: Accepted to American Control Conference (ACC) 2024

arXiv:2309.12406 [pdf, other]

Safety Index Synthesis with State-dependent Control Space

Authors: Rui Chen, Weiye Zhao, Changliu Liu

Abstract: This paper introduces an approach for synthesizing feasible safety indices to derive safe control laws under state-dependent control spaces. The problem, referred to as Safety Index Synthesis (SIS), is challenging because it requires the existence of feasible control input in all states and leads to an infinite number of constraints. The proposed method leverages Positivstellensatz to formulate SIS as a nonlinear programming (NP) problem. We formally prove that the NP solutions yield safe control laws with two imperative guarantees: forward invariance within user-defined safe regions and finite-time convergence to those regions. A numerical study validates the effectiveness of our approach.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.09797 [pdf, other]

A Read Margin Enhancement Circuit with Dynamic Bias Optimization for MRAM

Authors: Renhe Chen, Albert Lee, Zirui Wang, Di Wu, Xufeng Kou

Abstract: This brief introduces a read bias circuit to improve readout yield of magnetic random access memories (MRAMs). A dynamic bias optimization (DBO) circuit is proposed to enable the real-time tracking of the optimal read voltage across process-voltage-temperature (PVT) variations within an MRAM array. It optimizes read performance by adjusting the read bias voltage dynamically for maximum sensing margin. Simulation results on a 28-nm 1Mb MRAM macro show that the tracking accuracy of the proposed DBO circuit remains above 90% even when the optimal sensing voltage varies up to 50%. Such dynamic tracking strategy further results in up to two orders of magnitude reduction in the bit error rate with respect to different variations, highlighting its effectiveness in enhancing MRAM performance and reliability.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.04672 [pdf, other]

SSHNN: Semi-Supervised Hybrid NAS Network for Echocardiographic Image Segmentation

Authors: Renqi Chen, Jingjing Luo, Fan Nian, Yuhui Cen, Yiheng Peng, Zekuan Yu

Abstract: Accurate medical image segmentation especially for echocardiographic images with unmissable noise requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) realizes better segmentation results due to larger search space and automatic optimization, but most of the existing methods are weak in layer-wise feature aggregation and adopt a "strong encoder, weak decoder" structure, insufficient to handle global relationships and local details. To resolve these issues, we propose a novel semi-supervised hybrid NAS network for accurate medical image segmentation termed SSHNN. In SSHNN, we creatively use convolution operation in layer-wise feature fusion instead of normalized scalars to avoid losing details, making NAS a stronger encoder. Moreover, Transformers are introduced for the compensation of global context and U-shaped decoder is designed to efficiently connect global context with local features. Specifically, we implement a semi-supervised algorithm Mean-Teacher to overcome the limited volume problem of labeled medical image dataset. Extensive experiments on CAMUS echocardiography dataset demonstrate that SSHNN outperforms state-of-the-art approaches and realizes accurate segmentation. Code will be made publicly available.

Submitted 27 December, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP2024

arXiv:2308.14553 [pdf, other]

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Authors: Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang

Abstract: Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by speech enhancement models. Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background noise that affect the quality of the synthesized speech. Meanwhile, it was shown that self-supervised pre-trained models exhibit excellent noise robustness on many speech tasks, implying that the learned representation has a better tolerance for noise perturbations. In this work, we therefore explore pre-trained models to improve the noise robustness of TTS models. Based on HiFi-GAN, we first propose a representation-to-waveform vocoder, which aims to learn to map the representation of pre-trained models to the waveform. We then propose a text-to-representation FastSpeech2 model, which aims to learn to map text to pre-trained model representations. Experimental results on the LJSpeech and LibriTTS datasets show that our method outperforms those using speech enhancement methods in both subjective and objective metrics. Audio samples are available at: https://zqs01.github.io/rep2wav.

Submitted 3 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 5 pages,2 figures

arXiv:2308.13790 [pdf, other]

FFPN: Fourier Feature Pyramid Network for Ultrasound Image Segmentation

Authors: Chaoyu Chen, Xin Yang, Rusi Chen, Junxuan Yu, Liwei Du, Jian Wang, Xindi Hu, Yan Cao, Yingying Liu, Dong Ni

Abstract: Ultrasound (US) image segmentation is an active research area that requires real-time and highly accurate analysis in many scenarios. The detect-to-segment (DTS) frameworks have been recently proposed to balance accuracy and efficiency. However, existing approaches may suffer from inadequate contour encoding or fail to effectively leverage the encoded results. In this paper, we introduce a novel Fourier-anchor-based DTS framework called Fourier Feature Pyramid Network (FFPN) to address the aforementioned issues. The contributions of this paper are twofold. First, the FFPN utilizes Fourier Descriptors to adequately encode contours. Specifically, it maps Fourier series with similar amplitudes and frequencies into the same layer of the feature map, thereby effectively utilizing the encoded Fourier information. Second, we propose a Contour Sampling Refinement (CSR) module based on the contour proposals and refined features produced by the FFPN. This module extracts rich features around the predicted contours to further capture detailed information and refine the contours. Extensive experimental results on three large and challenging datasets demonstrate that our method outperforms other DTS methods in terms of accuracy and efficiency. Furthermore, our framework can generalize well to other detection or segmentation tasks.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 10 pages, 5 figures, Accepted by MLMI 2023

arXiv:2308.08269 [pdf, other]

OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis

Authors: Han Zhou, Dong Ni, Ao Chang, Xinrui Zhou, Rusi Chen, Yanlin Chen, Lian Liu, Jiamin Liang, Yuhao Huang, Tong Han, Zhe Liu, Deng-Ping Fan, Xin Yang

Abstract: Ultrasound (US) imaging is indispensable in clinical practice. To diagnose certain diseases, sonographers must observe corresponding dynamic anatomic structures to gather comprehensive information. However, the limited availability of specific US video cases causes teaching difficulties in identifying corresponding diseases, which potentially impacts the detection rate of such cases. The synthesis of US videos may represent a promising solution to this issue. Nevertheless, it is challenging to accurately animate the intricate motion of dynamic anatomic structures while preserving image fidelity. To address this, we present a novel online feature-decoupling framework called OnUVS for high-fidelity US video synthesis. Our highlights can be summarized by four aspects. First, we introduced anatomic information into keypoint learning through a weakly-supervised training strategy, resulting in improved preservation of anatomical integrity and motion while minimizing the labeling burden. Second, to better preserve the integrity and textural information of US images, we implemented a dual-decoder that decouples the content and textural features in the generator. Third, we adopted a multiple-feature discriminator to extract a comprehensive range of visual cues, thereby enhancing the sharpness and fine details of the generated videos. Fourth, we constrained the motion trajectories of keypoints during online learning to enhance the fluidity of generated videos. Our validation and user studies on in-house echocardiographic and pelvic floor US videos showed that OnUVS synthesizes US videos with high fidelity.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 14 pages, 13 figures and 6 tables

arXiv:2308.07342 [pdf, other]

Emergent communication for AR

Authors: Ruxiao Chen, Shuaishuai Guo

Abstract: Mobile augmented reality (MAR) is widely acknowledged as one of the ubiquitous interfaces to the digital twin and Metaverse, demanding unparalleled levels of latency, computational power, and energy efficiency. The existing solutions for realizing MAR combine multiple technologies like edge, cloud computing, and fifth-generation (5G) networks. However, the inherent communication latency of visual data imposes apparent limitations on the quality of experience (QoE). To address the challenge, we propose an emergent semantic communication framework to learn the communication protocols in MAR. Specifically, we train two agents through a modified Lewis signaling game to emerge a discrete communication protocol spontaneously. Based on this protocol, two agents can communicate about the abstract idea of visual data through messages with extremely small data sizes in a noisy channel, which leads to message errors. To better simulate real-world scenarios, we incorporate channel uncertainty into our training process. Experiments have shown that the proposed scheme has better generalization on unseen objects than traditional object recognition used in MAR and can effectively enhance communication efficiency through the utilization of small-size messages.

Submitted 12 August, 2023; originally announced August 2023.

arXiv:2308.02782 [pdf]

doi 10.1364/OL.501622

Non-line-of-sight reconstruction via structure sparsity regularization

Authors: Duolan Huang, Quan Chen, Zhun Wei, Rui Chen

Abstract: Non-line-of-sight (NLOS) imaging allows for the imaging of objects around a corner, which enables potential applications in various fields such as autonomous driving, robotic vision, medical imaging, security monitoring, etc. However, the quality of reconstruction is challenged by low signal-to-noise ratio (SNR) measurements. In this study, we present a regularization method, referred to as structure sparsity (SS) regularization, for denoising in NLOS reconstruction. By exploiting the prior knowledge of structure sparseness, we incorporate nuclear norm penalization into the cost function of the directional light-cone transform (DLCT) model for the NLOS imaging system. This incorporation effectively integrates the neighborhood information associated with the directional albedo, thereby facilitating the denoising process. Subsequently, the reconstruction is achieved by optimizing a directional albedo model with SS regularization using the fast iterative shrinkage-thresholding algorithm. Notably, the robust reconstruction of occluded objects is observed. Through comprehensive evaluations conducted on both synthetic and experimental datasets, we demonstrate that the proposed approach yields high-quality reconstructions, surpassing the state-of-the-art reconstruction algorithms, especially in scenarios involving short exposure and low SNR measurements.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures

arXiv:2305.19558 [pdf, other]

Look-Ahead Task Offloading for Multi-User Mobile Augmented Reality in Edge-Cloud Computing

Authors: Ruxiao Chen, Shuaishuai Guo

Abstract: Mobile augmented reality (MAR) blends a real scenario with overlaid virtual content, which has been envisioned as one of the ubiquitous interfaces to the Metaverse. Due to the limited computing power and battery life of MAR devices, it is common to offload the computation tasks to edge or cloud servers in close proximity. However, existing offloading solutions developed for MAR tasks suffer from high migration overhead, poor scalability, and short-sightedness when applied in provisioning multi-user MAR services. To address these issues, a MAR service-oriented task offloading scheme is designed and evaluated in edge-cloud computing networks. Specifically, the task interdependency of MAR applications is firstly analyzed and modeled by using directed acyclic graphs. Then, we propose a look-ahead offloading scheme based on a modified Monte Carlo tree (MMCT) search, which can run several multi-step executions in advance to get an estimate of the long-term effect of immediate action. Experiment results show that the proposed offloading scheme can effectively improve the quality of service (QoS) in provisioning multi-user MAR services, compared to four benchmark schemes. Furthermore, it is also shown that the proposed solution is stable and suitable for applications in a highly volatile environment.

Submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE Network

arXiv:2304.14660 [pdf, other]

doi 10.1016/j.media.2023.103061

Segment Anything Model for Medical Images?

Authors: Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Sijing Liu, Haozhe Chi, Xindi Hu, Kejuan Yue, Lei Li, Vicente Grau, Deng-Ping Fan, Fajin Dong, Dong Ni

Abstract: The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging because of the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales. To fully validate SAM's performance on medical data, we collected and sorted 53 open-source datasets and built a large medical segmentation dataset with 18 modalities, 84 objects, 125 object-modality paired targets, 1050K 2D images, and 6033K masks. We comprehensively analyzed different models and strategies on the so-called COSMOS 1050K dataset. Our findings mainly include the following: 1) SAM showed remarkable performance in some specific objects but was unstable, imperfect, or even totally failed in other situations. 2) SAM with the large ViT-H showed better overall performance than that with the small ViT-B. 3) SAM performed better with manual hints, especially box, than the Everything mode. 4) SAM could help human annotation with high labeling quality and less time. 5) SAM was sensitive to the randomness in the center point and tight box prompts, and may suffer from a serious performance drop. 6) SAM performed better than interactive methods with one or a few points, but will be outpaced as the number of points increases. 7) SAM's performance correlated to different factors, including boundary complexity, intensity differences, etc. 8) Finetuning the SAM on specific medical tasks could improve its average DICE performance by 4.39% and 6.68% for ViT-B and ViT-H, respectively. We hope that this comprehensive report can help researchers explore the potential of SAM applications in MIS, and guide how to appropriately use and develop SAM.

Submitted 17 January, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

Comments: Accepted by Medical Image Analysis. 23 pages, 18 figures, 8 tables

arXiv:2302.14257 [pdf, ps, other]

Beamforming Design for RIS-Aided AF Relay Networks

Authors: Xuehui Wang, Feng Shu, Riqing Chen, Peng Zhang, Qi Zhang, Guiyang Xia, Weiping Shi, Jiangzhou Wang

Abstract: Since reconfigurable intelligent surface (RIS) is considered to be a passive reflector for rate performance enhancement, a RIS-aided amplify-and-forward (AF) relay network is presented. By jointly optimizing the beamforming matrix at the AF relay and the phase-shift matrices at the RIS, two schemes are put forward to address a signal-to-noise ratio (SNR) maximization problem. Firstly, aiming at achieving a high rate, a high-performance alternating optimization (AO) method based on Charnes-Cooper transformation and semidefinite programming (CCT-SDP) is proposed, where the optimization problem is decomposed into three subproblems solved by CCT-SDP and rank-one solutions can be recovered by Gaussian randomization. However, the optimization variables in the CCT-SDP method are matrices, which leads to extremely high complexity. In order to reduce the complexity, a low-complexity AO scheme based on Dinkelbach's transformation and successive convex approximation (DT-SCA) is put forward, where matrix variables are transformed to vector variables and three decoupled subproblems are solved by DT-SCA. Simulation results verify that compared to two benchmarks (i.e. a RIS-assisted AF relay network with random phase and an AF relay network without RIS), the proposed CCT-SDP and DT-SCA schemes can harvest better rate performance. Furthermore, it is revealed that the rate of the low-complexity DT-SCA method is close to that of the CCT-SDP method.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2301.02858 [pdf, ps, other]

Two Efficient Beamforming Methods for Hybrid IRS-aided AF Relay Wireless Networks

Authors: Xuehui Wang, Feng Shu, Mengxing Huang, Fuhui Zhou, Riqing Chen, Cunhua Pan, Yongpeng Wu, Jiangzhou Wang

Abstract: Due to the double fading effect caused by conventional passive intelligent reflecting surface (IRS), the signal via the reflection link is weak. To enhance the received signal, active elements with the ability to amplify the reflected signal are introduced to the passive IRS, forming a hybrid IRS. In this paper, we propose a hybrid IRS-aided amplify-and-forward (AF) relay wireless network, where an optimization problem is formulated, which is subject to the constraints of transmit power budgets at the source/AF relay/hybrid IRS and that of unit modulus for passive IRS elements. By alternately designing the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS, the signal-to-noise ratio can be maximized. To achieve high rate performance and extend the coverage range, a high-performance method based on semidefinite relaxation and fractional programming (HP-SDR-FP) algorithm is presented. Due to its extremely high complexity, a low-complexity method based on whitening filter, general power iterative and generalized Rayleigh-Ritz (WF-GPI-GRR) is proposed, which is different from the HP-SDR-FP method. It is assumed that the amplifying coefficient of each active IRS element is equal, and the corresponding analytical solution of the amplifying coefficient can be obtained according to the transmit powers at the AF relay and hybrid IRS. Simulation results show that the proposed two methods can greatly improve the rate performance compared to the existing networks, such as the passive IRS-aided AF relay and only AF relay network. In particular, a 50.0% rate gain over the existing networks is approximately achieved in the high power budget region of hybrid IRS. Moreover, it is verified that the proposed HP-SDR-FP method performs better than the WF-GPI-GRR method in terms of rate performance.

Submitted 23 November, 2023; v1 submitted 7 January, 2023; originally announced January 2023.

arXiv:2212.08391 [pdf, ps, other]

Enhanced-rate Iterative Beamformers for Active IRS-assisted Wireless Communications

Authors: Yeqing Lin, Feng Shu, Rongen Dong, Riqing Chen, Siling Feng, Weiping Shi, Jing Liu, Jiangzhou Wang

Abstract: Compared to passive intelligent reflecting surface (IRS), active IRS is viewed as a more efficient promising technique to combat the double-fading impact in IRS-aided wireless network. In this paper, in order to boost the achievable rate of user in such a wireless network, three enhanced-rate iterative beamforming methods are proposed by designing the amplifying factors and the corresponding phases at active IRS. The first method, maximizing the simplified signal-to-noise ratio (Max-SSNR) is designed by omitting the cross-term in the definition of rate. Using the Rayleigh-Ritz (RR) theorem, Max-SSNR-RR is proposed to iteratively optimize the norm of beamforming vector and its associated normalized vector. In addition, generalized maximum ratio reflection (GMRR) is presented with a closed-form expression, which is motivated by the maximum ratio combining. To further improve rate, maximizing SNR (Max-SNR) is designed by fractional programming (FP), which is called Max-SNR-FP. Simulation results show that the proposed three methods make an obvious rate enhancement over Max-reflecting signal-to-noise ratio (Max-RSNR), maximum ratio reflection (MRR), selective ratio reflecting (SRR), equal gain reflection (EGR) and passive IRS, and are in increasing order of rate performance as follows: Max-SSNR-RR, GMRR, and Max-SNR-FP.

Submitted 14 May, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.14361 [pdf, other]

doi 10.1109/IROS55552.2023.10341790

gatekeeper: Online Safety Verification and Control for Nonlinear Systems in Dynamic Environments

Authors: Devansh R Agrawal, Ruichang Chen, Dimitra Panagou

Abstract: This paper presents the gatekeeper algorithm, a real-time and computationally-lightweight method that ensures that trajectories of a nonlinear system satisfy safety constraints despite sensing limitations. gatekeeper integrates with existing path planners and feedback controllers by introducing an additional verification step to ensure that proposed trajectories can be executed safely, despite nonlinear dynamics subject to bounded disturbances, input constraints and partial knowledge of the environment. Our key contribution is that (A) we propose an algorithm to recursively construct safe trajectories by numerically forward propagating the system over a (short) finite horizon, and (B) we prove that tracking such a trajectory ensures the system remains safe for all future time, i.e., beyond the finite horizon. We demonstrate the method in a simulation of a dynamic firefighting mission, and in physical experiments of a quadrotor navigating in an obstacle environment that is sensed online. We also provide comparisons against the state-of-the-art techniques for similar problems.

Submitted 14 August, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: Accepted at IEEE T-RO 2024. Accepted at IROS 2023. 17 pages, 10 figures

Showing 1–50 of 105 results for author: Chen, R