Search | arXiv e-print repository

Addressing the Mutual Interference in Uplink ISAC Receivers: A Projection Method

Authors: Zhiyuan Yu, Hong Ren, Cunhua Pan, Gui Zhou, Ruizhe Wang, Mengyu Liu, Jiangzhou Wang

Abstract: Dual function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC), improving hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) is tasked with both uplink communication signal detection and simultaneously target-related parameter estim… ▽ More Dual function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC), improving hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) is tasked with both uplink communication signal detection and simultaneously target-related parameter estimation from the echoes, leading to issues with mutual interference. In this paper, a projection-based scheme is proposed to equivalently transform the joint signal detection and target estimation problem into a joint signal detection process across multiple snapshots. Compared with conventional successive interference cancellation (SIC) schemes, our proposed approach achieves a higher signal-to-noise ratio (SNR), and a higher ergodic rate when the radar signal is non-negligible. Nonetheless, it introduces an ill-conditioned signal detection problem, which is addressed using a non-linear detector. By jointly processing an increased number of snapshots, the proposed scheme can achieve high S&C performance simultaneously. △ Less

Submitted 29 August, 2024; originally announced August 2024.

Comments: 5 pages, 3 figures, accepted by IEEE WCL

arXiv:2408.09315 [pdf, other]

Unpaired Volumetric Harmonization of Brain MRI with Conditional Latent Diffusion

Authors: Mengqi Wu, Minhui Yu, Shuaiming Jing, Pew-Thian Yap, Zhengwu Zhang, Mingxia Liu

Abstract: Multi-site structural MRI is increasingly used in neuroimaging studies to diversify subject cohorts. However, combining MR images acquired from various sites/centers may introduce site-related non-biological variations. Retrospective image harmonization helps address this issue, but current methods usually perform harmonization on pre-extracted hand-crafted radiomic features, limiting downstream a… ▽ More Multi-site structural MRI is increasingly used in neuroimaging studies to diversify subject cohorts. However, combining MR images acquired from various sites/centers may introduce site-related non-biological variations. Retrospective image harmonization helps address this issue, but current methods usually perform harmonization on pre-extracted hand-crafted radiomic features, limiting downstream applicability. Several image-level approaches focus on 2D slices, disregarding inherent volumetric information, leading to suboptimal outcomes. To this end, we propose a novel 3D MRI Harmonization framework through Conditional Latent Diffusion (HCLD) by explicitly considering image style and brain anatomy. It comprises a generalizable 3D autoencoder that encodes and decodes MRIs through a 4D latent space, and a conditional latent diffusion model that learns the latent distribution and generates harmonized MRIs with anatomical information from source MRIs while conditioned on target image style. This enables efficient volume-level MRI harmonization through latent style translation, without requiring paired images from target and source domains during training. The HCLD is trained and evaluated on 4,158 T1-weighted brain MRIs from three datasets in three tasks, assessing its ability to remove site-related variations while retaining essential biological features. Qualitative and quantitative experiments suggest the effectiveness of HCLD over several state-of-the-arts △ Less

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.06427 [pdf]

Quantification of Multi-Compartment Flow with Spectral Diffusion MRI

Authors: Mira M. Liu, Jonathan Dyke, Thomas Gladytz, Jonas Jasse, Ian Bolger, Sergio Calle, Swathi Pavaluri, Tanner Crews, Surya Seshan, Steven Salvatore, Isaac Stillman, Thangamani Muthukumar, Bachir Taouli, Samira Farouk, Sara Lewis, Octavia Bane

Abstract: Purpose: Estimation of multi-compartment intravoxel flow in fD in ml/100g/min with multi-b-value diffusion weighted imaging and a multi-Gaussian model in the kidneys. Theory and Methods: A multi-Gaussian model of intravoxel flow using water transport time to quantify fD is presented and simulated. Multi-compartment anisotropic DWI signal is simulated analyzed with (1) a rigid bi-exponential, (2) a… ▽ More Purpose: Estimation of multi-compartment intravoxel flow in fD in ml/100g/min with multi-b-value diffusion weighted imaging and a multi-Gaussian model in the kidneys. Theory and Methods: A multi-Gaussian model of intravoxel flow using water transport time to quantify fD is presented and simulated. Multi-compartment anisotropic DWI signal is simulated analyzed with (1) a rigid bi-exponential, (2) a rigid tri-exponential, and (3) diffusion spectrum imaging model of intravoxel incoherent motion (spectral diffusion). The application is demonstrated in a two-center study of 54 kidney allografts with 9 b-value advanced DWI that were split by function (CKD-EPI 2021 eGFR<45ml/min/1.73m2) and fibrosis (Banff 2017 interstitial fibrosis and tubular atrophy score 0-6). Results: Spectral diffusion demonstrated strong correlation to truth for simulated three-compartment anisotropic diffusion (y=1.08x+0.1, R2=0.71) and two-compartment anisotropic diffusion (y=0.91x+0.6, R2=0.74), outperforming rigid models in cases of variable compartment number. Use of a fixed regularization parameter set to λ=0.1 increased computation up to 208-fold and agreed with voxel-wise cross-validated regularization (concordance correlation coefficient=0.99). Spectral diffusion of renal allografts showed significant increase in tissue parenchyma compartment fD (f-stat=3.86, p=0.02). Tubular fD was significantly decreased in allografts with impaired function (Mann-Whitney Utest t-stat=-2.14, p=0.04). Conclusions: Quantitative multi-compartment intravoxel flow can be estimated in ml/100g/min with fD from multi-Gaussian diffusion, even with moderate anisotropy such as in kidneys. The use of spectral diffusion with a multi-Gaussian model and a fixed regularization parameter shows promise in organs such as the kidney with variable numbers of physiologic compartments. △ Less

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.04274 [pdf, other]

Field of View Expansion for Resonant Beam Information and Power Transfer

Authors: Shun Han, Wen Fang, Mingqing Liu, Mengyuan Xu, Shuaifan Xia, Qingwen Liu

Abstract: Simultaneous wireless information and power transfer (SWIPT) leverages lightwave as the wireless transmission medium, emerging as a promising technology in the future Internet of Things (IoT) scenarios. The use of retro-reflectors in constructing spatially separated laser resonators (SSLR) enables a self-aligning wireless transmission system with the self-reproducing resonant beam, i.e. resonant b… ▽ More Simultaneous wireless information and power transfer (SWIPT) leverages lightwave as the wireless transmission medium, emerging as a promising technology in the future Internet of Things (IoT) scenarios. The use of retro-reflectors in constructing spatially separated laser resonators (SSLR) enables a self-aligning wireless transmission system with the self-reproducing resonant beam, i.e. resonant beam system (RBS). However, it's effective Field of View (FoV) is physically limited by the size of retroreflectors and still requires significant improvement. This restricts the transmitter from providing seamless wireless connectivity and power supply to receivers within a large dynamic movement range. In this paper, we propose an FoV-enlarged resonant beam system operating at a meter distance by incorporating a telescope. The telescope plays a crucial role in minimizing the extra loss inflicted on the gain medium, which typically arises from the deviation of the resonant beam within the cavity. Further, we construct the proposed telescope-based RBS and experimentally demonstrate that the design could expand the FoV to 28$^\circ$ over 1 m transmission distance is about triple that of the ordinary RBS design. △ Less

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.04267 [pdf, other]

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

Authors: Runduo Han, Weiming Xu, Zihan Zhang, Mingshuai Liu, Lei Xie

Abstract: The deep complex convolution recurrent network (DCCRN) achieves excellent speech enhancement performance by utilizing the audio spectrum's complex features. However, it has a large number of model parameters. We propose a smaller model, Distil-DCCRN, which has only 30% of the parameters compared to the DCCRN. To ensure that the performance of Distil-DCCRN matches that of the DCCRN, we employ the k… ▽ More The deep complex convolution recurrent network (DCCRN) achieves excellent speech enhancement performance by utilizing the audio spectrum's complex features. However, it has a large number of model parameters. We propose a smaller model, Distil-DCCRN, which has only 30% of the parameters compared to the DCCRN. To ensure that the performance of Distil-DCCRN matches that of the DCCRN, we employ the knowledge distillation (KD) method to use a larger teacher model to help train a smaller student model. We design a knowledge distillation (KD) method, integrating attention transfer and Kullback-Leibler divergence (AT-KL) to train the student model Distil-DCCRN. Additionally, we use a model with better performance and a more complicated structure, Uformer, as the teacher model. Unlike previous KD approaches that mainly focus on model outputs, our method also leverages the intermediate features from the models' middle layers, facilitating rich knowledge transfer across different structured models despite variations in layer configurations and discrepancies in the channel and time dimensions of intermediate features. Employing our AT-KL approach, Distil-DCCRN outperforms DCCRN as well as several other competitive models in both PESQ and SI-SNR metrics on the DNS test set and achieves comparable results to DCCRN in DNSMOS. △ Less

Submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted by IEEE Signal Processing Letters

arXiv:2408.00940 [pdf, other]

A dual-task mutual learning framework for predicting post-thrombectomy cerebral hemorrhage

Authors: Caiwen Jiang, Tianyu Wang, Xiaodan Xing, Mianxin Liu, Guang Yang, Zhongxiang Ding, Dinggang Shen

Abstract: Ischemic stroke is a severe condition caused by the blockage of brain blood vessels, and can lead to the death of brain tissue due to oxygen deprivation. Thrombectomy has become a common treatment choice for ischemic stroke due to its immediate effectiveness. But, it carries the risk of postoperative cerebral hemorrhage. Clinically, multiple CT scans within 0-72 hours post-surgery are used to moni… ▽ More Ischemic stroke is a severe condition caused by the blockage of brain blood vessels, and can lead to the death of brain tissue due to oxygen deprivation. Thrombectomy has become a common treatment choice for ischemic stroke due to its immediate effectiveness. But, it carries the risk of postoperative cerebral hemorrhage. Clinically, multiple CT scans within 0-72 hours post-surgery are used to monitor for hemorrhage. However, this approach exposes radiation dose to patients, and may delay the detection of cerebral hemorrhage. To address this dilemma, we propose a novel prediction framework for measuring postoperative cerebral hemorrhage using only the patient's initial CT scan. Specifically, we introduce a dual-task mutual learning framework to takes the initial CT scan as input and simultaneously estimates both the follow-up CT scan and prognostic label to predict the occurrence of postoperative cerebral hemorrhage. Our proposed framework incorporates two attention mechanisms, i.e., self-attention and interactive attention. Specifically, the self-attention mechanism allows the model to focus more on high-density areas in the image, which are critical for diagnosis (i.e., potential hemorrhage areas). The interactive attention mechanism further models the dependencies between the interrelated generation and classification tasks, enabling both tasks to perform better than the case when conducted individually. Validated on clinical data, our method can generate follow-up CT scans better than state-of-the-art methods, and achieves an accuracy of 86.37% in predicting follow-up prognostic labels. Thus, our work thus contributes to the timely screening of post-thrombectomy cerebral hemorrhage, and could significantly reform the clinical process of thrombectomy and other similar operations related to stroke. △ Less

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2408.00938 [pdf, other]

CIResDiff: A Clinically-Informed Residual Diffusion Model for Predicting Idiopathic Pulmonary Fibrosis Progression

Authors: Caiwen Jiang, Xiaodan Xing, Zaixin Ou, Mianxin Liu, Walsh Simon, Guang Yang, Dinggang Shen

Abstract: The progression of Idiopathic Pulmonary Fibrosis (IPF) significantly correlates with higher patient mortality rates. Early detection of IPF progression is critical for initiating timely treatment, which can effectively slow down the advancement of the disease. However, the current clinical criteria define disease progression requiring two CT scans with a one-year interval, presenting a dilemma: a… ▽ More The progression of Idiopathic Pulmonary Fibrosis (IPF) significantly correlates with higher patient mortality rates. Early detection of IPF progression is critical for initiating timely treatment, which can effectively slow down the advancement of the disease. However, the current clinical criteria define disease progression requiring two CT scans with a one-year interval, presenting a dilemma: a disease progression is identified only after the disease has already progressed. To this end, in this paper, we develop a novel diffusion model to accurately predict the progression of IPF by generating patient's follow-up CT scan from the initial CT scan. Specifically, from the clinical prior knowledge, we tailor improvements to the traditional diffusion model and propose a Clinically-Informed Residual Diffusion model, called CIResDiff. The key innovations of CIResDiff include 1) performing the target region pre-registration to align the lung regions of two CT scans at different time points for reducing the generation difficulty, 2) adopting the residual diffusion instead of traditional diffusion to enable the model focus more on differences (i.e., lesions) between the two CT scans rather than the largely identical anatomical content, and 3) designing the clinically-informed process based on CLIP technology to integrate lung function information which is highly relevant to diagnosis into the reverse process for assisting generation. Extensive experiments on clinical data demonstrate that our approach can outperform state-of-the-art methods and effectively predict the progression of IPF. △ Less

Submitted 5 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.18481 [pdf, ps, other]

Finite-time and bumpless transfer control of asynchronously switched systems: An output feedback control approach

Authors: Mo-Ran Liu, Zhen Wu, Xian Du, Zhongyang Fei

Abstract: In this paper, the finite-time control and bumpless transfer control are investigated for switched systems under asynchronously switching. First, a class of dynamic output feedback controllers are designed to stabilize the switched system with measurable system outputs. Considering the improvement of transient performance, the bumpless transfer control and finite-time control are further studied i… ▽ More In this paper, the finite-time control and bumpless transfer control are investigated for switched systems under asynchronously switching. First, a class of dynamic output feedback controllers are designed to stabilize the switched system with measurable system outputs. Considering the improvement of transient performance, the bumpless transfer control and finite-time control are further studied in the controller design. To avoid the control bumps, a practical filter is introduced to make the control signal smoother and continuous. Furthermore, to derive a finite-time bounded system state over short-time intervals, the finite-time analysis is considered in managing the switching process with the average dwell time. New criteria are proposed to analyze the finite-time stability and finite-time boundedness for the closed-loop system and solvable conditions are newly proposed to optimize the controller gain. Finally, the superiorities of the proposed method are validated through an application to a boost converter. △ Less

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.10080 [pdf, other]

Design and Optimization on Successive RIS-assisted Multi-hop Wireless Communications

Authors: Rujing Xiong, Jialong Lu, Jianan Zhang, Minggang Liu, Xuehui Dong, Tiebin Mi, Robert Caiming Qiu

Abstract: As an emerging wireless communication technology, reconfigurable intelligent surface (RIS) has become a basic choice for providing signal coverage services in scenarios with dense obstacles or long tunnels through multi-hop configurations. Conventional works of literature mainly focus on alternating optimization or single-beam calculation in RIS phase configuration, which is limited in considering… ▽ More As an emerging wireless communication technology, reconfigurable intelligent surface (RIS) has become a basic choice for providing signal coverage services in scenarios with dense obstacles or long tunnels through multi-hop configurations. Conventional works of literature mainly focus on alternating optimization or single-beam calculation in RIS phase configuration, which is limited in considering energy efficiency, and often suffers from inaccurate channel state information (CSI), poor convergence, and high computational complexity. This paper addresses the design and optimization challenges for successive RIS-assisted multi-hop systems. Specifically, we establish a general model for multi-hop communication based on the relationship between the input and output electric fields within each RIS. Meanwhile, we derive the half-power beamwidth of the RIS-reflected beams, considering the beam direction. Leveraging these models and derivations, we propose deployment optimization and beam optimization strategies for multi-hop systems, which feature high aperture efficiency and significant gains in signal power. Simulation and prototype experiment results validate the effectiveness and superiority of the proposed systems and methods. △ Less

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.06948 [pdf, other]

Detection-Triggered Recursive Impact Mitigation against Secondary False Data Injection Attacks in Microgrids

Authors: Mengxiang Liu, Xin Zhang, Rui Zhang, Zhuoran Zhou, Zhenyong Zhang, Ruilong Deng

Abstract: The cybersecurity of microgrid has received widespread attentions due to the frequently reported attack accidents against distributed energy resource (DER) manufactures. Numerous impact mitigation schemes have been proposed to reduce or eliminate the impacts of false data injection attacks (FDIAs). Nevertheless, the existing methods either requires at least one neighboring trustworthy agent or may… ▽ More The cybersecurity of microgrid has received widespread attentions due to the frequently reported attack accidents against distributed energy resource (DER) manufactures. Numerous impact mitigation schemes have been proposed to reduce or eliminate the impacts of false data injection attacks (FDIAs). Nevertheless, the existing methods either requires at least one neighboring trustworthy agent or may bring in unacceptable cost burdens. This paper aims to propose a detection-triggered recursive impact mitigation scheme that can timely and precisely counter the secondary FDIAs (SFDIAs) against the communication links among DERs. Once triggering attack alarms, the power line current readings will be utilised to observe the voltage bias injections through the physical interconnections among DERs, based on which the current bias injections can be recursively reconstructed from the residuals generated by unknown input observers (UIOs). The attack impacts are eliminated by subtracting the reconstructed bias from the incoming compromised data. The proposed mitigation method can work even in the worst case where all communication links are under SFDIAs and only require extra current sensors. The bias reconstruction performance under initial errors and system noises is theoretically analysed and the reconstruction error is proved to be bounded regardless of the electrical parameters. To avoid deploying current sensors on all power lines, a cost-effective deployment strategy is presented to secure a spanning tree set of communication links that can guarantee the secondary control performance. Extensive validation studies are conducted in MATLAB/SIMULINK and hardware-in-the-loop (HIL) testbeds to validate the proposed method's effectiveness against single/multiple and continuous/discontinuous SFDIAs. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: Submitted to IEEE Transactions on Smart Grid

arXiv:2407.02264 [pdf, other]

SOAF: Scene Occlusion-aware Neural Acoustic Field

Authors: Huiyu Gao, Jiahao Ma, David Ahmedt-Aristizabal, Chuong Nguyen, Miaomiao Liu

Abstract: This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion to sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach… ▽ More This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry, particularly wall occlusion to sound propagation, making them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/. △ Less

Submitted 2 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01808 [pdf, other]

Toward Wireless System and Circuit Co-Design for the Internet of Self-Adaptive Things

Authors: Diptashree Das, Mohammad Abdi, Minghan Liu, Marvin Onabajo, Francesco Restuccia

Abstract: The deployment of a growing number of devices in Internet of Things (IoT) networks implies that uninterrupted and seamless adaptation of wireless communication parameters (e.g., carrier frequency, bandwidth and modulation) will become essential. To utilize wireless devices capable of switching several communication parameters requires real-time self-optimizations at the radio frequency integrated… ▽ More The deployment of a growing number of devices in Internet of Things (IoT) networks implies that uninterrupted and seamless adaptation of wireless communication parameters (e.g., carrier frequency, bandwidth and modulation) will become essential. To utilize wireless devices capable of switching several communication parameters requires real-time self-optimizations at the radio frequency integrated circuit (RFIC) level based on system level performance metrics during the processing of complex modulated signals. This article introduces a novel design verification approach for reconfigurable RFICs based on end-to-end wireless system-level performance metrics while operating in a dynamically changing communication environment. In contrast to prior work, this framework includes two modules that simulate a wireless channel and decode waveforms. These are connected to circuit-level modules that capture device- and circuit-level non-idealities of RFICs for design validation and optimization, such as transistor noises, intermodulation/harmonic distortions, and memory effects from parasitic capacitances. We demonstrate this framework with a receiver (RX) consisting of a reconfigurable complementary metal-oxide semiconductor (CMOS) low-noise amplifier (LNA) designed at the transistor level, a behavioral model of a mixer, and an ideal filter model. The seamless integration between system-level wireless models with circuit-level and behavioral models (such as VerilogA-based models) for RFIC blocks enables to preemptively evaluate circuit and system designs, and to optimize for different communication scenarios with adaptive circuits having extensive tuning ranges. An exemplary case study is presented, in which simulation results reveal that the LNA power consumption can be reduced up to 16x depending on system-level requirements. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted, 5 figures, 1 table. To be included in Proc. IEEE DySPAN 2024

arXiv:2406.17002 [pdf, other]

Benchmarking mortality risk prediction from electrocardiograms

Authors: Platon Lukyanenko, Joshua Mayourian, Mingxuan Liu, John K. Triedman, Sunil J. Ghelani, William G. La Cava

Abstract: Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These da… ▽ More Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These datasets now provide an excellent resource for a broader audience to explore ECG survival modeling. Here, we benchmark survival model performance on Code-15 and MIMIC-IV with two neural network architectures, compare four deep survival modeling approaches to Cox regressions trained on classifier outputs, and evaluate performance at one to ten years. Our results yield AUROC and concordance scores comparable to past work (circa 0.8) and reasonable AUPRC scores (MIMIC-IV: 0.4-0.5, Code-15: 0.05-0.13) considering the fraction of ECG samples linked to a mortality (MIMIC-IV: 27\%, Code-15: 4\%). When evaluating models on the opposite dataset, AUROC and concordance values drop by 0.1-0.15, which may be due to cohort differences. All code and results are made public. △ Less

Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 9 pages plus appendix, 2 figures

arXiv:2406.12646 [pdf, other]

An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti… ▽ More The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potential for performance biases that could mirror those found in task-specific deep learning models like nnU-Net. In this paper, we explored the fairness dilemma concerning large segmentation foundation models. We prospectively curate a benchmark dataset of 3D MRI and CT scans of the organs including liver, kidney, spleen, lung and aorta from a total of 1056 healthy subjects with expert segmentations. Crucially, we document demographic details such as gender, age, and body mass index (BMI) for each subject to facilitate a nuanced fairness analysis. We test state-of-the-art foundation models for medical image segmentation, including the original SAM, medical SAM and SAT models, to evaluate segmentation efficacy across different demographic groups and identify disparities. Our comprehensive analysis, which accounts for various confounding factors, reveals significant fairness concerns within these foundational models. Moreover, our findings highlight not only disparities in overall segmentation metrics, such as the Dice Similarity Coefficient but also significant variations in the spatial distribution of segmentation errors, offering empirical evidence of the nuanced challenges in ensuring fairness in medical image segmentation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to MICCAI-2024

arXiv:2406.11914 [pdf, other]

Initial Investigation of Kolmogorov-Arnold Networks (KANs) as Feature Extractors for IMU Based Human Activity Recognition

Authors: Mengxi Liu, Daniel Geißler, Dominique Nshimyimana, Sizhen Bian, Bo Zhou, Paul Lukowicz

Abstract: In this work, we explore the use of a novel neural network architecture, the Kolmogorov-Arnold Networks (KANs) as feature extractors for sensor-based (specifically IMU) Human Activity Recognition (HAR). Where conventional networks perform a parameterized weighted sum of the inputs at each node and then feed the result into a statically defined nonlinearity, KANs perform non-linear computations rep… ▽ More In this work, we explore the use of a novel neural network architecture, the Kolmogorov-Arnold Networks (KANs) as feature extractors for sensor-based (specifically IMU) Human Activity Recognition (HAR). Where conventional networks perform a parameterized weighted sum of the inputs at each node and then feed the result into a statically defined nonlinearity, KANs perform non-linear computations represented by B-SPLINES on the edges leading to each node and then just sum up the inputs at the node. Instead of learning weights, the system learns the spline parameters. In the original work, such networks have been shown to be able to more efficiently and exactly learn sophisticated real valued functions e.g. in regression or PDE solution. We hypothesize that such an ability is also advantageous for computing low-level features for IMU-based HAR. To this end, we have implemented KAN as the feature extraction architecture for IMU-based human activity recognition tasks, including four architecture variations. We present an initial performance investigation of the KAN feature extractor on four public HAR datasets. It shows that the KAN-based feature extractor outperforms CNN-based extractors on all datasets while being more parameter efficient. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: This paper is under review

arXiv:2406.08928 [pdf, other]

Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

Authors: Guodong Sun, Junjie Liu, Mingxuan Liu, Moyun Liu, Yang Zhang

Abstract: Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure a… ▽ More Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure and texture. Nevertheless, solely relying on a single type of prior information often falls short when dealing with complex scenes, necessitating improvements in generalization performance. To address these challenges, we introduce a novel self-supervised monocular depth estimation model that leverages multiple priors to bolster representation capabilities across spatial, context, and semantic dimensions. Specifically, we employ a hybrid transformer and a lightweight pose network to obtain long-range spatial priors in the spatial dimension. Then, the context prior attention is designed to improve generalization, particularly in complex structures or untextured areas. In addition, semantic priors are introduced by leveraging semantic boundary loss, and semantic prior attention is supplemented, further refining the semantic features extracted by the decoder. Experiments on three diverse datasets demonstrate the effectiveness of the proposed model. It integrates multiple priors to comprehensively enhance the representation ability, improving the accuracy and reliability of depth estimation. Codes are available at: \url{https://github.com/MVME-HBUT/MPRLNet} △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 28 pages, 12 figures

arXiv:2406.07498 [pdf, other]

RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and constraint receptive field of convolution layers limit the system's perfor… ▽ More In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and constraint receptive field of convolution layers limit the system's performance. To mitigate these problems, we extend RaD-Net to its upgraded version, RaD-Net 2. Specifically, a causality-based knowledge distillation is introduced in the first stage to use future information in a causal way. We use the non-causal repairing network as the teacher to improve the performance of the causal repairing network. In addition, in the second stage, complex axial self-attention is applied in the denoising network's complex feature encoder/decoder. Experimental results on the ICASSP 2024 SSI Challenge blind test set show that RaD-Net 2 brings 0.10 OVRL DNSMOS improvement compared to RaD-Net. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.02557 [pdf, other]

EVAN: Evolutional Video Streaming Adaptation via Neural Representation

Authors: Mufan Liu, Le Yang, Yiling Xu, Ye-kui Wang, Jenq-Neng Hwang

Abstract: Adaptive bitrate (ABR) using conventional codecs cannot further modify the bitrate once a decision has been made, exhibiting limited adaptation capability. This may result in either overly conservative or overly aggressive bitrate selection, which could cause either inefficient utilization of the network bandwidth or frequent re-buffering, respectively. Neural representation for video (NeRV), whic… ▽ More Adaptive bitrate (ABR) using conventional codecs cannot further modify the bitrate once a decision has been made, exhibiting limited adaptation capability. This may result in either overly conservative or overly aggressive bitrate selection, which could cause either inefficient utilization of the network bandwidth or frequent re-buffering, respectively. Neural representation for video (NeRV), which embeds the video content into neural network weights, allows video reconstruction with incomplete models. Specifically, the recovery of one frame can be achieved without relying on the decoding of adjacent frames. NeRV has the potential to provide high video reconstruction quality and, more importantly, pave the way for developing more flexible ABR strategies for video transmission. In this work, a new framework, named Evolutional Video streaming Adaptation via Neural representation (EVAN), which can adaptively transmit NeRV models based on soft actor-critic (SAC) reinforcement learning, is proposed. EVAN is trained with a more exploitative strategy and utilizes progressive playback to avoid re-buffering. Experiments showed that EVAN can outperform existing ABRs with 50% reduction in re-buffering and achieve nearly 20% . △ Less

Submitted 15 April, 2024; originally announced June 2024.

Comments: accepted by ICME (conference)

arXiv:2406.01646 [pdf, other]

iKAN: Global Incremental Learning with KAN for Human Activity Recognition Across Heterogeneous Datasets

Authors: Mengxi Liu, Sizhen Bian, Bo Zhou, Paul Lukowicz

Abstract: This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of spli… ▽ More This work proposes an incremental learning (IL) framework for wearable sensor human activity recognition (HAR) that tackles two challenges simultaneously: catastrophic forgetting and non-uniform inputs. The scalable framework, iKAN, pioneers IL with Kolmogorov-Arnold Networks (KAN) to replace multi-layer perceptrons as the classifier that leverages the local plasticity and global stability of splines. To adapt KAN for HAR, iKAN uses task-specific feature branches and a feature redistribution layer. Unlike existing IL methods that primarily adjust the output dimension or the number of classifier nodes to adapt to new tasks, iKAN focuses on expanding the feature extraction branches to accommodate new inputs from different sensor modalities while maintaining consistent dimensions and the number of classifier outputs. Continual learning across six public HAR datasets demonstrated the iKAN framework's incremental learning performance, with a last performance of 84.9\% (weighted F1 score) and an average incremental performance of 81.34\%, which significantly outperforms the two existing incremental learning methods, such as EWC (51.42\%) and experience replay (59.92\%). △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: This work is submitted to Ubicomp/ISWC24 and is under review

arXiv:2406.00085 [pdf, other]

Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would… ▽ More Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would result in poor model generalizability. Many domain adaptation methods are designed to reduce the distributional differences between sites to some extent, but usually ignore overfitting problem of the model on the source domain. Intuitively, target data augmentation can alleviate the overfitting problem by forcing the model to learn more generalized features and reduce the dependence on source domain data. In this work, we propose a new augmentation-based unsupervised cross-domain fMRI adaptation (AUFA) framework for automatic diagnosis of MDD. The AUFA consists of 1) a graph representation learning module for extracting rs-fMRI features with spatial attention, 2) a domain adaptation module for feature alignment between source and target data, 3) an augmentation-based self-optimization module for alleviating model overfitting on the source domain, and 4) a classification module. Experimental results on 1,089 subjects suggest that AUFA outperforms several state-of-the-art methods in MDD identification. Our approach not only reduces data heterogeneity between different sites, but also localizes disease-related functional connectivity abnormalities and provides interpretability for the model. △ Less

Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.17004 [pdf, other]

Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness

Authors: Yang Zhang, Mingying Li, Huilin Pan, Moyun Liu, Yang Zhou

Abstract: Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, there exists a large characteristic difference between inter-class components (scale variance) but intra-class on the contrary, which entails scale-awareness for detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neu… ▽ More Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, there exists a large characteristic difference between inter-class components (scale variance) but intra-class on the contrary, which entails scale-awareness for detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neural architecture search (NAS) that automates the model design process gains considerable attention because of its promising performance. However, NAS is computationally intensive due to the large search space and huge data volume. In this work, we propose an efficient NAS-based framework for visual fault detection of freight trains to search for the task-specific detection head with capacities of multi-scale representation. First, we design a scale-aware search space for discovering an effective receptive field in the head. Second, we explore the robustness of data volume to reduce search costs based on the specifically designed search space, and a novel sharing strategy is proposed to reduce memory and further improve search efficiency. Extensive experimental results demonstrate the effectiveness of our method with data volume robustness, which achieves 46.8 and 47.9 mAP on the Bottom View and Side View datasets, respectively. Our framework outperforms the state-of-the-art approaches and linearly decreases the search costs with reduced data volumes. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures

arXiv:2405.16980 [pdf, other]

DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi… ▽ More In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based picking methods input an image of a shot gather, and output a binary segmentation map, in which the maximum of each column is the location of FB. However, current designed segmentation networks is difficult to ensure the horizontal continuity of the segmentation. Additionally, FB jumps also exist in some areas, and it is not easy for current networks to detect such jumps. Therefore, it is important to pick as much as possible and ensure horizontal continuity. To alleviate this problem, we propose a novel semantic segmentation network for the 2-D seismic FB picking task, where we introduce the dynamic snake convolution into U-Net and call the new segmentation network dynamic-snake U-Net (DSU-Net). Specifically, we develop original dynamic-snake convolution (DSConv) in CV and propose a novel DSConv module, which can extract the horizontal continuous feature in the shallow feature of the shot gather. Many experiments have shown that DSU-Net demonstrates higher accuracy and robustness than the other 2-D segmentation-based models, achieving state-of-the-art (SOTA) performance in 2-D seismic field surveys. Particularly, it can effectively detect FB jumps and better ensure the horizontal continuity of FB. In addition, the ablation experiment and the anti-noise experiment, respectively, verify the optimal structure of the DSConv module and the robustness of the picking. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14336 [pdf, other]

I$^2$VC: A Unified Framework for Intra- & Inter-frame Video Compression

Authors: Meiqin Liu, Chenming Xu, Yukai Gu, Chao Yao, Yao Zhao

Abstract: Video compression aims to reconstruct seamless frames by encoding the motion and residual information from existing frames. Previous neural video compression methods necessitate distinct codecs for three types of frames (I-frame, P-frame and B-frame), which hinders a unified approach and generalization across different video contexts. Intra-codec techniques lack the advanced Motion Estimation and… ▽ More Video compression aims to reconstruct seamless frames by encoding the motion and residual information from existing frames. Previous neural video compression methods necessitate distinct codecs for three types of frames (I-frame, P-frame and B-frame), which hinders a unified approach and generalization across different video contexts. Intra-codec techniques lack the advanced Motion Estimation and Motion Compensation (MEMC) found in inter-codec, leading to fragmented frameworks lacking uniformity. Our proposed Intra- & Inter-frame Video Compression (I$^2$VC) framework employs a single spatio-temporal codec that guides feature compression rates according to content importance. This unified codec transforms the dependence across frames into a conditional coding scheme, thus integrating intra- and inter-frame compression into one cohesive strategy. Given the absence of explicit motion data, achieving competent inter-frame compression with only a conditional codec poses a challenge. To resolve this, our approach includes an implicit inter-frame alignment mechanism. With the pre-trained diffusion denoising process, the utilization of a diffusion-inverted reference feature rather than random noise supports the initial compression state. This process allows for selective denoising of motion-rich regions based on decoded features, facilitating accurate alignment without the need for MEMC. Our experimental findings, across various compression configurations (AI, LD and RA) and frame types, prove that I$^2$VC outperforms the state-of-the-art perceptual learned codecs. Impressively, it exhibits a 58.4% enhancement in perceptual reconstruction performance when benchmarked against the H.266/VVC standard (VTM). Official implementation can be found at https://github.com/GYukai/I2VC. △ Less

Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 19 pages, 10 figures

arXiv:2405.09586 [pdf, other]

Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

Abstract: The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal a… ▽ More The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal alignment. Additionally, existing methods for similar historical cases retrieval face suboptimal performance owing to the modal gap issue. In response, this paper introduces a novel method, named Factual Serialization Enhancement (FSE), for chest X-ray report generation. FSE begins with the structural entities approach to eliminate presentation-style vocabulary in reports, providing specific input for our model. Then, uni-modal features are learned through cross-modal alignment between images and factual serialization in reports. Subsequently, we present a novel approach to retrieve similar historical cases from the training set, leveraging aligned image features. These features implicitly preserve semantic similarity with their corresponding reference reports, enabling us to calculate similarity solely among aligned features. This effectively eliminates the modal gap issue for knowledge retrieval without the requirement for disease labels. Finally, the cross-modal fusion network is employed to query valuable information from these cases, enriching image features and aiding the text decoder in generating high-quality reports. Experiments on MIMIC-CXR and IU X-ray datasets from both specific and general scenarios demonstrate the superiority of FSE over state-of-the-art approaches in both natural language generation and clinical efficacy metrics. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.06178 [pdf, other]

ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI

Authors: Yuqi Fang, Junhao Zhang, Linmin Wang, Qianqian Wang, Mingxia Liu

Abstract: Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies… ▽ More Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies usually focus on analyzing fMRI using conventional machine learning models that rely on human-engineered fMRI features, without investigating deep learning models that can automatically learn data-driven fMRI representations. In this work, we develop an open-source toolbox, called Augmentation and Computation Toolbox for braIn netwOrk aNalysis (ACTION), offering comprehensive functions to streamline fMRI analysis. The ACTION is a Python-based and cross-platform toolbox with graphical user-friendly interfaces. It enables automatic fMRI augmentation, covering blood-oxygen-level-dependent (BOLD) signal augmentation and brain network augmentation. Many popular methods for brain network construction and network feature extraction are included. In particular, it supports constructing deep learning models, which leverage large-scale auxiliary unlabeled data (3,800+ resting-state fMRI scans) for model pretraining to enhance model performance for downstream tasks. To facilitate multi-site fMRI studies, it is also equipped with several popular federated learning strategies. Furthermore, it enables users to design and test custom algorithms through scripting, greatly improving its utility and extensibility. We demonstrate the effectiveness and user-friendliness of ACTION on real fMRI data and present the experimental results. The software, along with its source code and manual, can be accessed online. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, 5 tables

arXiv:2405.02504 [pdf, other]

Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI

Authors: Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu

Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain f… ▽ More Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain functional information conveyed by PET. To this end, we propose a functional imaging constrained diffusion (FICD) framework for 3D brain PET image synthesis with paired structural MRI as input condition, through a new constrained diffusion model (CDM). The FICD introduces noise to PET and then progressively removes it with CDM, ensuring high output fidelity throughout a stable training phase. The CDM learns to predict denoised PET with a functional imaging constraint introduced to ensure voxel-wise alignment between each denoised PET and its ground truth. Quantitative and qualitative analyses conducted on 293 subjects with paired T1-weighted MRI and 18F-fluorodeoxyglucose (FDG)-PET scans suggest that FICD achieves superior performance in generating FDG-PET data compared to state-of-the-art methods. We further validate the effectiveness of the proposed FICD on data from a total of 1,262 subjects through three downstream tasks, with experimental results suggesting its utility and generalizability. △ Less

Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01750 [pdf, other]

PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems

Authors: Walter Zimmer, Ramandika Pranamulia, Xingcheng Zhou, Mingyu Liu, Alois C. Knoll

Abstract: In the context of Intelligent Transportation Systems (ITS), efficient data compression is crucial for managing large-scale point cloud data acquired by roadside LiDAR sensors. The demand for efficient storage, streaming, and real-time object detection capabilities for point cloud data is substantial. This work introduces PointCompress3D, a novel point cloud compression framework tailored specifica… ▽ More In the context of Intelligent Transportation Systems (ITS), efficient data compression is crucial for managing large-scale point cloud data acquired by roadside LiDAR sensors. The demand for efficient storage, streaming, and real-time object detection capabilities for point cloud data is substantial. This work introduces PointCompress3D, a novel point cloud compression framework tailored specifically for roadside LiDARs. Our framework addresses the challenges of compressing high-resolution point clouds while maintaining accuracy and compatibility with roadside LiDAR sensors. We adapt, extend, integrate, and evaluate three cutting-edge compression methods using our real-world-based TUMTraf dataset family. We achieve a frame rate of 10 FPS while keeping compression sizes below 105 Kb, a reduction of 50 times, and maintaining object detection performance on par with the original data. In extensive experiments and ablation studies, we finally achieved a PSNR d2 of 94.46 and a BPP of 6.54 on our dataset. Future work includes the deployment on the live system. The code is available on our project website: https://pointcompress3d.github.io. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.19615 [pdf, other]

SemiPL: A Semi-supervised Method for Event Sound Source Localization

Authors: Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many app… ▽ More In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many applications, e.g., crowd management, and emergency response services. In this paper, we apply the existing model to a more complex dataset, explore the influence of parameters on the model, and propose a semi-supervised improvement method SemiPL. With the increase in data quantity and the influence of label quality, self-supervised learning will be an unstoppable trend. The experiment shows that the parameter adjustment will positively affect the existing model. In particular, SSPL achieved an improvement of 12.2% cIoU and 0.56% AUC in Chaotic World compared to the results provided. The code is available at: https://github.com/ly245422/SSPL △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16324 [pdf, other]

Improved impedance inversion by deep learning and iterated graph Laplacian

Authors: Davide Bianchi, Florian Bossmann, Wenlong Wang, Mingming Liu

Abstract: Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results. We present a hybrid method that combines deep learning with iterated graph Lap… ▽ More Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results. We present a hybrid method that combines deep learning with iterated graph Laplacian and show its application in acoustic impedance inversion which is a routine procedure in seismic explorations. A neural network is used to obtain a first approximation of the underlying acoustic impedance and construct a graph Laplacian matrix from this approximation. Afterwards, we use a Tikhonov-like variational method to solve the impedance inversion problem where the regularizer is based on the constructed graph Laplacian. The obtained solution can be shown to be more accurate and stable with respect to noise than the initial guess obtained by the neural network. This process can be iterated several times, each time constructing a new graph Laplacian matrix from the most recent reconstruction. The method converges after only a few iterations returning a much more accurate reconstruction. We demonstrate the potential of our method on two different datasets and under various levels of noise. We use two different neural networks that have been introduced in previous works. The experiments show that our approach improves the reconstruction quality in the presence of noise. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Report number: submitted to SEG Geophysics (June 2024)

arXiv:2404.12062 [pdf, other]

doi 10.1007/978-981-99-8388-9_23

MIDGET: Music Conditioned 3D Dance Generation

Authors: Jinwu Wang, Wei Mao, Miaomiao Liu

Abstract: In this paper, we introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET based on Dance motion Vector Quantised Variational AutoEncoder (VQ-VAE) model and Motion Generative Pre-Training (GPT) model to generate vibrant and highquality dances that match the music rhythm. To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the… ▽ More In this paper, we introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET based on Dance motion Vector Quantised Variational AutoEncoder (VQ-VAE) model and Motion Generative Pre-Training (GPT) model to generate vibrant and highquality dances that match the music rhythm. To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the Motion VQ-VAE model to store different human pose codes, 2) employing Motion GPT model to generate pose codes with music and motion Encoders, 3) a simple framework for music feature extraction. We compare with existing state-of-the-art models and perform ablation experiments on AIST++, the largest publicly available music-dance dataset. Experiments demonstrate that our proposed framework achieves state-of-the-art performance on motion quality and its alignment with the music. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures Published in AI 2023: Advances in Artificial Intelligence

Journal ref: In Australasian Joint Conference on Artificial Intelligence (pp. 277-288). Singapore: Springer Nature Singapore 2023

arXiv:2404.04472 [pdf, other]

doi 10.1145/3653974

Recovery from Adversarial Attacks in Cyber-physical Systems: Shallow, Deep and Exploratory Works

Authors: Pengyuan Lu, Lin Zhang, Mengyu Liu, Kaustubh Sridhar, Fanxin Kong, Oleg Sokolsky, Insup Lee

Abstract: Cyber-physical systems (CPS) have experienced rapid growth in recent decades. However, like any other computer-based systems, malicious attacks evolve mutually, driving CPS to undesirable physical states and potentially causing catastrophes. Although the current state-of-the-art is well aware of this issue, the majority of researchers have not focused on CPS recovery, the procedure we defined as r… ▽ More Cyber-physical systems (CPS) have experienced rapid growth in recent decades. However, like any other computer-based systems, malicious attacks evolve mutually, driving CPS to undesirable physical states and potentially causing catastrophes. Although the current state-of-the-art is well aware of this issue, the majority of researchers have not focused on CPS recovery, the procedure we defined as restoring a CPS's physical state back to a target condition under adversarial attacks. To call for attention on CPS recovery and identify existing efforts, we have surveyed a total of 30 relevant papers. We identify a major partition of the proposed recovery strategies: shallow recovery vs. deep recovery, where the former does not use a dedicated recovery controller while the latter does. Additionally, we surveyed exploratory research on topics that facilitate recovery. From these publications, we discuss the current state-of-the-art of CPS recovery, with respect to applications, attack type, attack surfaces and system dynamics. Then, we identify untouched sub-domains in this field and suggest possible future directions for researchers. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Journal ref: ACM Computing Surveys 1 (2024) 1-31

arXiv:2404.01082 [pdf, other]

The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao , et al. (25 additional authors not shown)

Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p… ▽ More Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platform hinder the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and mapping raw data, accompanied by detailed annotations of cardiac anatomical structures. With overwhelming participation, the challenge attracted more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitted for both cine and mapping tasks. All teams use deep learning based approaches, indicating that deep learning has predominately become a promising solution for the problem. The first-place winner of both tasks utilizes the E2E-VarNet architecture as backbones. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstructions. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in Cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field. △ Less

Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 25 pages, 17 figures

arXiv:2403.11061 [pdf, other]

Beamforming Design for Double-Active-RIS-aided Communication Systems with Inter-Excitation

Authors: Boshi Wang, Cunhua Pan, Hong Ren, Zhiyuan Yu, Yang Zhang, Mengyu Liu, Gui Zhou

Abstract: In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, they can effectively mitigate the multiplicative fading effect. However, t… ▽ More In this paper, we investigate a double-active-reconfigurable intelligent surface (RIS)-aided downlink wireless communication system, where a multi-antenna base station (BS) serves multiple single-antenna users with both double reflection and single reflection links. Due to the signal amplification capability of active RISs, they can effectively mitigate the multiplicative fading effect. However, this also induces signal bouncing between the two active RISs that cannot be ignored. This phenomenon is termed as the "inter-excitation" effect and is characterized in the received signal by proposing a feedback-type model. Based on the signal model, we formulate a weighted sum rate (WSR) maximization problem by jointly optimizing the beamforming matrix at the BS and the reflecting coefficient matrices at the two active RISs, subject to power constraints at the BS and active RISs, as well as the maximum amplification gain constraints of the active RISs. To solve this non-convex problem, we first transform the problem into a more tractable form using the fractional programming (FP) method. Then, by introducing auxiliary variables, the problem can be converted into an equivalent form that can be solved by using a penalty dual decomposition (PDD) algorithm. Finally, simulation results indicate that it proposed scheme outperforms benchmark schemes with single active RIS and double passive RISs in terms of achievable rate. Furthermore, the results demonstrate that the proposed scheme can enhance the WSR by 30\% compared to scenarios that do not take this effect into account when the maximum amplification gain is 40 dB. Additionally, the proposed scheme is capable of achieving high WSR performance at most locations where double active RISs are deployed between the BS and the users, thereby providing greater flexibility in their positioning. △ Less

Submitted 23 August, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.08162 [pdf, other]

Iterative Learning for Joint Image Denoising and Motion Artifact Correction of 3D Brain MRI

Authors: Lintao Zhang, Mengqi Wu, Lihong Wang, David C. Steffens, Guy G. Potter, Mingxia Liu

Abstract: Image noise and motion artifacts greatly affect the quality of brain MRI and negatively influence downstream medical image analysis. Previous studies often focus on 2D methods that process each volumetric MR image slice-by-slice, thus losing important 3D anatomical information. Additionally, these studies generally treat image denoising and artifact correction as two standalone tasks, without cons… ▽ More Image noise and motion artifacts greatly affect the quality of brain MRI and negatively influence downstream medical image analysis. Previous studies often focus on 2D methods that process each volumetric MR image slice-by-slice, thus losing important 3D anatomical information. Additionally, these studies generally treat image denoising and artifact correction as two standalone tasks, without considering their potential relationship, especially on low-quality images where severe noise and motion artifacts occur simultaneously. To address these issues, we propose a Joint image Denoising and motion Artifact Correction (JDAC) framework via iterative learning to handle noisy MRIs with motion artifacts, consisting of an adaptive denoising model and an anti-artifact model. In the adaptive denoising model, we first design a novel noise level estimation strategy, and then adaptively reduce the noise through a U-Net backbone with feature normalization conditioning on the estimated noise variance. The anti-artifact model employs another U-Net for eliminating motion artifacts, incorporating a novel gradient-based loss function designed to maintain the integrity of brain anatomy during the motion correction process. These two models are iteratively employed for joint image denoising and artifact correction through an iterative learning framework. An early stopping strategy depending on noise level estimation is applied to accelerate the iteration process. The denoising model is trained with 9,544 T1-weighted MRIs with manually added Gaussian noise as supervision. The anti-artifact model is trained on 552 T1-weighted MRIs with motion artifacts and paired motion-free images. Experimental results on a public dataset and a clinical study suggest the effectiveness of JDAC in both tasks of denoising and motion artifact correction, compared with several state-of-the-art methods. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.17909 [pdf, other]

Simulation of Muon Tomography Projections to Image the Pyramids of Giza

Authors: Mira Liu

Abstract: Purpose: A geometric simulation of a possible two-plane detector was developed to test the abilities of the detector to generate high-resolution images of the Great Pyramid using muon tomography. Methods and Materials: Trajectory range, angular resolution, and acceptance of the detector were calculated with a simulation. Trajectories and the corresponding sinogram space covered were simulated firs… ▽ More Purpose: A geometric simulation of a possible two-plane detector was developed to test the abilities of the detector to generate high-resolution images of the Great Pyramid using muon tomography. Methods and Materials: Trajectory range, angular resolution, and acceptance of the detector were calculated with a simulation. Trajectories and the corresponding sinogram space covered were simulated first with one detector in one location, and then two moving detectors on adjacent sides of the pyramid. The resolution at the center slice of the pyramid was calculated using the angular resolution of the detector. Results: The simulation returned trajectory range encompassing the pyramid and peak angular resolution of .0004sr. Sinogram space covered by one position was inadequate, however two moving detectors on adjacent sides of the pyramid cover a significant portion. Resolution at the center of the pyramid is roughly 3m. Conclusions: The simulation provides a way to calculate the detector positions needed to cover an adequate amount of sinogram space for high-resolution cosmic-ray tomographic reconstruction of the Great Pyramids. Key Words: high-resolution muon tomography, one-sided tomography, sinogram simulation, detector simulation △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 7 pages, 13 figures. Written as a rotation report for the fulfillment of MPHY 41800 Research in Advanced Tomographic Imaging Autumn 2019. Not submitted for peer-reviewed publication elsewhere

arXiv:2402.09445 [pdf, other]

doi 10.1109/PerCom59722.2024.10494489

iMove: Exploring Bio-impedance Sensing for Fitness Activity Recognition

Authors: Mengxi Liu, Vitor Fortes Rey, Yu Zhang, Lala Shakti Swarup Ray, Bo Zhou, Paul Lukowicz

Abstract: Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive learning.To evaluate our methods, we conducted an experimen… ▽ More Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive learning.To evaluate our methods, we conducted an experiment including six upper body fitness activities performed by ten subjects over five days to collect synchronized data from bio-impedance across two wrists and IMU on the left wrist.The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is only required at the training phase, by which the average Macro F1 score with the input of a single IMU was improved by 3.22 \% reaching 84.71 \% compared to the 81.49 \% of the IMU baseline model. We have also shown how bio-impedance can improve human activity recognition (HAR) directly through sensor fusion, reaching an average Macro F1 score of 89.57 \% (two modalities required for both training and inference) even if Bio-impedance alone has an average macro F1 score of 75.36 \%, which is outperformed by IMU alone. In addition, similar results were obtained in an extended study on lower body fitness activity classification, demonstrating the generalisability of our approach.Our findings underscore the potential of sensor fusion and contrastive learning as valuable tools for advancing fitness activity recognition, with bio-impedance playing a pivotal role in augmenting the capabilities of IMU-based systems. △ Less

Submitted 3 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: Accepted by percom2024

arXiv:2402.09434 [pdf, other]

Disentangling Imperfect: A Wavelet-Infused Multilevel Heterogeneous Network for Human Activity Recognition in Flawed Wearable Sensor Data

Authors: Mengna Liu, Dong Xiang, Xu Cheng, Xiufeng Liu, Dalin Zhang, Shengyong Chen, Christian S. Jensen

Abstract: The popularity and diffusion of wearable devices provides new opportunities for sensor-based human activity recognition that leverages deep learning-based algorithms. Although impressive advances have been made, two major challenges remain. First, sensor data is often incomplete or noisy due to sensor placement and other issues as well as data transmission failure, calling for imputation of missin… ▽ More The popularity and diffusion of wearable devices provides new opportunities for sensor-based human activity recognition that leverages deep learning-based algorithms. Although impressive advances have been made, two major challenges remain. First, sensor data is often incomplete or noisy due to sensor placement and other issues as well as data transmission failure, calling for imputation of missing values, which also introduces noise. Second, human activity has multi-scale characteristics. Thus, different groups of people and even the same person may behave differently under different circumstances. To address these challenges, we propose a multilevel heterogeneous neural network, called MHNN, for sensor data analysis. We utilize multilevel discrete wavelet decomposition to extract multi-resolution features from sensor data. This enables distinguishing signals with different frequencies, thereby suppressing noise. As the components resulting from the decomposition are heterogeneous, we equip the proposed model with heterogeneous feature extractors that enable the learning of multi-scale features. Due to the complementarity of these features, we also include a cross aggregation module for enhancing their interactions. An experimental study using seven publicly available datasets offers evidence that MHNN can outperform other cutting-edge models and offers evidence of robustness to missing values and noise. An ablation study confirms the importance of each module. △ Less

Submitted 26 January, 2024; originally announced February 2024.

Comments: 14 pages, 7 figures

arXiv:2402.06875 [pdf, other]

Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework

Authors: Mengqi Wu, Lintao Zhang, Pew-Thian Yap, Hongtu Zhu, Mingxia Liu

Abstract: Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing the site effects at the image level. However,… ▽ More Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing the site effects at the image level. However, existing methods generally suffer from high computational requirements and limited generalizability, restricting their applicability to unseen MRIs. In this paper, we design a novel disentangled latent energy-based style translation (DLEST) framework for unpaired image-level MRI harmonization, consisting of (a) site-invariant image generation (SIG), (b) site-specific style translation (SST), and (c) site-specific MRI synthesis (SMS). Specifically, the SIG employs a latent autoencoder to encode MRIs into a low-dimensional latent space and reconstruct MRIs from latent codes. The SST utilizes an energy-based model to comprehend the global latent distribution of a target domain and translate source latent codes toward the target domain, while SMS enables MRI synthesis with a target-specific style. By disentangling image generation and style translation in latent space, the DLEST can achieve efficient style translation. Our model was trained on T1-weighted MRIs from a public dataset (with 3,984 subjects across 58 acquisition sites/settings) and validated on an independent dataset (with 9 traveling subjects scanned in 11 sites/settings) in four tasks: histogram and feature visualization, site classification, brain tissue segmentation, and site-specific structural MRI synthesis. Qualitative and quantitative results demonstrate the superiority of our method over several state-of-the-arts. △ Less

Submitted 29 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.05373 [pdf, other]

Unleashing the Infinity Power of Geometry: A Novel Geometry-Aware Transformer (GOAT) for Whole Slide Histopathology Image Analysis

Authors: Mingxin Liu, Yunzan Liu, Pengbo Xu, Jiquan Ma

Abstract: The histopathology analysis is of great significance for the diagnosis and prognosis of cancers, however, it has great challenges due to the enormous heterogeneity of gigapixel whole slide images (WSIs) and the intricate representation of pathological features. However, recent methods have not adequately exploited geometrical representation in WSIs which is significant in disease diagnosis. Theref… ▽ More The histopathology analysis is of great significance for the diagnosis and prognosis of cancers, however, it has great challenges due to the enormous heterogeneity of gigapixel whole slide images (WSIs) and the intricate representation of pathological features. However, recent methods have not adequately exploited geometrical representation in WSIs which is significant in disease diagnosis. Therefore, we proposed a novel weakly-supervised framework, Geometry-Aware Transformer (GOAT), in which we urge the model to pay attention to the geometric characteristics within the tumor microenvironment which often serve as potent indicators. In addition, a context-aware attention mechanism is designed to extract and enhance the morphological features within WSIs. △ Less

Submitted 7 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures. Accepted by 21st IEEE International Symposium on Biomedical Imaging (ISBI 2024)

arXiv:2402.04532 [pdf, other]

Joint Beamforming Design for Double Active RIS-assisted Radar-Communication Coexistence Systems

Authors: Mengyu Liu, Hong Ren, Cunhua Pan, Boshi Wang, Zhiyuan Yu, Ruisong Weng, Kangda Zhi, Yongchao He

Abstract: Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigur… ▽ More Integrated sensing and communication (ISAC) technology has been considered as one of the key candidate technologies in the next-generation wireless communication systems. However, when radar and communication equipment coexist in the same system, i.e. radar-communication coexistence (RCC), the interference from communication systems to radar can be large and cannot be ignored. Recently, reconfigurable intelligent surface (RIS) has been introduced into RCC systems to reduce the interference. However, the "multiplicative fading" effect introduced by passive RIS limits its performance. To tackle this issue, we consider a double active RIS-assisted RCC system, which focuses on the design of the radar's beamforming vector and the active RISs' reflecting coefficient matrices, to maximize the achievable data rate of the communication system. The considered system needs to meet the radar detection constraint and the power budgets at the radar and the RISs. Since the problem is non-convex, we propose an algorithm based on the penalty dual decomposition (PDD) framework. Specifically, we initially introduce auxiliary variables to reformulate the coupled variables into equation constraints and incorporate these constraints into the objective function through the PDD framework. Then, we decouple the equivalent problem into several subproblems by invoking the block coordinate descent (BCD) method. Furthermore, we employ the Lagrange dual method to alternately optimize these subproblems. Simulation results verify the effectiveness of the proposed algorithm. Furthermore, the results also show that under the same power budget, deploying double active RISs in RCC systems can achieve higher data rate than those with single active RIS and double passive RISs. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02634 [pdf, other]

Key-Graph Transformer for Image Restoration

Authors: Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

Abstract: While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In… ▽ More While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In response to these challenges, we introduce the Key-Graph Transformer (KGT) in this paper. Specifically, KGT views patch features as graph nodes. The proposed Key-Graph Constructor efficiently forms a sparse yet representative Key-Graph by selectively connecting essential nodes instead of all the nodes. Then the proposed Key-Graph Attention is conducted under the guidance of the Key-Graph only among selected nodes with linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm the proposed KGT's state-of-the-art performance, showcasing advancements both quantitatively and qualitatively. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: 9 pages, 6 figures

arXiv:2401.08476 [pdf, other]

Incentivizing Secure Software Development: The Role of Liability (Waiver) and Audit

Authors: Ziyuan Huang, Gergely Biczók, Mingyan Liu

Abstract: Misaligned incentives in secure software development have long been the focus of research in the economics of security. Product liability, a powerful legal framework in other industries, has been largely ineffective for software products until recent times. However, the rapid regulatory responses to recent global cyberattacks by both the United States and the European Union, together with the (rel… ▽ More Misaligned incentives in secure software development have long been the focus of research in the economics of security. Product liability, a powerful legal framework in other industries, has been largely ineffective for software products until recent times. However, the rapid regulatory responses to recent global cyberattacks by both the United States and the European Union, together with the (relative) success of the General Data Protection Regulation in defining both duty and standard of care for software vendors, may just enable regulators to use liability to re-align incentives for the benefit of the digital society. Specifically, the recently proposed United States National Cybersecurity Strategy shifts responsibility for cyber incidents back to software vendors. In doing so, the strategy also puts forward the concept of the liability waiver: if a software company voluntarily undergoes and passes an IT security audit, its liability is waived. In this paper, we analyze this audit scenario from the aspect of the software vendor. We propose a mechanism where a software vendor should first undergo a repeated auditing process in each stage of which the vendor decides whether to quit early or stay with additional security investment. We show that the optimal strategy for an opt-in vendor is to never quit; and exert cumulative investments in either "one-and-done" or "incremental" manner. We relate the audit mechanism to a liability waiver insurance policy and revealed its effect on reshaping the vendor's risk perception. We also discuss influence of audit quality on the vendor's incentives and pinpoint that a desirable audit rule should be highly accurate and less strict. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Comments: 21 pages, 6 figures, submitted to the 23rd Workshop on the Economics of Information Security

ACM Class: K.6.5; I.2.8

arXiv:2401.06000 [pdf, other]

Body-Area Capacitive or Electric Field Sensing for Human Activity Recognition and Human-Computer Interaction: A Comprehensive Survey

Authors: Sizhen Bian, Mengxi Liu, Bo Zhou, Paul Lukowicz, Michele Magno

Abstract: Due to the fact that roughly sixty percent of the human body is essentially composed of water, the human body is inherently a conductive object, being able to, firstly, form an inherent electric field from the body to the surroundings and secondly, deform the distribution of an existing electric field near the body. Body-area capacitive sensing, also called body-area electric field sensing, is bec… ▽ More Due to the fact that roughly sixty percent of the human body is essentially composed of water, the human body is inherently a conductive object, being able to, firstly, form an inherent electric field from the body to the surroundings and secondly, deform the distribution of an existing electric field near the body. Body-area capacitive sensing, also called body-area electric field sensing, is becoming a promising alternative for wearable devices to accomplish certain tasks in human activity recognition and human-computer interaction. Over the last decade, researchers have explored plentiful novel sensing systems backed by the body-area electric field. On the other hand, despite the pervasive exploration of the body-area electric field, a comprehensive survey does not exist for an enlightening guideline. Moreover, the various hardware implementations, applied algorithms, and targeted applications result in a challenging task to achieve a systematic overview of the subject. This paper aims to fill in the gap by comprehensively summarizing the existing works on body-area capacitive sensing so that researchers can have a better view of the current exploration status. To this end, we first sorted the explorations into three domains according to the involved body forms: body-part electric field, whole-body electric field, and body-to-body electric field, and enumerated the state-of-art works in the domains with a detailed survey of the backed sensing tricks and targeted applications. We then summarized the three types of sensing frontends in circuit design, which is the most critical part in body-area capacitive sensing, and analyzed the data processing pipeline categorized into three kinds of approaches. Finally, we described the challenges and outlooks of body-area electric sensing. △ Less

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05426 [pdf, other]

CoSS: Co-optimizing Sensor and Sampling Rate for Data-Efficient AI in Human Activity Recognition

Authors: Mengxi Liu, Zimin Zhao, Daniel Geißler, Bo Zhou, Sungho Suh, Paul Lukowicz

Abstract: Recent advancements in Artificial Neural Networks have significantly improved human activity recognition using multiple time-series sensors. While employing numerous sensors with high-frequency sampling rates usually improves the results, it often leads to data inefficiency and unnecessary expansion of the ANN, posing a challenge for their practical deployment on edge devices. Addressing these iss… ▽ More Recent advancements in Artificial Neural Networks have significantly improved human activity recognition using multiple time-series sensors. While employing numerous sensors with high-frequency sampling rates usually improves the results, it often leads to data inefficiency and unnecessary expansion of the ANN, posing a challenge for their practical deployment on edge devices. Addressing these issues, our work introduces a pragmatic framework for data-efficient utilization in HAR tasks, considering the optimization of both sensor modalities and sampling rate simultaneously. Central to our approach are the designed trainable parameters, termed 'Weight Scores,' which assess the significance of each sensor modality and sampling rate during the training phase. These scores guide the sensor modalities and sampling rate selection. The pruning method allows users to make a trade-off between computational budgets and performance by selecting the sensor modalities and sampling rates according to the weight score ranking. We tested our framework's effectiveness in optimizing sensor modality and sampling rate selection using three public HAR benchmark datasets. The results show that the sensor and sampling rate combination selected via CoSS achieves similar classification performance to configurations using the highest sampling rate with all sensors but at a reduced hardware cost. △ Less

Submitted 3 January, 2024; originally announced January 2024.

Comments: Accepeted by the 2nd Workshop on Sustainable AI (AAAI24)

arXiv:2401.04389 [pdf, other]

RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

Abstract: This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training… ▽ More This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training stage. Finally, we use a three-step training strategy to optimize our model. We submit two models with different sets of parameters to meet the RTF requirement of the two tracks. According to the official results, the proposed systems rank 2nd in track 1 and 3rd in track 2. △ Less

Submitted 9 January, 2024; originally announced January 2024.

Comments: submitted to ICASSP 2024

arXiv:2401.02673 [pdf, other]

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Authors: Dongdi Zhao, Jianbo Ma, Lu Lu, Jinke Li, Xuan Ji, Lei Zhu, Fuming Fang, Ming Liu, Feijun Jiang

Abstract: Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen… ▽ More Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen, Spell, Attend (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. Such framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information of the source direction is also integrated in the beamforming to explore the usefulness of source direction as a prior, which is usually available especially in multi-modality scenario. Experiments on different microphone array geometry are conducted to evaluate the robustness against spacing variance of microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework and the proposed method achieve 19.26\% improvement when compared with a strong baseline. △ Less

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2312.09760 [pdf, other]

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabu… ▽ More Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first stage model to detect potential keyword candidates and the decoder branch as the second stage model to validate candidates. In order to enhance any customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information by audio and text cross-attention into both branches. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS can achieve a significant relative wake-up rate improvement of 41% compared to the traditional customized KWS systems when the false alarm rate is fixed to 0.5 times per hour. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ASRU2023

arXiv:2311.18074 [pdf, other]

Game Projection and Robustness for Game-Theoretic Autonomous Driving

Authors: Mushuang Liu, H. Eric Tseng, Dimitar Filev, Anouck Girard, Ilya Kolmanovsky

Abstract: Game-theoretic approaches are envisioned to bring human-like reasoning skills and decision-making processes for autonomous vehicles (AVs). However, challenges including game complexity and incomplete information still remain to be addressed before they can be sufficiently practical for real-world use. Game complexity refers to the difficulties of solving a multi-player game, which include solution… ▽ More Game-theoretic approaches are envisioned to bring human-like reasoning skills and decision-making processes for autonomous vehicles (AVs). However, challenges including game complexity and incomplete information still remain to be addressed before they can be sufficiently practical for real-world use. Game complexity refers to the difficulties of solving a multi-player game, which include solution existence, algorithm convergence, and scalability. To address these difficulties, a potential game based framework was developed in our recent work. However, conditions on cost function design need to be enforced to make the game a potential game. This paper relaxes the conditions and makes the potential game approach applicable to more general scenarios, even including the ones that cannot be molded as a potential game. Incomplete information refers to the ego vehicle's lack of knowledge of other traffic agents' cost functions. Cost function deviations between the ego vehicle estimated/learned other agents' cost functions and their actual ones are often inevitable. This motivates us to study the robustness of a game-theoretic solution. This paper defines the robustness margin of a game solution as the maximum magnitude of cost function deviations that can be accommodated in a game without changing the optimality of the game solution. With this definition, closed-form robustness margins are derived. Numerical studies using highway lane-changing scenarios are reported. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15607 [pdf, other]

Spatially Covariant Image Registration with Text Prompts

Authors: Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, Hang Zhang

Abstract: Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduc… ▽ More Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduces textSCF, a novel method that integrates spatially covariant filters and textual anatomical prompts encoded by visual-language models, to fill this gap. This approach optimizes an implicit function that correlates text embeddings of anatomical regions to filter weights, relaxing the typical translation-invariance constraint of convolutional operations. TextSCF not only boosts computational efficiency but can also retain or improve registration accuracy. By capturing the contextual interplay between anatomical regions, it offers impressive inter-regional transferability and the ability to preserve structural discontinuities during registration. TextSCF's performance has been rigorously tested on inter-subject brain MRI and abdominal CT registration tasks, outperforming existing state-of-the-art models in the MICCAI Learn2Reg 2021 challenge and leading the leaderboard. In abdominal registrations, textSCF's larger model variant improved the Dice score by 11.3% over the second-best model, while its smaller variant maintained similar accuracy but with an 89.13% reduction in network parameters and a 98.34\% decrease in computational operations. △ Less

Submitted 5 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 13 pages, 8 figures, 6 tables

arXiv:2311.12842 [pdf, other]

Multimodal Identification of Alzheimer's Disease: A Review

Authors: Guian Fang, Mengsha Liu, Yi Zhong, Zhuolin Zhang, Jiehui Huang, Zhenchao Tang, Calvin Yu-Chian Chen

Abstract: Alzheimer's disease is a progressive neurological disorder characterized by cognitive impairment and memory loss. With the increasing aging population, the incidence of AD is continuously rising, making early diagnosis and intervention an urgent need. In recent years, a considerable number of teams have applied computer-aided diagnostic techniques to early classification research of AD. Most studi… ▽ More Alzheimer's disease is a progressive neurological disorder characterized by cognitive impairment and memory loss. With the increasing aging population, the incidence of AD is continuously rising, making early diagnosis and intervention an urgent need. In recent years, a considerable number of teams have applied computer-aided diagnostic techniques to early classification research of AD. Most studies have utilized imaging modalities such as magnetic resonance imaging (MRI), positron emission tomography (PET), and electroencephalogram (EEG). However, there have also been studies that attempted to use other modalities as input features for the models, such as sound, posture, biomarkers, cognitive assessment scores, and their fusion. Experimental results have shown that the combination of multiple modalities often leads to better performance compared to a single modality. Therefore, this paper will focus on different modalities and their fusion, thoroughly elucidate the mechanisms of various modalities, explore which methods should be combined to better harness their utility, analyze and summarize the literature in the field of early classification of AD in recent years, in order to explore more possibilities of modality combinations. △ Less

Submitted 6 October, 2023; originally announced November 2023.

Showing 1–50 of 239 results for author: Liu, M