-
Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting
Authors:
Zhenyu Wang,
Li Wan,
Biqiao Zhang,
Yiteng Huang,
Shang-Wen Li,
Ming Sun,
Xin Lei,
Zhaojun Yang
Abstract:
A keyword spotting (KWS) engine that is continuously running on device is exposed to various speech signals that are usually unseen before. It is a challenging problem to build a small-footprint and high-performing KWS model with robustness under different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource-aware disentangled learning with adversarial examples to reduce the mismatch between the original and adversarial data as well as the mismatch across original training datasources. The KWS model architecture is based on depth-wise separable convolution and a simple attention module. Experimental results demonstrate that the proposed learning strategy improves the false reject rate by 40.31% at a 1% false accept rate on the internal dataset, compared to the strongest baseline without using adversarial examples. Our best-performing system achieves 98.06% accuracy on the Google Speech Commands V1 dataset.
Submitted 23 August, 2024;
originally announced August 2024.
-
High Probability Low Latency Sequential Change Detection over an Unknown Finite Horizon
Authors:
Yu-Han Huang,
Venugopal V. Veeravalli
Abstract:
A finite horizon variant of the quickest change detection problem is studied, in which the goal is to minimize a delay threshold (latency), under constraints on the probability of false alarm and the probability that the latency is exceeded. In addition, the horizon is not known to the change detector. A variant of the cumulative sum (CuSum) test with a threshold that increases logarithmically with time is proposed as a candidate solution to the problem. An information-theoretic lower bound on the minimum value of the latency under the constraints is then developed. This lower bound is used to establish certain asymptotic optimality properties of the proposed test in terms of the horizon and the false alarm probability. Some experimental results are given to illustrate the performance of the test.
Submitted 11 August, 2024;
originally announced August 2024.
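Illustrative note on the abstract above: the idea of a CuSum test whose threshold grows logarithmically with time can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' test: the Gaussian mean-shift model, the threshold form `base + scale * log(n)`, and the names `cusum_log_threshold`, `base`, and `scale` are all hypothetical choices made for illustration.

```python
import math


def cusum_log_threshold(samples, mu0=0.0, mu1=1.0, sigma=1.0, base=2.0, scale=1.0):
    """Return the 1-based time at which the alarm is raised, or None.

    Assumptions (not from the abstract): observations are Gaussian with
    mean mu0 before the change and mu1 after it, and the threshold at
    time n is base + scale * log(n).
    """
    stat = 0.0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2).
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        # Standard CuSum recursion: accumulate evidence, clip at zero.
        stat = max(0.0, stat + llr)
        # Time-varying threshold that grows logarithmically with n.
        threshold = base + scale * math.log(n)
        if stat >= threshold:
            return n
    return None


# Noise-free toy sequence: the mean jumps from 0 to 1 after sample 20.
toy = [0.0] * 20 + [1.0] * 30
alarm_time = cusum_log_threshold(toy)
print(alarm_time)
```

On this toy input the statistic stays at zero before the change and grows by 0.5 per sample afterwards, so the alarm time depends only on when that growth overtakes the slowly rising threshold.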
-
TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature Scaling
Authors:
Ruiquan Ge,
Xiao Yu,
Yifei Chen,
Fan Jia,
Shenghao Zhu,
Guanyu Zhou,
Yiyu Huang,
Chenyan Zhang,
Dong Zeng,
Changmiao Wang,
Qiegen Liu,
Shanzhou Niu
Abstract:
Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic clipping strategy. The TC-KANRecon model aims to accelerate the MRI reconstruction process through deep learning methods while maintaining the quality of the reconstructed images. The MF-UKAN module can effectively balance the tradeoff between image denoising and structure preservation. Specifically, it introduces multi-head attention mechanisms and scalar modulation factors, which significantly enhance the model's robustness and structure preservation capabilities in complex noise environments. Moreover, the dynamic clipping strategy in TC-KANRecon adjusts the cropping interval according to the sampling steps, thereby mitigating image detail loss typically caused by traditional cropping methods and enriching the visual features of the images. Furthermore, the MC-Model module incorporates full-sampling k-space information, realizing efficient fusion of conditional information, enhancing the model's ability to process complex data, and improving the realism and detail richness of reconstructed images. Experimental results demonstrate that the proposed method outperforms other MRI reconstruction methods in both qualitative and quantitative evaluations. Notably, the TC-KANRecon method exhibits excellent reconstruction results when processing high-noise, low-sampling-rate MRI data. Our source code is available at https://github.com/lcbkmm/TC-KANRecon.
Submitted 11 August, 2024;
originally announced August 2024.
-
AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones
Authors:
Xuanyu Liu,
Haoxian Liu,
Jiao Li,
Zongqi Yang,
Yi Huang,
Jin Zhang
Abstract:
Atrial fibrillation (AF) is characterized by irregular electrical impulses originating in the atria, which can lead to severe complications and even death. Due to the intermittent nature of the AF, early and timely monitoring of AF is critical for patients to prevent further exacerbation of the condition. Although ambulatory ECG Holter monitors provide accurate monitoring, the high cost of these devices hinders their wider adoption. Current mobile-based AF detection systems offer a portable solution. However, these systems have various applicability issues, such as being easily affected by environmental factors and requiring significant user effort. To overcome the above limitations, we present AcousAF, a novel AF detection system based on acoustic sensors of smartphones. Particularly, we explore the potential of pulse wave acquisition from the wrist using smartphone speakers and microphones. In addition, we propose a well-designed framework comprised of pulse wave probing, pulse wave extraction, and AF detection to ensure accurate and reliable AF detection. We collect data from 20 participants utilizing our custom data collection application on the smartphone. Extensive experimental results demonstrate the high performance of our system, with 92.8% accuracy, 86.9% precision, 87.4% recall, and 87.1% F1 Score.
Submitted 9 August, 2024;
originally announced August 2024.
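Illustrative note on the abstract above: the F1 score is the harmonic mean of precision and recall, so the reported figures can be cross-checked with a few lines of arithmetic. This is only a consistency check on the numbers quoted in the abstract; the function name `f1_from_precision_recall` is a hypothetical helper, not part of the AcousAF system.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (86.9% and 87.4%).
f1 = f1_from_precision_recall(0.869, 0.874)
print(f"F1 = {100 * f1:.1f}%")
```

The result agrees with the 87.1% F1 score stated in the abstract, to the one decimal place reported there.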
-
Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation
Authors:
Junxuan Yu,
Rusi Chen,
Yongsong Zhou,
Yanlin Chen,
Yaofei Duan,
Yuhao Huang,
Han Zhou,
Tan Tao,
Xin Yang,
Dong Ni
Abstract:
Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering flexible movement control over specific cardiac structures. In this context, we propose an explainable and controllable method for echocardiography video generation, taking an initial frame and a motion curve as guidance. Our contributions are three-fold. First, we extract motion information from each heart substructure to construct motion curves, enabling the diffusion model to synthesize customized echocardiography videos by modifying these curves. Second, we propose the structure-to-motion alignment module, which can map semantic features onto motion curves across cardiac structures. Third, the position-aware attention mechanism is designed to enhance video consistency utilizing Gaussian masks with structural position information. Extensive experiments on three echocardiography datasets show that our method outperforms others regarding fidelity and consistency. The full code will be released at https://github.com/mlmi-2024-72/ECM.
Submitted 31 July, 2024;
originally announced July 2024.
-
Movable Frequency Diverse Array for Wireless Communication Security
Authors:
Zihao Cheng,
Jiangbo Si,
Zan Li,
Pengpeng Liu,
Yangchao Huang,
Naofal Al-Dhahir
Abstract:
Frequency diverse array (FDA) is a promising antenna technology to achieve physical layer security by varying the frequency of each antenna at the transmitter. However, when the channels of the legitimate user and eavesdropper are highly correlated, FDA is limited by the frequency constraint and cannot provide satisfactory security performance. In this paper, we propose a novel movable FDA (MFDA) antenna technology where the positions of antennas can be dynamically adjusted in a given finite region. Specifically, we aim to maximize the secrecy capacity by jointly optimizing the antenna beamforming vector, antenna frequency vector and antenna position vector. To solve this non-convex optimization problem with coupled variables, we develop a two-stage alternating optimization (AO) algorithm based on block successive upper-bound minimization (BSUM) method. Moreover, to evaluate the security performance provided by MFDA, we introduce two benchmark schemes, i.e., phased array (PA) and FDA. Simulation results demonstrate that MFDA can significantly enhance security performance compared to PA and FDA. In particular, when the frequency constraint is strict, MFDA can further increase the secrecy capacity by adjusting the positions of antennas instead of the frequencies.
Submitted 25 July, 2024;
originally announced July 2024.
-
Haptic feedback of front car motion can improve driving control
Authors:
Xiaoxiao Cheng,
Xianzhe Geng,
Yanpei Huang,
Etienne Burdet
Abstract:
This study investigates the role of haptic feedback in a car-following scenario, where information about the motion of the front vehicle is provided through a virtual elastic connection with it. Using a robotic interface in a simulated driving environment, we examined the impact of varying levels of such haptic feedback on the driver's ability to follow the road while avoiding obstacles. The results of an experiment with 15 subjects indicate that haptic feedback from the front car's motion can significantly improve driving control (i.e., reduce motion jerk and deviation from the road) and reduce mental load (evaluated via questionnaire). This suggests that haptic communication, as observed between physically interacting humans, can be leveraged to improve safety and efficiency in automated driving systems, warranting further testing in real driving scenarios.
Submitted 29 July, 2024;
originally announced July 2024.
-
Correlating Stroke Risk with Non-Invasive Tracing of Brain Blood Dynamic via a Portable Speckle Contrast Optical Spectroscopy Laser Device
Authors:
Yu Xi Huang,
Simon Mahler,
Aidin Abedi,
Julian Michael Tyszka,
Yu Tung Lo,
Patrick D. Lyden,
Jonathan Russin,
Charles Liu,
Changhuei Yang
Abstract:
Stroke poses a significant global health threat, with millions affected annually, leading to substantial morbidity and mortality. Current stroke risk assessment for the general population relies on markers such as demographics, blood tests, and comorbidities. A minimally invasive, clinically scalable, and cost-effective way to directly measure cerebral blood flow presents an opportunity. This opportunity has the potential to positively impact stroke risk assessment, prevention, and intervention. Physiological changes in the cerebral vascular system, particularly in response to carbon dioxide level changes and oxygen deprivation, such as during breath-holding, can offer insights into stroke risk assessment. However, existing methods for measuring cerebral perfusion reserve, such as blood flow and blood volume changes, are limited by either invasiveness or impracticality. Here, we propose a transcranial approach using speckle contrast optical spectroscopy (SCOS) to non-invasively monitor regional changes in brain blood flow and volume during breath-holding. Our study, conducted on 50 individuals classified into two groups (low-risk and higher-risk for stroke), shows significant differences in blood dynamic changes during breath-holding between the two groups, providing physiological insights for stroke risk assessment using a non-invasive quantification paradigm. Given its cost-effectiveness, scalability, portability, and simplicity, this laser-centric tool has significant potential in enhancing the pre-screening of stroke and mitigating strokes in the general population through early diagnosis and intervention.
Submitted 23 July, 2024;
originally announced July 2024.
-
Dreamer: Dual-RIS-aided Imager in Complementary Modes
Authors:
Fuhai Wang,
Yunlong Huang,
Zhanbo Feng,
Rujing Xiong,
Zhe Li,
Chun Wang,
Tiebin Mi,
Robert Caiming Qiu,
Zenan Ling
Abstract:
Reconfigurable intelligent surfaces (RISs) have emerged as a promising auxiliary technology for radio frequency imaging. However, existing works face challenges of faint and intricate back-scattered waves and the restricted field-of-view (FoV), both resulting from complex target structures and a limited number of antennas. The synergistic benefits of multi-RIS-aided imaging hold promise for addressing these challenges. Here, we propose a dual-RIS-aided imaging system, Dreamer, which operates collaboratively in complementary modes (reflection-mode and transmission-mode). Dreamer significantly expands the FoV and enhances perception by deploying dual-RIS across various spatial and measurement patterns. Specifically, we perform a fine-grained analysis of how radio-frequency (RF) signals encode scene information in the scattered object modeling. Based on this modeling, we design illumination strategies to balance spatial resolution and observation scale, and implement a prototype system in a typical indoor environment. Moreover, we design a novel artificial neural network with a CNN-external-attention mechanism to translate RF signals into high-resolution images of human contours. Our approach achieves an impressive SSIM score exceeding 0.83, validating its effectiveness in broadening perception modes and enhancing imaging capabilities. The code to reproduce our results is available at https://github.com/fuhaiwang/Dreamer.
Submitted 20 July, 2024;
originally announced July 2024.
-
Integrating Base Station with Intelligent Surface for 6G Wireless Networks: Architectures, Design Issues, and Future Directions
Authors:
Yuwei Huang,
Lipeng Zhu,
Rui Zhang
Abstract:
Intelligent surface (IS) is envisioned as a promising technology for the sixth-generation (6G) wireless networks, which can effectively reconfigure the wireless propagation environment via dynamically controllable signal reflection/transmission. In particular, integrating passive intelligent surface (IS) into the base station (BS) is a novel solution to enhance the wireless network throughput and coverage both cost-effectively and energy-efficiently. In this article, we provide an overview of IS-integrated BSs for wireless networks, including their motivations, practical architectures, and main design issues. Moreover, numerical results are presented to compare the performance of different IS-integrated BS architectures as well as the conventional BS without IS. Finally, promising directions are pointed out to stimulate future research on IS-BS/terminal integration in wireless networks.
Submitted 21 June, 2024;
originally announced July 2024.
-
Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer
Authors:
Yuanfei Huang,
Hua Huang
Abstract:
Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets, suffering from the variability in noise distributions encountered in real-world scenarios. In this work, we propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors. This insight forms the basis for our development of a conditional optimization framework, designed to overcome the constraints of the traditional denoising framework. To this end, we introduce a Locally Noise Prior Estimation (LoNPE) algorithm, which accurately estimates the noise prior directly from a single raw noisy image. This estimation acts as an explicit prior representation of the camera sensor's imaging environment, distinct from the image prior of scenes. Additionally, we design an auxiliary learnable LoNPE network tailored for practical application to sRGB noisy images. Leveraging the estimated noise prior, we present a novel Conditional Denoising Transformer (Condformer) by incorporating the noise prior into a conditional self-attention mechanism. This integration allows the Condformer to segment the optimization process into multiple explicit subspaces, significantly enhancing the model's generalization and flexibility. Extensive experimental evaluations on both synthetic and real-world datasets demonstrate that the proposed method achieves superior performance over current state-of-the-art methods. The source code is available at https://github.com/YuanfeiHuang/Condformer.
Submitted 12 July, 2024;
originally announced July 2024.
-
SliceMamba with Neural Architecture Search for Medical Image Segmentation
Authors:
Chao Fan,
Hongyuan Yu,
Yan Huang,
Liang Wang,
Zhenghan Yang,
Xibin Jia
Abstract:
Despite the progress made in Mamba-based medical image segmentation models, existing methods utilizing unidirectional or multi-directional feature scanning mechanisms struggle to effectively capture dependencies between neighboring positions, limiting the discriminant representation learning of local features. These local features are crucial for medical image segmentation as they provide critical structural information about lesions and organs. To address this limitation, we propose SliceMamba, a simple and effective locally sensitive Mamba-based medical image segmentation model. SliceMamba includes an efficient Bidirectional Slice Scan module (BSS), which performs bidirectional feature slicing and employs varied scanning mechanisms for sliced features with distinct shapes. This design ensures that spatially adjacent features remain close in the scanning sequence, thereby improving segmentation performance. Additionally, to fit the varying sizes and shapes of lesions and organs, we further introduce an Adaptive Slice Search method to automatically determine the optimal feature slice method based on the characteristics of the target data. Extensive experiments on two skin lesion datasets (ISIC2017 and ISIC2018), two polyp segmentation (Kvasir and ClinicDB) datasets, and one multi-organ segmentation dataset (Synapse) validate the effectiveness of our method.
Submitted 19 August, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Multi-scale Conditional Generative Modeling for Microscopic Image Restoration
Authors:
Luzhe Huang,
Xiongye Xiao,
Shixuan Li,
Jiawen Sun,
Yi Huang,
Aydogan Ozcan,
Paul Bogdan
Abstract:
The advance of diffusion-based generative models in recent years has revolutionized state-of-the-art (SOTA) techniques in a wide variety of image analysis and synthesis tasks, whereas their adaptation to image restoration, particularly within computational microscopy, remains theoretically and empirically underexplored. In this research, we introduce a multi-scale generative model that enhances conditional image restoration through a novel exploitation of the Brownian Bridge process within the wavelet domain. By initiating the Brownian Bridge diffusion process specifically at the lowest-frequency subband and applying generative adversarial networks at subsequent multi-scale high-frequency subbands in the wavelet domain, our method provides significant acceleration during training and sampling while sustaining a high image generation quality and diversity on par with SOTA diffusion models. Experimental results on various computational microscopy and imaging tasks confirm our method's robust performance and its considerable reduction in sampling steps and time. This pioneering technique offers an efficient image restoration framework that harmonizes efficiency with quality, signifying a major stride in incorporating cutting-edge generative models into computational microscopy workflows.
Submitted 7 July, 2024;
originally announced July 2024.
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
Jingping Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo Jin,
Fanliu Kong,
Zongwei Lan,
Tianyu Li
, et al. (30 additional authors not shown)
Abstract:
A modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Low-Complexity SVM Signal Recovery in Bandwidth-Limited 100Gb/s PAM4 PON Upstream
Authors:
Liyan Wu,
Yanlu Huang,
Kai Jin,
Shangya Han,
Kun Xu,
Yanni Ou
Abstract:
We proposed a low-complexity SVM-based signal recovery algorithm and evaluated it in 100G-PON with 25G-class devices. For the first time, it experimentally achieved 24 dB power budget @ FEC threshold 1E-3 over 40 km SMF, improving receiver sensitivity over 2 dB compared to FFE&DFE.
Submitted 4 July, 2024;
originally announced July 2024.
-
Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI
Authors:
Luyi Han,
Tao Tan,
Tianyu Zhang,
Xin Wang,
Yuan Gao,
Chunyao Lu,
Xinglong Liang,
Haoran Dou,
Yunzhi Huang,
Ritse Mann
Abstract:
Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from the common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation. Experiments using BraTS2021 dataset show that our non-adversarial model outperforms other GAN-based methods, and VQC latent space aids our model to achieve (1) anti-interference ability, which can eliminate the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential of one-shot segmentation. Our code is publicly available.
Submitted 3 July, 2024;
originally announced July 2024.
-
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
Authors:
Yurui Huang,
Yang Yang,
Shou Chen,
Xiangyu Wu,
Qingguo Chen,
Jianfeng Lu
Abstract:
In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information to help the model better localize the start and end of sounds. The fused features are then fed to a multi-scale Transformer for training. On the final test dataset, we achieved a mean average precision (mAP) of 0.33, obtaining the second-best performance in this track.
Submitted 1 July, 2024;
originally announced July 2024.
-
Generative Iris Prior Embedded Transformer for Iris Restoration
Authors:
Yubo Huang,
Jia Wang,
Peipei Li,
Liuyu Xiang,
Peigang Li,
Zhaofeng He
Abstract:
Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without a prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder network employing Transformer blocks and a generative iris prior. First, we tame Transformer blocks to model long-range dependencies in target images. Second, we pretrain an iris generative adversarial network (GAN) to obtain a rich iris prior, and incorporate it into the iris restoration process with our iris feature modulator. Our experiments demonstrate that the proposed Gformer outperforms state-of-the-art methods. In addition, iris recognition performance is significantly improved after applying Gformer.
Submitted 28 June, 2024;
originally announced July 2024.
-
LUT-boosted CDR and Equalization for Burst-mode 50/100 Gbit/s Bandwidth-limited Flexible PON
Authors:
Yanlu Huang,
Liyan Wu,
Shangya Han,
Kai Jin,
Kun Xu,
Yanni Ou
Abstract:
We proposed and experimentally demonstrated a look-up table boosted fast CDR and equalization scheme for the burst-mode 50/100 Gbps bandwidth-limited flexible PON that requires no preamble for convergence and achieves the same bit error rate performance as in the case of long preambles.
Submitted 28 June, 2024;
originally announced June 2024.
-
Energy efficiency analysis of ammonia-fueled power systems for vehicles considering residual heat recovery
Authors:
Zexin Nie,
Yi Huang,
Guangyu Tian
Abstract:
Ammonia, known as a good hydrogen carrier, shows great potential for use as a zero-carbon fuel for vehicles. However, both the internal combustion engine (ICE) and the proton exchange membrane fuel cell (PEMFC), the currently available engines used by the vehicle, require hydrogen decomposed from ammonia. On-board hydrogen production is an energy-intensive process that significantly reduces system efficiency. Therefore, energy recovery from the system's residual heat is essential to promote system efficiency. ICEs and FCs require different amounts of hydrogen, and they produce residual heat of different quality and quantity, so the system efficiency is determined not only by the engine operating point, but also by the measures and ratios of residual heat recovery. To thoroughly understand the relationships between system energy efficiency and system configuration as well as system parameters, this paper studies three typical power systems with different configurations. Models of the three systems are set up for energy efficiency analysis, and simulations are carried out under different conditions to determine system output power and energy efficiency. By analyzing the simulation results, the factors that most significantly impact system efficiency are identified, and guidelines for system design and parameter optimization are proposed.
Submitted 17 June, 2024;
originally announced June 2024.
-
Reachability Analysis for Linear Systems with Uncertain Parameters using Polynomial Zonotopes
Authors:
Yushen Huang,
Ertai Luo,
Stanley Bak,
Yifan Sun
Abstract:
In real-world applications, uncertain parameters are the rule rather than the exception. We present a reachability algorithm for linear systems with uncertain parameters and inputs using set propagation of polynomial zonotopes. In contrast to previous methods, our approach is able to tightly capture the non-convexity of the reachable set. Building on our main result, we show how our reachability algorithm can be extended to handle linear time-varying systems as well as linear systems with time-varying parameters. Moreover, our approach opens up new possibilities for reachability analysis of linear time-invariant systems, nonlinear systems, and hybrid systems. We compare our approach to other state-of-the-art methods, with superior tightness on two benchmarks including a 9-dimensional vehicle platooning system. Moreover, as part of the journal extension, we investigate polynomial zonotopes with a special structure, named multi-affine zonotopes, and the associated optimization problem. We provide the corresponding optimization algorithm and run experiments on examples obtained from two benchmark systems, showing its efficiency and scalability compared to the state-of-the-art method for handling this type of set representation.
Submitted 16 June, 2024;
originally announced June 2024.
-
Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
Jinming Guo,
Xiaolin Chen,
Jingcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.
Submitted 30 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
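The Top5 and Top3 figures quoted in the RetiZero abstract are top-k accuracies. As a minimal sketch, assuming the model produces one score per candidate disease for each image (the function name and array layout are illustrative and not taken from the paper), the metric can be computed as follows:

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (n_samples, n_classes) array of per-class scores (hypothetical layout).
    labels: (n_samples,) array of integer class indices.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row, best first.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # A sample counts as a hit if its true label appears anywhere in its top-k list.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())
```

Larger k makes the metric more lenient, which is why a Top5 score is never lower than the Top1 score for the same predictions.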
-
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Authors:
Yu-Fen Huang,
Nikki Moran,
Simon Coleman,
Jon Kelly,
Shun-Hwa Wei,
Po-Yin Chen,
Yun-Hsin Huang,
Tsung-Ping Chen,
Yu-Chia Kuo,
Yu-Chi Wei,
Chih-Hsuan Li,
Da-Yu Huang,
Hsuan-Kai Kao,
Ting-Wei Lin,
Li Su
Abstract:
In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrase, and expressive contents from audio, video and motion data, and the generation of musicians' body motion from given music audio. The dataset and codes are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).
Submitted 10 June, 2024;
originally announced June 2024.
-
MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome
Authors:
Yixin Huang,
Yiqi Jin,
Ke Tao,
Kaijian Xia,
Jianfeng Gu,
Lei Yu,
Lan Du,
Cunjian Chen
Abstract:
May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-temporal relationship among CT scans and emulate the clinical process of diagnosing MTS, we propose a novel attention module called the dual-enhanced positional multi-head self-attention (DEP-MHSA). The proposed DEP-MHSA reconsiders the role of positional embedding and incorporates a dual-enhanced positional embedding in both attention weights and residual connections. Further, we establish a new dataset, termed MTS-CT, consisting of 747 subjects. Experimental results demonstrate that our proposed approach achieves state-of-the-art MTS diagnosis results, and our self-attention design facilitates the spatial-temporal modeling. We believe that our DEP-MHSA is more suitable to handle CT image sequence modeling and the proposed dataset enables future research on MTS diagnosis. We make our code and dataset publicly available at: https://github.com/Nutingnon/MTS_dep_mhsa.
Submitted 7 June, 2024;
originally announced June 2024.
-
Navigating Autonomous Vehicle on Unmarked Roads with Diffusion-Based Motion Prediction and Active Inference
Authors:
Yufei Huang,
Yulin Li,
Andrea Matta,
Mohsen Jafari
Abstract:
This paper presents a novel approach to improving autonomous vehicle control in environments lacking clear road markings by integrating a diffusion-based motion predictor within an Active Inference Framework (AIF). Using a simulated parking lot environment as a parallel to unmarked roads, we develop and test our model to predict and guide vehicle movements effectively. The diffusion-based motion predictor forecasts vehicle actions by leveraging probabilistic dynamics, while AIF aids in decision-making under uncertainty. Unlike traditional methods such as Model Predictive Control (MPC) and Reinforcement Learning (RL), our approach reduces computational demands and requires less extensive training, enhancing navigation safety and efficiency. Our results demonstrate the model's capability to navigate complex scenarios, marking significant progress in autonomous driving technology.
Submitted 31 May, 2024;
originally announced June 2024.
-
Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations
Authors:
Zongbao Zhang,
Jiao Hao,
Wenmeng Zhao,
Yan Liu,
Yaohui Huang,
Xinhang Luo
Abstract:
The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinearity of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address these challenges, we propose a Multiscale Spatio-Temporal Enhanced Model (MSTEM) for effective load forecasting at EVCS. MSTEM incorporates a multiscale graph neural network to discern hierarchical nonlinear temporal dependencies across various time scales. It also integrates a recurrent learning component and a residual fusion mechanism, enhancing its capability to accurately capture spatial and temporal variations in charging patterns. The effectiveness of the proposed MSTEM has been validated through comparative analysis with six baseline models using three evaluation metrics. The case studies utilize real-world datasets for both fast and slow charging loads at EVCS in Perth, UK. The experimental results demonstrate the superiority of MSTEM in short-term continuous load forecasting for EVCS.
Submitted 29 May, 2024;
originally announced May 2024.
-
A 7K Parameter Model for Underwater Image Enhancement based on Transmission Map Prior
Authors:
Fuheng Zhou,
Dikai Wei,
Ye Fan,
Yulong Huang,
Yonggang Zhang
Abstract:
Although deep learning based models for underwater image enhancement have achieved good performance, they face limitations in both lightweight and effectiveness, which prevents their deployment and application on resource-constrained platforms. Moreover, most existing deep learning based models use data compression to get high-level semantic information in latent space instead of using the original information. Therefore, they require decoder blocks to generate the details of the output. This requires additional computational cost. In this paper, a lightweight network named lightweight selective attention network (LSNet) based on the top-k selective attention and transmission maps mechanism is proposed. The proposed model achieves a PSNR of 97% with only 7K parameters compared to a similar attention-based model. Extensive experiments show that the proposed LSNet achieves excellent performance in state-of-the-art models with significantly fewer parameters and computational resources. The code is available at https://github.com/FuhengZhou/LSNet.
Submitted 25 May, 2024;
originally announced May 2024.
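The LSNet abstract reports results in terms of PSNR (peak signal-to-noise ratio). As a minimal sketch of the standard definition, assuming 8-bit images with a peak value of 255 (the function name and default are illustrative; the paper's exact evaluation protocol is not given in the abstract):

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate that the enhanced image is closer to the reference.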
-
Automatic diagnosis of cardiac magnetic resonance images based on semi-supervised learning
Authors:
Hejun Huang,
Zuguo Chen,
Yi Huang,
Guangqiang Luo,
Chaoyang Chen,
Youzhi Song
Abstract:
Cardiac magnetic resonance imaging (MRI) is a pivotal tool for assessing cardiac function. Precise segmentation of cardiac structures is imperative for accurate cardiac functional evaluation. This paper introduces a semi-supervised model for automatic segmentation of cardiac images and auxiliary diagnosis. By harnessing cardiac MRI images and necessitating only a small portion of annotated image data, the model achieves fully automated, high-precision segmentation of cardiac images, extraction of features, calculation of clinical indices, and prediction of diseases. The provided segmentation results, clinical indices, and prediction outcomes can aid physicians in diagnosis, thereby serving as auxiliary diagnostic tools. Experimental results showcase that this semi-supervised model for automatic segmentation of cardiac images and auxiliary diagnosis attains high accuracy in segmentation and correctness in prediction, demonstrating substantial practical guidance and application value.
Submitted 23 May, 2024;
originally announced May 2024.
-
Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma
Authors:
Ahmed Gomaa,
Yixing Huang,
Amr Hagag,
Charlotte Schmitter,
Daniel Höfler,
Thomas Weissmann,
Katharina Breininger,
Manuel Schmidt,
Jenny Stritzelberger,
Daniel Delev,
Roland Coras,
Arnd Dörfler,
Oliver Schnell,
Benjamin Frey,
Udo S. Gaipl,
Sabine Semrau,
Christoph Bert,
Rainer Fietkau,
Florian Putz
Abstract:
Background: This research aims to improve glioblastoma survival prediction by integrating MR images, clinical and molecular-pathologic data in a transformer-based deep learning model, addressing data heterogeneity and performance generalizability. Method: We propose and evaluate a transformer-based non-linear and non-proportional survival prediction model. The model employs self-supervised learning techniques to effectively encode the high-dimensional MRI input for integration with non-imaging data using cross-attention. To demonstrate model generalizability, the model is assessed with the time-dependent concordance index (Cdt) in two training setups using three independent public test sets: UPenn-GBM, UCSF-PDGM, and RHUH-GBM, each comprising 378, 366, and 36 cases, respectively. Results: The proposed transformer model achieved promising performance for imaging as well as non-imaging data, effectively integrating both modalities for enhanced performance (UPenn-GBM test-set, imaging Cdt 0.645, multimodal Cdt 0.707) while outperforming state-of-the-art late-fusion 3D-CNN-based models. Consistent performance was observed across the three independent multicenter test sets with Cdt values of 0.707 (UPenn-GBM, internal test set), 0.672 (UCSF-PDGM, first external test set) and 0.618 (RHUH-GBM, second external test set). The model achieved significant discrimination between patients with favorable and unfavorable survival for all three datasets (logrank p-values $1.9\times10^{-8}$, $9.7\times10^{-3}$, and $1.2\times10^{-2}$). Conclusions: The proposed transformer-based survival prediction model integrates complementary information from diverse input modalities, contributing to improved glioblastoma survival prediction compared to state-of-the-art methods. Consistent performance was observed across institutions supporting model generalizability.
Submitted 21 May, 2024;
originally announced May 2024.
-
Predictive Energy Management for Battery Electric Vehicles with Hybrid Models
Authors:
Yu-Wen Huang,
Christian Prehofer,
William Lindskog,
Ron Puts,
Pietro Mosca,
Göran Kauermann
Abstract:
This paper addresses the problem of predicting the energy consumption for the drivers of battery electric vehicles (BEVs). Several external factors (e.g., weather) are shown to have huge impacts on the energy consumption of a vehicle besides the vehicle or powertrain dynamics. Thus, it is challenging to take all of those influencing variables into consideration. The proposed approach is based on a hybrid model which improves the prediction accuracy of energy consumption of BEVs. The novelty of this approach is to combine a physics-based simulation model, which captures the basic vehicle and powertrain dynamics, with a data-driven model. The latter accounts for other external influencing factors neglected by the physical simulation model, using machine learning techniques, such as generalized additive mixed models, random forests and boosting. The hybrid modeling method is evaluated with a real data set from TUM, and the hybrid models are shown to decrease the average prediction error from 40% for the pure physics model to 10%.
Submitted 15 May, 2024;
originally announced May 2024.
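The hybrid idea in the abstract above (a physics-based estimate plus a data-driven correction for neglected external factors) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy physics formula, its coefficients, and the function names are hypothetical, and an ordinary least-squares fit stands in for the generalized additive mixed models, random forests, and boosting named in the abstract.

```python
import numpy as np

def physics_energy_kwh(distance_km, avg_speed_kmh):
    """Hypothetical physics baseline: a constant per-km term plus a speed-squared drag term."""
    return distance_km * (0.12 + 1.5e-5 * avg_speed_kmh ** 2)

def fit_residual_model(features, residuals):
    """Fit a linear model (with intercept) to the residual: measured minus physics estimate.

    This is a stand-in for the paper's data-driven component, not its actual method.
    """
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, residuals, rcond=None)
    return coef

def hybrid_predict(distance_km, avg_speed_kmh, features, coef):
    """Hybrid prediction = physics estimate + learned correction from external features."""
    X = np.column_stack([np.ones(len(features)), features])
    return physics_energy_kwh(distance_km, avg_speed_kmh) + X @ coef
```

The design choice worth noting is that the data-driven part only has to learn what the physics model misses (for example, a temperature effect), which is typically an easier target than the full energy consumption.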
-
Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation
Authors:
Yixing Huang,
Zahra Khodabakhshi,
Ahmed Gomaa,
Manuel Schmidt,
Rainer Fietkau,
Matthias Guckenberger,
Nicolaus Andratschke,
Christoph Bert,
Stephanie Tanadini-Lang,
Florian Putz
Abstract:
Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU and BraTS Challenge 2023 on BM segmentation were used for this evaluation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data, respectively. Subsequently bilateral collaboration was evaluated, where a UKER pretrained model is shared to another center for further training using transfer learning (TL) either with or without LWF.
Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy, but compromises precision. Conversely, LWF demonstrates commendable sensitivity, precision and contouring accuracy. When applied to Stanford, similar performance was observed.
Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training.
Submitted 25 July, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
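The results in the abstract above are reported as detection F1 scores, with sensitivity and precision discussed separately. As a minimal sketch of how these relate, assuming per-lesion counts of true positives, false positives, and false negatives are available (the function name and the counting convention are illustrative, not taken from the paper):

```python
def detection_scores(tp, fp, fn):
    """Return (precision, sensitivity, F1) from detection counts.

    tp: correctly detected lesions, fp: spurious detections, fn: missed lesions.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f1
```

Because F1 is a harmonic mean, a model that gains sensitivity at the expense of many false positives can still end up with a lower F1, which matches the trade-off the abstract describes for naive transfer learning.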
-
IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model
Authors:
Yongsong Huang,
Tomo Miyazaki,
Xiaofeng Liu,
Shinichiro Omachi
Abstract:
Infrared (IR) image super-resolution faces challenges from homogeneous background pixel distributions and sparse target regions, requiring models that effectively handle long-range dependencies and capture detailed local-global information. Recent advancements in Mamba-based (Selective Structured State Space Model) models, employing state space models, have shown significant potential in visual tasks, suggesting their applicability for IR enhancement. In this work, we introduce IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model, a novel Mamba-based model designed specifically for IR image super-resolution. This model enhances the restoration of context-sparse target details through its advanced dependency modeling capabilities. Additionally, a new wavelet transform feature modulation block improves multi-scale receptive field representation, capturing both global and local information efficiently. Comprehensive evaluations confirm that IRSRMamba outperforms existing models on multiple benchmarks. This research advances IR super-resolution and demonstrates the potential of Mamba-based models in IR image processing. Code is available at https://github.com/yongsongH/IRSRMamba.
Submitted 16 May, 2024;
originally announced May 2024.
-
Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior
Authors:
Yongfeng Huang,
Zhendong Chen,
Kun Ye,
Lang Zhou,
Haixin Sun
Abstract:
In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.
Submitted 17 May, 2024; v1 submitted 18 April, 2024;
originally announced May 2024.
-
Low-Complexity Joint Azimuth-Range-Velocity Estimation for Integrated Sensing and Communication with OFDM Waveform
Authors:
Jun Zhang,
Gang Yang,
Qibin Ye,
Yixuan Huang,
Su Hu
Abstract:
Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with orthogonal frequency division multiplexing (OFDM) waveform, in which a base station receives the echoes of its transmitted cellular OFDM signals to sense multiple targets. The Cramér-Rao bounds are first derived for JARVE. A low-complexity algorithm is further designed for super-resolution JARVE, which utilizes the proposed iterative subspace update scheme and Levenberg-Marquardt optimization method to replace the exhaustive search of spatial spectrum in multiple-signal-classification (MUSIC) algorithm. Finally, with the practical parameters of 5G New Radio, simulation results verify that the proposed algorithm can reduce the computational complexity by three orders of magnitude and two orders of magnitude compared to the existing three-dimensional MUSIC algorithm and estimation-of-signal-parameters-using-rotational-invariance-techniques (ESPRIT) algorithm, respectively, and also improve the estimation performance.
Submitted 15 May, 2024;
originally announced May 2024.
-
MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors:
Yaqi Wu,
Zhihao Fan,
Xiaofeng Chu,
Jimmy S. Ren,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangcheng Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Senyan Xu,
Zhijing Sun,
Jiaying Zhu,
Yurui Zhu,
Xueyang Fu,
Zheng-Jun Zha,
Jun Cao,
Cheng Li,
Shu Chen,
Liang Ma,
Shiyang Zhou,
Haijin Zeng,
Kai Feng
, et al. (24 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
Submitted 8 May, 2024;
originally announced May 2024.
-
POPDG: Popular 3D Dance Generation with PopDanceSet
Authors:
Zhenye Luo,
Min Ren,
Xuecai Hu,
Yongzhen Huang,
Li Yao
Abstract:
Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It also surpasses the AIST++ dataset in music genre diversity and in the intricacy and depth of dance movements. Moreover, the proposed POPDG model within the iDDPM framework enhances dance diversity and, through the Space Augmentation Algorithm, strengthens spatial physical connections between human body joints, ensuring that increased diversity does not compromise generation quality. A streamlined Alignment Module is also designed to improve the temporal alignment between dance and music. Extensive experiments show that POPDG achieves SOTA results on two datasets. Furthermore, the paper expands on current evaluation metrics. The dataset and code are available at https://github.com/Luke-Luo1/POPDG.
Submitted 6 May, 2024;
originally announced May 2024.
-
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Authors:
Qixin Deng,
Qikai Yang,
Ruibin Yuan,
Yipeng Huang,
Yi Wang,
Xubo Liu,
Zeyue Tian,
Jiahao Pan,
Ge Zhang,
Hanfeng Lin,
Yizhi Li,
Yinghao Ma,
Jie Fu,
Chenghua Lin,
Emmanouil Benetos,
Wenwu Wang,
Guangyu Xia,
Wei Xue,
Yike Guo
Abstract:
Music composition represents the creative side of humanity, and is itself a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts. To further explore and enhance LLMs' potential in music composition by leveraging their reasoning ability and the large knowledge base in music history and theory, we propose ComposerX, an agent-based symbolic music generation framework. We find that applying a multi-agent approach significantly improves the music composition quality of GPT-4. The results demonstrate that ComposerX is capable of producing coherent polyphonic music compositions with captivating melodies, while adhering to user instructions.
Submitted 30 April, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Authors:
Marcos V. Conde,
Zhijun Lei,
Wen Li,
Cosmin Stejerean,
Ioannis Katsavounidis,
Radu Timofte,
Kihwan Yoon,
Ganzorig Gankhuyag,
Jiangtao Lv,
Long Sun,
Jinshan Pan,
Jiangxin Dong,
Jinhui Tang,
Zhiyuan Li,
Hao Wei,
Chenyang Ge,
Dongyang Zhang,
Tianle Liu,
Huaian Chen,
Yi Jin,
Menghan Zhou,
Yiqiang Yan,
Si Gao,
Biao Wu,
Shaoli Liu
, et al. (50 additional authors not shown)
Abstract:
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
Submitted 25 April, 2024;
originally announced April 2024.
-
An Alternative Method to Identify the Susceptibility Threshold Level of Device under Test in a Reverberation Chamber
Authors:
Qian Xu,
Kai Chen,
Xueqi Shen,
Lei Xing,
Yi Huang,
Tian Hong Loh
Abstract:
By counting the number of pass/fail occurrences of a DUT (Device under Test) in the stirring process in a reverberation chamber (RC), the threshold electric field (E-field) level can be well estimated without tuning the input power and repeating the whole testing many times. The Monte-Carlo method is used to verify the results. Estimated values and uncertainties are given for Rayleigh distributed fields and for Rice distributed fields with different K-factors.
Submitted 23 April, 2024;
originally announced April 2024.
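The pass/fail counting idea in the abstract above can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure: it assumes, purely for illustration, that the field magnitude seen by the DUT is Rayleigh distributed with a known scale `sigma`, so that the failure probability is the Rayleigh tail exp(-E_th^2 / (2 sigma^2)), and it inverts that tail from the observed failure fraction. The function names and all numeric values are hypothetical.

```python
import math
import random


def estimate_threshold(fail_count, n_positions, sigma):
    """Invert the assumed Rayleigh tail P(|E| > E_th) = exp(-E_th^2 / (2 sigma^2))."""
    if not 0 < fail_count < n_positions:
        raise ValueError("need at least one pass and one fail to estimate")
    p_fail = fail_count / n_positions
    return sigma * math.sqrt(-2.0 * math.log(p_fail))


def simulate_fail_count(true_threshold, n_positions, sigma, rng):
    """Count stirrer positions where a simulated Rayleigh field exceeds the threshold."""
    fails = 0
    for _ in range(n_positions):
        # Magnitude of two independent zero-mean Gaussians is Rayleigh distributed.
        x = rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)
        if math.hypot(x, y) > true_threshold:
            fails += 1
    return fails


rng = random.Random(0)      # fixed seed so the run is repeatable
sigma = 1.0                 # assumed Rayleigh scale (hypothetical units)
true_threshold = 1.5        # hypothetical DUT susceptibility level
n_positions = 20000         # hypothetical number of stirrer positions

fails = simulate_fail_count(true_threshold, n_positions, sigma, rng)
estimate = estimate_threshold(fails, n_positions, sigma)
print(f"fail fraction = {fails / n_positions:.4f}, estimated threshold = {estimate:.3f}")
```

The estimate needs only one pass over the stirrer positions at a fixed input power; its uncertainty shrinks as the number of positions grows, and it is undefined when every position passes or every position fails.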
-
A Massive MIMO Sampling Detection Strategy Based on Denoising Diffusion Model
Authors:
Lanxin He,
Zheng Wang,
Yongming Huang
Abstract:
The Langevin sampling method relies on an accurate score matching while the existing massive multiple-input multiple-output (MIMO) Langevin detection involves an inevitable singular value decomposition (SVD) to calculate the posterior score. In this work, a massive MIMO sampling detection strategy that leverages the denoising diffusion model is proposed to narrow the gap between the given iterative detector and the maximum likelihood (ML) detection in an SVD-free manner. Specifically, the proposed score-based sampling detection strategy, denoted as approximate diffusion detection (ADD), is applicable to a wide range of iterative detection methods, and therefore entails a considerable potential in their performance improvement by multiple sampling attempts. On the other hand, the ADD scheme manages to bypass the channel SVD by introducing a reliable iterative detector to produce a sample from the approximate posterior, so that further Langevin sampling is tractable. Customized by the conjugated gradient descent algorithm as an instance, the proposed sampling scheme outperforms the existing score-based detector in terms of a better complexity-performance trade-off.
Submitted 20 April, 2024;
originally announced April 2024.
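For readers unfamiliar with the Langevin sampling that the abstract above builds on, the following is a minimal, generic sketch of unadjusted Langevin dynamics. It is not the ADD scheme and involves no MIMO channel: it assumes a one-dimensional Gaussian target whose score (the gradient of the log-density) is known in closed form, and all names and parameter values are hypothetical.

```python
import random
import statistics


def score_gaussian(x, mean, std):
    """Score of N(mean, std^2): d/dx log p(x) = -(x - mean) / std^2."""
    return -(x - mean) / (std * std)


def langevin_sample(score, x0, step, n_steps, rng):
    """Run one chain of x <- x + step * score(x) + sqrt(2 * step) * z, z ~ N(0, 1)."""
    x = x0
    noise_scale = (2.0 * step) ** 0.5
    for _ in range(n_steps):
        x = x + step * score(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)              # fixed seed so the run is repeatable
target_mean, target_std = 2.0, 0.5  # hypothetical target distribution

# Many independent chains; each returns its final state as one sample.
samples = [
    langevin_sample(
        lambda x: score_gaussian(x, target_mean, target_std),
        x0=0.0, step=0.01, n_steps=500, rng=rng,
    )
    for _ in range(2000)
]

sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(f"sample mean = {sample_mean:.3f}, sample std = {sample_std:.3f}")
```

With a finite step size the chain's stationary spread is slightly wider than the target's, which is why practical samplers rely on small or annealed steps; the quality of the samples depends directly on how accurate the score is, which is the dependence the abstract refers to.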
-
Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments
Authors:
Yongming Huang,
Xiaohu You,
Hang Zhan,
Shiwen He,
Ningning Fu,
Wei Xu
Abstract:
Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them imposes significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture.
Submitted 16 April, 2024;
originally announced April 2024.
-
HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
Authors:
Haili Sun,
Yan Huang,
Lansheng Han,
Cai Fu,
Chunjie Zhou
Abstract:
Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominantly reconstruction-based and predictive in nature. However, they typically concentrate on a single-dimensional instance level, thereby not fully harnessing the complex associations inherent in industrial MTS. To address this issue, we propose a novel self-supervised hierarchical contrastive consistency learning method for detecting anomalies in MTS, named HCL-MTSAD. It innovatively leverages data consistency at multiple levels inherent in industrial MTS, systematically capturing consistent associations across four latent levels: measurement, sample, channel, and process. By developing a multi-layer contrastive loss, HCL-MTSAD can extensively mine data consistency and spatio-temporal association, resulting in more informative representations. Subsequently, an anomaly discrimination module, grounded in self-supervised hierarchical contrastive learning, is designed to detect timestamp-level anomalies by calculating multi-scale data consistency. Extensive experiments conducted on six diverse MTS datasets retrieved from real cyber-physical systems and server machines, in comparison with 20 baselines, indicate that HCL-MTSAD's anomaly detection capability outperforms the state-of-the-art benchmark models by an average of 1.8% in terms of F1 score.
Submitted 18 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Net 835-Gb/s/λ Carrier- and LO-Free 100-km Transmission Using Channel-Aware Phase Retrieval Reception
Authors:
Hanzi Huang,
Haoshuo Chen,
Qian Hu,
Di Che,
Yetian Huang,
Brian Stern,
Nicolas K. Fontaine,
Mikael Mazur,
Lauren Dallachiesa,
Roland Ryf,
Zhengxuan Li,
Yingxiong Song
Abstract:
We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency.
Submitted 10 April, 2024;
originally announced April 2024.
-
Joint Active and Passive Beamforming for IRS-Aided Wireless Energy Transfer Network Exploiting One-Bit Feedback
Authors:
Taotao Ji,
Meng Hua,
Chunguo Li,
Yongming Huang,
Luxi Yang
Abstract:
To reap the active and passive beamforming gain in an intelligent reflecting surface (IRS)-aided wireless network, a typical way is to first acquire the channel state information (CSI) relying on the pilot signal, and then perform the joint beamforming design. However, it is a great challenge when the receiver can neither send pilot signals nor have complex signal processing capabilities due to its hardware limitation. To tackle this problem, we study in this paper an IRS-aided wireless energy transfer (WET) network and propose two joint beamforming design methods, namely, the channel-estimation-based method and the distributed-beamforming-based method, that require only one-bit feedback from the energy receiver (ER) to the energy transmitter (ET). Specifically, for the channel-estimation-based method, according to the feedback information, the ET is able to infer the cascaded ET-IRS-ER channel by continually adjusting its transmit beamformer while applying the analytic center cutting plane method (ACCPM). Then, based on the estimated cascaded CSI, the joint beamforming design can be performed by using the existing optimization techniques. While for the distributed-beamforming-based method, we first apply the distributed beamforming algorithm to optimize the IRS reflection coefficients, which is theoretically proven to converge to a local optimum almost surely. Then, the optimal ET's transmit covariance matrix is obtained based on the effective ET-ER channel learned by applying the ACCPM only once. Numerical results demonstrate the effectiveness of our proposed one-bit-feedback-based joint beamforming design schemes while greatly reducing the requirement on the hardware complexity of the ER. In particular, the high accuracy of our IRS-involved cascaded channel estimation method exploiting one-bit feedback is also validated.
Submitted 8 April, 2024;
originally announced April 2024.
-
Intelligent Reflecting Surface Aided Target Localization With Unknown Transceiver-IRS Channel State Information
Authors:
Taotao Ji,
Meng Hua,
Xuanhong Yan,
Chunguo Li,
Yongming Huang,
Luxi Yang
Abstract:
Integrating wireless sensing capabilities into base stations (BSs) has become a widespread trend in the future beyond fifth-generation (B5G)/sixth-generation (6G) wireless networks. In this paper, we investigate intelligent reflecting surface (IRS) enabled wireless localization, in which an IRS is deployed to assist a BS in locating a target in its non-line-of-sight (NLoS) region. In particular, we consider the case where the BS-IRS channel state information (CSI) is unknown. Specifically, we first propose a separate BS-IRS channel estimation scheme in which the BS operates in full-duplex mode (FDM), i.e., a portion of the BS antennas send downlink pilot signals to the IRS, while the remaining BS antennas receive the uplink pilot signals reflected by the IRS. However, we can only obtain an incomplete BS-IRS channel matrix based on our developed iterative coordinate descent-based channel estimation algorithm due to the "sign ambiguity issue". Then, we employ the multiple hypotheses testing framework to perform target localization based on the incomplete estimated channel, in which the probability of each hypothesis is updated using Bayesian inference at each cycle. Moreover, we formulate a joint BS transmit waveform and IRS phase shifts optimization problem to improve the target localization performance by maximizing the weighted sum distance between each two hypotheses. However, the objective function is essentially a quartic function of the IRS phase shift vector, thus motivating us to resort to the penalty-based method to tackle this challenge. Simulation results validate the effectiveness of our proposed target localization scheme and show that the scheme's performance can be further improved by finely designing the BS transmit waveform and IRS phase shifts intending to maximize the weighted sum distance between different hypotheses.
Submitted 7 April, 2024;
originally announced April 2024.
-
Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems
Authors:
Yixuan Huang,
Jie Yang,
Wankai Tang,
Chao-Kai Wen,
Shi Jin
Abstract:
Radio imaging is rapidly gaining prominence in the design of future communication systems, with the potential to utilize reconfigurable intelligent surfaces (RISs) as imaging apertures. Although the sparsity of targets in three-dimensional (3D) space has led most research to adopt compressed sensing (CS)-based imaging algorithms, these often impose substantial computational and memory burdens. Drawing inspiration from conventional Fourier transform (FT)-based imaging methods, our research seeks to accelerate radio imaging in RIS-aided communication systems. To begin, we introduce a two-stage wavenumber domain 3D imaging technique: first, we modify RIS phase shifts to recover the equivalent channel response from the user equipment to the RIS array, subsequently employing traditional FT-based wavenumber domain methods to produce target images. We also determine the diffraction resolution limits of the system through k-space analysis, taking into account factors including system bandwidth, transmission direction, operating frequency, and the angle subtended by the RIS. Addressing the challenge of limited pilots in communication systems, we unveil an innovative algorithm that merges the strengths of both FT- and CS-based techniques by substituting the expansive sensing matrix with FT-based operators. Our simulation outcomes confirm that our proposed FT-based methods achieve high-quality images while demanding little time, memory, and communication resources.
Submitted 6 April, 2024;
originally announced April 2024.
-
Computationally Efficient Unsupervised Deep Learning for Robust Joint AP Clustering and Beamforming Design in Cell-Free Systems
Authors:
Guanghui Chen,
Zheng Wang,
Hongxin Lin,
Yongming Huang,
Luxi Yang
Abstract:
In this paper, we consider robust joint access point (AP) clustering and beamforming design with imperfect channel state information (CSI) in cell-free systems. Specifically, we jointly optimize AP clustering and beamforming with imperfect CSI to simultaneously maximize the worst-case sum rate and minimize the number of AP clustering under power constraint and the sparsity constraint of AP clustering. By transformations, the semi-infinite constraints caused by the imperfect CSI are converted into more tractable forms for facilitating a computationally efficient unsupervised deep learning algorithm. In addition, to further reduce the computational complexity, a computationally effective unsupervised deep learning algorithm is proposed to implement robust joint AP clustering and beamforming design with imperfect CSI in cell-free systems. Numerical results demonstrate that the proposed unsupervised deep learning algorithm achieves a higher worst-case sum rate under a smaller number of AP clustering with computational efficiency.
Submitted 3 April, 2024;
originally announced April 2024.
-
Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems
Authors:
Md Adnan Faisal Hossain,
Zhihao Duan,
Yuning Huang,
Fengqing Zhu
Abstract:
Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression rate, performance accuracy and computational complexity. In this paper, a flexible variable-rate feature compression method is presented that can operate on a range of rates by introducing a rate control parameter as an input to the neural network model. By compressing different intermediate features of a pre-trained vision task model, the proposed method can scale the encoding complexity without changing the overall size of the model. The proposed method is more flexible than existing baselines, at the same time outperforming them in terms of the three-way trade-off between feature compression rate, vision task accuracy, and encoding complexity. We have made the source code available at https://github.com/adnan-hossain/var_feat_comp.git.
Submitted 30 March, 2024;
originally announced April 2024.
-
Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs
Authors:
Yichi Zhang,
Zhihao Duan,
Yuning Huang,
Fengqing Zhu
Abstract:
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.
Submitted 27 March, 2024;
originally announced March 2024.
-
Semi-Automatic Line-System Provisioning with Integrated Physical-Parameter-Aware Methodology: Field Verification and Operational Feasibility
Authors:
Hideki Nishizawa,
Giacomo Borraccini,
Takeo Sasai,
Yue-Kai Huang,
Toru Mano,
Kazuya Anazawa,
Masatoshi Namiki,
Soichiroh Usui,
Tatsuya Matsumura,
Yoshiaki Sone,
Zehao Wang,
Seiji Okamoto,
Takeru Inoue,
Ezra Ip,
Andrea D'Amico,
Tingjun Chen,
Vittorio Curri,
Ting Wang,
Koji Asahi,
Koichi Takasugi
Abstract:
We propose methods and an architecture to conduct measurements and optimize newly installed optical fiber line systems semi-automatically using integrated physics-aware technologies in a data center interconnection (DCI) transmission scenario. We demonstrate, for the first time, digital longitudinal monitoring (DLM) and optical line system (OLS) physical parameter calibration working together in real-time to extract physical link parameters for transmission performance optimization. Our methodology has the following advantages over traditional design: a minimized footprint at user sites, accurate estimation of the necessary optical network characteristics via complementary telemetry technologies, and the capability to conduct all operation work remotely. The last feature is crucial, as it enables remote operation to implement network design settings for immediate response to quality of transmission (QoT) degradation and reversion in the case of unforeseen problems. We successfully performed semi-automatic line system provisioning over field fiber network facilities at Duke University, Durham, NC. The tasks of parameter retrieval, equipment setting optimization, and system setup/provisioning were completed within 1 hour. The field operation was supervised by on-duty personnel who could access the system remotely from different time zones. By comparing Q-factor estimates calculated from the extracted link parameters with measured results from 400G transceivers, we confirmed that our methodology reduces the QoT prediction errors (±0.3 dB) compared with the existing design (±10.6 dB).
Submitted 24 March, 2024;
originally announced March 2024.