-
A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds
Authors:
Ying-Chieh Hsu,
Stanley Yung-Chuan Liu,
Chao-Jung Huang,
Chi-Wei Wu,
Ren-Kai Cheng,
Jane Yung-Jen Hsu,
Shang-Ran Huang,
Yuan-Ren Cheng,
Fu-Shun Hsu
Abstract:
This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The dataset, comprising 5,173 one-second segments, was used to train and test models, including Support Vector Machine (SVM), Bidirectional Long Short-Term Memory (BiLSTM), and ResNet-50. The ResNet-50, a convolutional neural network (CNN), showed the best overall performance in classifying snoring acoustics, particularly in identifying multi-level obstructions. The study emphasizes the potential of integrating snoring acoustics with deep learning to improve the diagnosis and treatment of OSA. However, challenges such as limited sample size, data imbalance, and differences between pharmacologically induced and natural snoring sounds were noted, suggesting further research to enhance model accuracy and generalizability.
Submitted 28 August, 2024;
originally announced August 2024.
-
Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance
Authors:
Weiyi Zhang,
Siyu Huang,
Jiancheng Yang,
Ruoyu Chen,
Zongyuan Ge,
Yingfeng Zheng,
Danli Shi,
Mingguang He
Abstract:
Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates innovative knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced patchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.
Submitted 27 August, 2024;
originally announced August 2024.
-
Startup Control Optimization of He-Xe Cooled Space Nuclear Reactors Using a System Analysis Program
Authors:
Chengyuan Li,
Leran Guo,
Shanfang Huang,
Jian Deng,
Jiahe Shang
Abstract:
In recent years, achieving autonomous control in nuclear reactor operations has become pivotal for the effectiveness of Space Nuclear Power Systems (SNPS). However, compared to power control, the startup control of SNPS remains underexplored. This study introduces a multi-objective optimization framework aimed at enhancing startup control, leveraging a system-level analysis program to simulate the system's dynamic behavior accurately. The primary contribution of this work is the development and implementation of an optimization framework that significantly reduces startup time and improves control efficiency. Using a non-ideal gas model, a multi-channel core model, and the Monte Carlo code RMC to calculate temperature reactivity coefficients and neutron kinetics parameters, the system analysis tool ensures precise thermal-dynamic simulations. After the system dynamics are characterized through reactivity insertion accidents, the optimization algorithm fine-tunes the control sequences for external reactivity insertion, TAC system shaft speed, and cooling system background temperature. The optimized control strategy achieves threshold power 1260 seconds earlier and turbine inlet temperature 1980 seconds sooner than baseline methods. The findings highlight the potential of the proposed optimization framework to enhance the autonomy and operational efficiency of future SNPS designs.
Submitted 14 August, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
LightViz: Autonomous Light-field Surveying and Mapping for Distributed Light Pollution Monitoring
Authors:
Sheng-En Huang,
Kazi Farha Farzana Suhi,
Md Jahidul Islam
Abstract:
Existing technologies for distributed light-field mapping and light pollution monitoring (LPM) rely on either remote satellite imagery or manual light surveying with single-point sensors such as SQMs (sky quality meters). These modalities offer low-resolution data that are not informative for dense light-field mapping, pollutant factor identification, or sustainable policy implementation. In this work, we propose LightViz -- an interactive software interface to survey, simulate, and visualize light pollution maps in real-time. As opposed to manual error-prone methods, LightViz (i) automates the light-field data collection and mapping processes; (ii) provides a platform to simulate various light sources and intensity attenuation models; and (iii) facilitates effective policy identification for conservation.
To validate the end-to-end computational pipeline, we design a distributed light-field sensor suite, collect data on Florida coasts, and visualize the distributed light-field maps. In particular, we perform a case study at St. Johns County in Florida, which has a two-decade conservation program for lighting ordinances. The experimental results demonstrate that LightViz can offer high-resolution light-field mapping and provide interactive features to simulate and formulate community policies for light pollution mitigation. We also propose a mathematical formulation for light footprint evaluation, which we integrated into LightViz for targeted LPM in vulnerable communities.
Submitted 31 July, 2024;
originally announced August 2024.
-
Robust Simultaneous Multislice MRI Reconstruction Using Deep Generative Priors
Authors:
Shoujin Huang,
Guanxiong Luo,
Yuwan Wang,
Kexin Yang,
Lingyan Zhang,
Jingzhe Liu,
Hua Guo,
Min Wang,
Mengye Lyu
Abstract:
Simultaneous multislice (SMS) imaging is a powerful technique for accelerating magnetic resonance imaging (MRI) acquisitions. However, SMS reconstruction remains challenging due to the complex signal interactions between and within the excited slices. This study presents a robust SMS MRI reconstruction method using deep generative priors. Starting from Gaussian noise, we leverage denoising diffusion probabilistic models (DDPM) to gradually recover the individual slices through reverse diffusion iterations while imposing data consistency from the measured k-space under the readout concatenation framework. The posterior sampling procedure is designed such that the DDPM training can be performed on single-slice images without special adjustments for SMS tasks. Additionally, our method integrates a low-frequency enhancement (LFE) module to address the practical issue that SMS-accelerated fast spin echo (FSE) and echo-planar imaging (EPI) sequences cannot easily embed autocalibration signals. Extensive experiments demonstrate that our approach consistently outperforms existing methods and generalizes well to unseen datasets. The code is available at https://github.com/Solor-pikachu/ROGER after the review process.
Submitted 31 July, 2024;
originally announced July 2024.
-
Distributed Memory Approximate Message Passing
Authors:
Jun Lu,
Lei Liu,
Shunqi Huang,
Ning Wei,
Xiaoming Chen
Abstract:
Approximate message passing (AMP) algorithms are iterative methods for signal recovery in noisy linear systems. In some scenarios, AMP algorithms need to operate within a distributed network. To address this challenge, the distributed extensions of AMP (D-AMP, FD-AMP) and orthogonal/vector AMP (D-OAMP/D-VAMP) were proposed, but they still inherit the limitations of centralized algorithms. In this letter, we propose distributed memory AMP (D-MAMP) to overcome the IID matrix limitation of D-AMP/FD-AMP, as well as the high complexity and heavy communication cost of D-OAMP/D-VAMP. We introduce a matrix-by-vector variant of MAMP tailored for distributed computing. Leveraging this variant, D-MAMP enables each node to execute computations utilizing locally available observation vectors and transform matrices. Meanwhile, global summations of locally updated results are conducted through message interaction among nodes. For acyclic graphs, D-MAMP converges to the same mean square error performance as the centralized MAMP.
Submitted 24 July, 2024;
originally announced July 2024.
-
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Authors:
Samuele Cornell,
Taejin Park,
Steve Huang,
Christoph Boeddeker,
Xuankai Chang,
Matthew Maciejewski,
Matthew Wiesner,
Paola Garcia,
Shinji Watanabe
Abstract:
This paper presents the CHiME-8 DASR challenge, which carries on from the previous edition, CHiME-7 DASR (C7DASR), and the past CHiME-6 challenge. It focuses on joint multi-channel distant speech recognition (DASR) and diarization with one or more, possibly heterogeneous, devices. The main goal is to spur research towards meeting transcription approaches that can generalize across an arbitrary number of speakers, diverse settings (formal vs. informal conversations), meeting durations, a wide variety of acoustic scenarios, and different recording configurations. Novelties with respect to C7DASR include: i) the addition of NOTSOFAR-1, an additional office/corporate meeting scenario, ii) a manually corrected Mixer 6 development set, iii) a new track in which we allow the use of large language models (LLMs), and iv) a jury award mechanism to encourage participants to also explore more practical and innovative solutions. To lower the entry barrier for participants, we provide a standalone toolkit for downloading and preparing such datasets as well as performing text normalization and scoring their submissions. Furthermore, this year we also provide two baseline systems, one directly inherited from C7DASR and based on ESPnet, and another developed on NeMo and based on the NeMo team's submission to last year's C7DASR. Baseline system results suggest that the addition of the NOTSOFAR-1 scenario significantly increases the task's difficulty due to its high number of speakers and very short duration.
Submitted 23 July, 2024;
originally announced July 2024.
-
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models
Authors:
Zhiyuan Tang,
Dong Wang,
Shen Huang,
Shidong Shang
Abstract:
Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of the research focuses on the English language. This paper redirects the attention to Chinese. Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges. Subsequently, we conduct a preliminary evaluation using the dataset for both direct-prompting and fine-tuning pre-trained LLMs. Furthermore, we propose a straightforward method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses. The experimental results reveal that Pinyin regularization consistently enhances the error-correcting ability of LLMs when compared with those without regularization. The dataset is available on the website.
Submitted 1 July, 2024;
originally announced July 2024.
-
Priorformer: A UGC-VQA Method with content and distortion priors
Authors:
Yajing Pei,
Shiyu Huang,
Yiting Lu,
Xin Li,
Zhibo Chen
Abstract:
User Generated Content (UGC) videos are susceptible to complicated and varied degradations and contents, which prevents existing blind video quality assessment (BVQA) models from performing well because they lack adaptability to distortions and contents. To mitigate this, we propose a novel prior-augmented perceptual vision transformer (PriorFormer) for the BVQA of UGC, which boosts its adaptability and representation capability for divergent contents and distortions. Concretely, we introduce two powerful priors, i.e., the content and distortion priors, by extracting the content and distortion embeddings from two pre-trained feature extractors. Then we adopt these two powerful embeddings as the adaptive prior tokens, which are transferred to the vision transformer backbone jointly with implicit quality features. Based on the above strategy, the proposed PriorFormer achieves state-of-the-art performance on three public UGC VQA datasets including KoNViD-1K, LIVE-VQC and YouTube-UGC.
Submitted 23 June, 2024;
originally announced June 2024.
-
Diffusion Model-based FOD Restoration from High Distortion in dMRI
Authors:
Shuo Huang,
Lujia Zhong,
Yonggang Shi
Abstract:
Fiber orientation distributions (FODs) are a popular model for representing diffusion MRI (dMRI) data. However, imaging artifacts such as susceptibility-induced distortion in dMRI can cause signal loss and lead to the corrupted reconstruction of FODs, which prohibits successful fiber tracking and connectivity analysis in affected brain regions such as the brain stem. Generative models, such as diffusion models, have been successfully applied in various image restoration tasks. However, their application to FOD images poses unique challenges since FODs are 4-dimensional data represented by spherical harmonics (SPHARM) with the fourth dimension exhibiting order-related dependency. In this paper, we propose a novel diffusion model for FOD restoration that can recover the signal loss caused by distortion artifacts. We use volume-order encoding to enhance the ability of the diffusion model to generate individual FOD volumes at all SPHARM orders. Moreover, we add cross-attention features extracted across all SPHARM orders in generating every individual FOD volume to capture the order-related dependency across FOD volumes. We also condition the diffusion model with low-distortion FODs surrounding high-distortion areas to maintain the geometric coherence of the generated FODs. We trained and tested our model using data from the UK Biobank (n = 1315). On a test set with ground truth (n = 43), we demonstrate the high accuracy of the generated FODs in terms of root mean square errors of FOD volumes and angular errors of FOD peaks. We also apply our method to a test set with large distortion in the brain stem area (n = 1172) and demonstrate the efficacy of our method in restoring the FOD integrity and, hence, greatly improving tractography performance in affected brain regions.
Submitted 19 June, 2024;
originally announced June 2024.
-
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Authors:
Vahid Noroozi,
Zhehuai Chen,
Somshubra Majumdar,
Steve Huang,
Jagadeesh Balam,
Boris Ginsburg
Abstract:
In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both modalities, synthetic data generation emerges as a crucial strategy to enhance the performance of such systems and facilitate the modeling of cross-modal relationships between the speech and text domains. Our process employs large language models to generate textual components and text-to-speech systems to generate speech components. The proposed methods offer a practical and effective means to expand the training dataset for these models. Experimental results show progress in achieving an integrated understanding of text and speech. We also highlight the potential of using unlabeled speech data to generate synthetic samples comparable in quality to those with available transcriptions, enabling the expansion of these models to more languages.
Submitted 18 June, 2024;
originally announced June 2024.
-
RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement
Authors:
Jingcheng Li,
Ye Qiao,
Haocheng Xu,
Sitao Huang
Abstract:
Images captured under low-light scenarios often suffer from low quality. Previous CNN-based deep learning methods often rely on Retinex theory. Nevertheless, most of them cannot perform well on more complicated datasets like LOL-v2 and consume too many computational resources. Besides, some of these methods require sophisticated training at different stages, making the procedure even more time-consuming and tedious. In this paper, we propose a more accurate, concise, and one-stage Retinex theory based framework, RSEND. RSEND first divides the low-light image into the illumination map and reflectance map, then captures the important details in the illumination map and performs light enhancement. After this step, it refines the enhanced gray-scale image and performs element-wise matrix multiplication with the reflectance map. By denoising the output of the previous step, it obtains the final result. In all the steps, RSEND utilizes a Squeeze-and-Excitation network to better capture the details. Comprehensive quantitative and qualitative experiments show that our Efficient Retinex model significantly outperforms other CNN-based models, achieving a PSNR improvement ranging from 0.44 dB to 4.2 dB on different datasets, and even outperforms transformer-based models on the LOL-v2-real dataset.
Submitted 13 June, 2024;
originally announced June 2024.
-
Autoregressive Image Diffusion: Generation of Image Sequence and Application in MRI
Authors:
Guanxiong Luo,
Shoujin Huang,
Martin Uecker
Abstract:
Magnetic resonance imaging (MRI) is a widely used non-invasive imaging modality. However, a persistent challenge lies in balancing image quality with imaging speed. This trade-off is primarily constrained by k-space measurements, which traverse specific trajectories in the spatial Fourier domain (k-space). These measurements are often undersampled to shorten acquisition times, resulting in image artifacts and compromised quality. Generative models learn image distributions and can be used to reconstruct high-quality images from undersampled k-space data. In this work, we present the autoregressive image diffusion (AID) model for image sequences and use it to sample the posterior for accelerated MRI reconstruction. The algorithm incorporates both undersampled k-space and pre-existing information. Models trained with the fastMRI dataset are evaluated comprehensively. The results show that the AID model can robustly generate sequentially coherent image sequences. In 3D and dynamic MRI, the AID model can outperform the standard diffusion model and reduce hallucinations, due to the learned inter-image dependencies.
Submitted 24 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
TauAD: MRI-free Tau Anomaly Detection in PET Imaging via Conditioned Diffusion Models
Authors:
Lujia Zhong,
Shuo Huang,
Jiaxin Yue,
Jianwei Zhang,
Zhiwei Deng,
Wenhao Chi,
Yonggang Shi
Abstract:
The emergence of tau PET imaging over the last decade has enabled Alzheimer's disease (AD) researchers to examine tau pathology in vivo and more effectively characterize the disease trajectories of AD. Current tau PET analysis methods, however, typically perform inferences on large cortical ROIs and are limited in the detection of localized tau pathology that varies across subjects. Furthermore, a high-resolution MRI is required to carry out conventional tau PET analysis, which is not commonly acquired in clinical practice and may not be acquired for many elderly patients with dementia due to strong motion artifacts, claustrophobia, or certain metal implants. In this work, we propose a novel conditional diffusion model to perform MRI-free anomaly detection from tau PET imaging data. By including individualized conditions and two complementary loss maps from pseudo-healthy and pseudo-unhealthy reconstructions, our model computes an anomaly map across the entire brain area that allows simply training a support vector machine (SVM) for classifying disease severity. We train our model on ADNI subjects (n=534) and evaluate its performance on a separate dataset from the preclinical subjects of the A4 clinical trial (n=447). We demonstrate that our method outperforms baseline generative models and the conventional Z-score-based method in anomaly localization without mis-detecting off-target bindings in sub-cortical and out-of-brain areas. By classifying the A4 subjects according to their anomaly map using the SVM trained on ADNI data, we show that our method can successfully group preclinical subjects with significantly different cognitive functions, which further demonstrates the effectiveness of our method in capturing biologically relevant anomalies in tau PET imaging.
Submitted 21 May, 2024;
originally announced May 2024.
-
Spatial Mode Multiplexing for Fiber-Coupled IM/DD Optical Wireless Links with Misalignment
Authors:
Jinzhe Che,
Shenjie Huang,
Majid Safari
Abstract:
Optical wireless communication (OWC) emerges as a pivotal solution for achieving terabit-level aggregate throughput in next-generation wireless networks. With the mature high-speed transceivers and advanced (de)multiplexing techniques designed for fiber optics, fiber-coupled OWC can be seamlessly integrated into existing ultra-high-speed networks such as data centres. In particular, OWC leveraging spatial mode multiplexing (SMM) and few-mode fiber (FMF) coupling can significantly increase capacity, though misalignment may reduce performance. This paper presents a thorough investigation into SMM-enabled FMF-coupled OWC systems affected by link misalignment, specifically focusing on systems with intensity modulation with direct detection (IM/DD) receivers. A theoretical analysis is conducted to assess the fiber coupling efficiency of the considered system in the presence of both pointing error and angle of arrival (AOA) fluctuations caused by random device vibrations. Our model elucidates the dependence of coupling efficiency on the order of the incident modes, highlighting the critical role of beam properties in system performance. To mitigate the intermodal crosstalk arising from link misalignment, we employ zero-forcing beamforming (ZFBF) to enhance the overall aggregated data rate. Through extensive numerical results, we identify optimal system configurations encompassing aperture design and mode selection, leading to a capacity boost exceeding 200%.
Submitted 25 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty
Authors:
Rui Wang,
Xiaoqing Bai,
Shengquan Huang,
Shoupu Wei
Abstract:
As power systems become more complex and uncertain, low-voltage distribution networks face numerous challenges, including three-phase imbalances caused by asymmetrical loads and distributed energy resources. To address these issues, we propose a robust stochastic optimization (RSO) based optimal power flow (OPF) control method for three-phase, four-wire low-voltage distribution networks that accounts for uncertainty. Using historical data and deep learning classification methods, the proposed method simulates optimal system behaviour without requiring communication infrastructure. The simulation results verify that the proposed method effectively controls the voltage and current amplitude while minimizing the operational cost and keeping the three-phase imbalance within acceptable limits. The proposed method shows promise for managing uncertainties and optimizing performance in low-voltage distribution networks.
Submitted 23 April, 2024;
originally announced April 2024.
-
A Dataset and Model for Realistic License Plate Deblurring
Authors:
Haoyan Gong,
Yuzheng Feng,
Zhenrong Zhang,
Xianxu Hou,
Jingxin Liu,
Siqi Huang,
Hongbin Liu
Abstract:
Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we introduce the first large-scale license plate deblurring dataset named License Plate Blur (LPBlur), captured by a dual-camera system and processed through a post-processing pipeline to avoid misalignment issues. Then, we propose a License Plate Deblurring Generative Adversarial Network (LPDGAN) to tackle license plate deblurring, comprising: 1) a Feature Fusion Module to integrate multi-scale latent codes; 2) a Text Reconstruction Module to restore structure through textual modality; 3) a Partition Discriminator Module to enhance the model's perception of details in each letter. Extensive experiments validate the reliability of the LPBlur dataset for both model training and testing, showcasing that our proposed model outperforms other state-of-the-art motion deblurring methods in realistic license plate deblurring scenarios. The dataset and code are available at https://github.com/haoyGONG/LPDGAN.
Submitted 22 April, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
MuPT: A Generative Symbolic Music Pretrained Transformer
Authors:
Xingwei Qu,
Yuelin Bai,
Yinghao Ma,
Ziya Zhou,
Ka Man Lo,
Jiaheng Liu,
Ruibin Yuan,
Lejun Min,
Xueling Liu,
Tianyu Zhang,
Xinrun Du,
Shuyue Guo,
Yiming Liang,
Yizhi Li,
Shangda Wu,
Junting Zhou,
Tianyu Zheng,
Ziyang Ma,
Fengze Han,
Wei Xue,
Gus Xia,
Emmanouil Benetos,
Xiang Yue,
Chenghua Lin,
Xu Tan
, et al. (4 additional authors not shown)
Abstract:
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.
Submitted 10 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
An edge detection-based deep learning approach for tear meniscus height measurement
Authors:
Kesheng Wang,
Kunhui Xu,
Xiaoyu Chen,
Chunlei He,
Jianfeng Zhang,
Dexing Kong,
Qi Dai,
Shoujun Huang
Abstract:
Automatic measurements of tear meniscus height (TMH) have been achieved by using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask labels less affected by subjective factors with enhanced efficiency compared to previous annotation approaches. For improved segmentation of the pupil and tear meniscus areas, the convolutional neural network Inceptionv3 was first implemented as an image quality assessment model, effectively identifying higher-quality images with an accuracy of 98.224%. Subsequently, by using the generated labels, various algorithms, including Unet, ResUnet, Deeplabv3+FcnResnet101, Deeplabv3+FcnResnet50, FcnResnet50, and FcnResnet101 were trained, with Unet demonstrating the best performance. Finally, Unet was used for automatic pupil and tear meniscus segmentation to locate the center of the pupil and calculate TMH, respectively. An evaluation of the mask quality predicted by Unet indicated a Mean Intersection over Union of 0.9362, a recall of 0.9261, a precision of 0.9423, and an F1-Score of 0.9326. Additionally, the TMH predicted by the model was assessed, with the fitting curve represented as y = 0.982x - 0.862, an overall correlation coefficient of r^2 = 0.961, and an accuracy of 94.80% (237/250). In summary, the algorithm can automatically screen images based on their quality, segment the pupil and tear meniscus areas, and automatically measure TMH. Measurement results using the AI algorithm demonstrate a high level of consistency with manual measurements, offering significant support to clinical doctors in diagnosing dry eye disease.
Submitted 23 March, 2024;
originally announced March 2024.
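The mask-quality figures quoted in the tear meniscus abstract above (Intersection over Union, precision, recall, F1-Score) are standard pixel-level measures. As a minimal sketch only, and not the authors' evaluation code, they can be computed for one pair of binary masks as follows; the function name `mask_metrics` and the toy masks are illustrative assumptions, and a "mean" IoU would additionally average this value over classes or images.

```python
import numpy as np

def mask_metrics(pred, truth):
    """Pixel-level IoU, precision, recall and F1 for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.logical_and(pred, truth).sum())    # predicted and labeled
    fp = int(np.logical_and(pred, ~truth).sum())   # predicted but not labeled
    fn = int(np.logical_and(~pred, truth).sum())   # labeled but missed
    # Guard against empty masks so the ratios stay defined.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical toy masks: a 2x2 labeled region and a 2x3 prediction that covers it.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

metrics = mask_metrics(pred, truth)
print(metrics)
```

In this toy case the prediction covers every labeled pixel but adds two extra ones, so recall is 1 while precision and IoU are both 2/3.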
-
Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation
Authors:
Hairong Shi,
Songhao Han,
Shaofei Huang,
Yue Liao,
Guanbin Li,
Xiangxing Kong,
Hua Zhu,
Xiaomu Wang,
Si Liu
Abstract:
Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent studies have attempted to enhance SAM with medical expertise by pre-training on large-scale medical segmentation datasets. However, challenges still exist in 3D tumor lesion segmentation owing to tumor complexity and the imbalance in foreground and background regions. Therefore, we introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation. We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks, facilitating the generation of more precise segmentation masks. Furthermore, an iterative refinement scheme is implemented in M-SAM to refine the segmentation masks progressively, leading to improved performance. Extensive experiments on seven tumor lesion segmentation datasets indicate that our M-SAM not only achieves high segmentation accuracy but also exhibits robust generalization. The code is available at https://github.com/nanase1025/M-SAM.
Submitted 11 July, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Noise Level Adaptive Diffusion Model for Robust Reconstruction of Accelerated MRI
Authors:
Shoujin Huang,
Guanxiong Luo,
Xi Wang,
Ziran Chen,
Yuwan Wang,
Huaishui Yang,
Pheng-Ann Heng,
Lingyan Zhang,
Mengye Lyu
Abstract:
In general, diffusion model-based MRI reconstruction methods incrementally remove artificially added noise while imposing data consistency to reconstruct the underlying images. However, real-world MRI acquisitions already contain inherent noise due to thermal fluctuations. This phenomenon is particularly notable when using ultra-fast, high-resolution imaging sequences for advanced research, or using low-field systems favored by low- and middle-income countries. These common scenarios can lead to sub-optimal performance or complete failure of existing diffusion model-based reconstruction techniques. Specifically, as the artificially added noise is gradually removed, the inherent MRI noise becomes increasingly pronounced, making the actual noise level inconsistent with the predefined denoising schedule and consequently leading to inaccurate image reconstruction. To tackle this problem, we propose a posterior sampling strategy with a novel NoIse Level Adaptive Data Consistency (Nila-DC) operation. Extensive experiments are conducted on two public datasets and an in-house clinical dataset with field strength ranging from 0.3T to 3T, showing that our method surpasses the state-of-the-art MRI reconstruction methods, and is highly robust against various noise levels. The code for Nila is available at https://github.com/Solor-pikachu/Nila.
Submitted 31 July, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
A Self-Healing Magnetic-Array-Type Current Sensor with Data-Driven Identification of Abnormal Magnetic Measurement Units
Authors:
Xiaohu Liu,
Kang Ma,
Jian Liu,
Wei Zhao,
Lisha Peng,
Songling Huang,
Shisong Li
Abstract:
Magnetic-array-type current sensors have garnered increasing popularity owing to their notable advantages, including broadband functionality, a large dynamic range, cost-effectiveness, and compact dimensions. However, the susceptibility of the measurement error of one or more magnetic measurement units (MMUs) within the current sensor to drift significantly from the nominal value due to environmental factors poses a potential threat to the measurement accuracy of the current sensor. In light of the need to ensure sustained measurement accuracy over the long term, this paper proposes an innovative self-healing approach rooted in cyber-physics correlation. This approach aims to identify MMUs exhibiting abnormal measurement errors, allowing for the exclusive utilization of the remaining unaffected MMUs in the current measurement process. To achieve this, principal component analysis (PCA) is employed to discern the primary component, arising from fluctuations of the measured current, from the residual component, attributed to the drift in measurement error. This analysis is conducted by scrutinizing the measured data obtained from the MMUs. Subsequently, the squared prediction error (SPE) statistic (also called $Q$ statistic) is deployed to individually identify any MMU displaying abnormal behavior. The experimental results demonstrate the successful online identification of abnormal MMUs without the need for a standard magnetic field sensor. By eliminating the contributions from the identified abnormal MMUs, the accuracy of the current measurement is effectively preserved.
Submitted 15 August, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
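The PCA and squared prediction error (SPE) step described in the magnetic-array current sensor abstract above can be illustrated with a small simulation. This is a hedged sketch, not the paper's procedure: the eight-unit array, the gains, the noise level, the injected offset, and the use of per-unit SPE contributions to single out one unit are all assumptions made for the example, and no control limit for the statistic is computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 8
gains = np.linspace(0.8, 1.2, n_units)  # hypothetical per-MMU sensitivities

def simulate(n_samples, drift=None):
    """Simulated MMU readings: one shared current, per-unit gain, small noise."""
    current = 10.0 * np.sin(np.linspace(0.0, 20.0 * np.pi, n_samples))
    readings = np.outer(current, gains)
    readings = readings + 0.01 * rng.standard_normal((n_samples, n_units))
    if drift is not None:
        readings = readings + drift  # additive drift in measurement error
    return readings

# Fit PCA on healthy reference data; one component captures the current.
reference = simulate(2000)
mean = reference.mean(axis=0)
_, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
principal = vt[:1].T  # shape (n_units, 1)

def spe_contributions(readings):
    """Squared residual per sample and unit; each row sums to that sample's SPE."""
    centered = readings - mean
    residual = centered - centered @ principal @ principal.T
    return residual ** 2

# Inject a drift on unit 3 (assumed fault) and look for the largest contribution.
drift = np.zeros(n_units)
drift[3] = 0.5
contrib = spe_contributions(simulate(500, drift))
spe = contrib.sum(axis=1)  # SPE (Q statistic) per sample
suspect = int(np.argmax(contrib.mean(axis=0)))
print("mean SPE:", spe.mean(), "suspect unit:", suspect)
```

Because the drift is not aligned with the principal direction, most of it survives in the residual of the drifting unit, which is why that unit dominates the SPE contributions here.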
-
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Authors:
Shuokang Huang,
Kaihan Li,
Di You,
Yichong Chen,
Arvin Lin,
Siying Liu,
Xiaohui Li,
Julie A. McCann
Abstract:
WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user sensing, there remains a lack of benchmark datasets to facilitate reproducible and comparable research. To bridge this gap, we present WiMANS, to our knowledge, the first dataset for multi-user sensing based on WiFi. WiMANS contains over 9.4 hours of dual-band WiFi Channel State Information (CSI), as well as synchronized videos, monitoring simultaneous activities of multiple users. We exploit WiMANS to benchmark the performance of state-of-the-art WiFi-based human sensing models and video-based models, posing new challenges and opportunities for future work. We believe WiMANS can push the boundaries of current studies and catalyze the research on WiFi-based multi-user sensing.
Submitted 12 March, 2024; v1 submitted 24 January, 2024;
originally announced February 2024.
-
Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift
Authors:
Jisheng Bai,
Mou Wang,
Haohe Liu,
Han Yin,
Yafei Jia,
Siwei Huang,
Yutong Du,
Dongzhe Zhang,
Dongyuan Shi,
Woon-Seng Gan,
Mark D. Plumbley,
Susanto Rahardja,
Bin Xiang,
Jianfeng Chen
Abstract:
Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task, in recent years, has achieved substantial progress in device generalization, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabeled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.
Submitted 28 February, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Real-Time Systems Optimization with Black-box Constraints and Hybrid Variables
Authors:
Sen Wang,
Dong Li,
Shao-Yu Huang,
Xuanliang Deng,
Ashrarul H. Sifat,
Changhee Jung,
Ryan Williams,
Haibo Zeng
Abstract:
When optimizing real-time systems, designers often face a challenging problem where the schedulability constraints are non-convex, non-continuous, or lack an analytical form to understand their properties. Although the optimization framework NORTH proposed in previous work is general (it works with arbitrary schedulability analysis) and scalable, it can only handle problems with continuous variables, which limits its application. In this paper, we extend the applications of the framework NORTH to problems with a hybrid of continuous and discrete variables. This is achieved in a coordinate-descent method, where the continuous and discrete variables are optimized separately during iterations. The new framework, NORTH+, improves solution quality by around 20% over NORTH in experiments.
Submitted 21 January, 2024;
originally announced January 2024.
-
Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness
Authors:
Sicheng Yang,
Zunnan Xu,
Haiwei Xue,
Yongkang Cheng,
Shaoli Huang,
Mingming Gong,
Zhiyong Wu
Abstract:
Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tackle these issues, we introduce FreeTalker, which, to the best of our knowledge, is the first framework for the generation of both spontaneous (e.g., co-speech gesture) and non-spontaneous (e.g., moving around the podium) speaker motions. Specifically, we train a diffusion-based model for speaker motion generation that employs unified representations of both speech-driven gestures and text-driven motions, utilizing heterogeneous data sourced from various motion datasets. During inference, we utilize classifier-free guidance to highly control the style in the clips. Additionally, to create smooth transitions between clips, we utilize DoubleTake, a method that leverages a generative prior and ensures seamless motion blending. Extensive experiments show that our method generates natural and controllable speaker movements. Our code, model, and demo are available at https://youngseng.github.io/FreeTalker/.
Submitted 7 January, 2024;
originally announced January 2024.
-
A General and Scalable Method for Optimizing Real-Time Systems
Authors:
Sen Wang,
Dong Li,
Shao-Yu Huang,
Xuanliang Deng,
Ashrarul H. Sifat,
Changhee Jung,
Ryan Williams,
Haibo Zeng
Abstract:
In real-time systems optimization, designers often face a challenging problem posed by the non-convex and non-continuous schedulability conditions, which may even lack an analytical form to understand their properties. To tackle this challenging problem, we treat the schedulability analysis as a black box that only returns true/false results. We propose a general and scalable framework to optimize real-time systems, named Numerical Optimizer with Real-Time Highlight (NORTH). NORTH is built upon the gradient-based active-set methods from the numerical optimization literature but with new methods to manage active constraints for the non-differentiable schedulability constraints. In addition, we also generalize NORTH to NORTH+, to collaboratively optimize certain types of discrete variables (e.g., priority assignments, categorical variables) with continuous variables based on numerical optimization algorithms. We demonstrate the algorithm performance with two example applications: energy minimization based on dynamic voltage and frequency scaling (DVFS), and optimization of control system performance. In these experiments, NORTH achieved $10^2$ to $10^5$ times speed improvements over state-of-the-art methods while maintaining similar or better solution quality. NORTH+ outperforms NORTH by 30% with similar algorithm scalability. Both NORTH and NORTH+ support black-box schedulability analysis, ensuring broad applicability.
Submitted 6 January, 2024;
originally announced January 2024.
-
Implementing Digital Twin in Field-Deployed Optical Networks: Uncertain Factors, Operational Guidance, and Field-Trial Demonstration
Authors:
Yuchen Song,
Min Zhang,
Yao Zhang,
Yan Shi,
Shikui Shen,
Bingli Guo,
Shanguo Huang,
Danshi Wang
Abstract:
Digital twin has revolutionized optical communication networks by enabling their full life-cycle management, including design, troubleshooting, optimization, upgrade, and prediction. While extensive literature exists on frameworks, standards, and applications of digital twin, there is a pressing need to implement digital twin in field-deployed optical networks operating in real-world environments, as opposed to controlled laboratory settings. This paper addresses this challenge by examining, in terms of three main challenges, the uncertain factors behind the inaccuracy of digital twin in field-deployed optical networks, and by proposing operational guidance for implementing accurate digital twin in such networks. Through the proposed guidance, we demonstrate the effective implementation of digital twin in a field-trial C+L-band optical transmission link, showcasing its capabilities in performance recovery in a fiber cut scenario.
Submitted 6 December, 2023;
originally announced December 2023.
-
INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis
Authors:
Shih-Cheng Huang,
Zepeng Huo,
Ethan Steinberg,
Chia-Chun Chiang,
Matthew P. Lungren,
Curtis P. Langlotz,
Serena Yeung,
Nigam H. Shah,
Jason A. Fries
Abstract:
Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e. demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research.
Submitted 17 November, 2023;
originally announced November 2023.
-
Toward ground-truth optical coherence tomography via three-dimensional unsupervised deep learning processing and data
Authors:
Renxiong Wu,
Fei Zheng,
Meixuan Li,
Shaoyan Huang,
Xin Ge,
Linbo Liu,
Yong Liu,
Guangming Ni
Abstract:
Optical coherence tomography (OCT) can perform non-invasive high-resolution three-dimensional (3D) imaging and has been widely used in biomedical fields, while it is inevitably affected by coherence speckle noise which degrades OCT imaging performance and restricts its applications. Here we present a novel speckle-free OCT imaging strategy, named toward-ground-truth OCT (tGT-OCT), that utilizes unsupervised 3D deep-learning processing and leverages OCT 3D imaging features to achieve speckle-free OCT imaging. Specifically, our proposed tGT-OCT utilizes an unsupervised 3D-convolution deep-learning network trained using random 3D volumetric data to distinguish and separate speckle from real structures in 3D imaging volumetric space; moreover, tGT-OCT effectively further reduces speckle noise and reveals structures that would otherwise be obscured by speckle noise while preserving spatial resolution. Results derived from different samples demonstrated the high-quality speckle-free 3D imaging performance of tGT-OCT and its advancement beyond the previous state-of-the-art.
Submitted 7 November, 2023;
originally announced November 2023.
-
EU COST Action on future generation optical wireless communication technologies, 2nd White paper
Authors:
Z. Ghassemlooy,
M. A. Khalighi,
S. Zvanovec,
A. Shrestha,
B. Ortega,
M. Petkovic,
X. Pang,
C. Sirtori,
D. Orsucci,
A. Shrestha,
F. Moll,
G. Cossu,
V. Spirito,
M. P. Ninos,
E. Ciaramella,
J. Bas,
M. Amay,
S. Huang,
M. Safari,
T. Gutema,
W. Popoola,
Vicente Matus,
Jose Rabadan,
Rafael Perez-Jimenez,
E. Panayirci
, et al. (3 additional authors not shown)
Abstract:
NEWFOCUS is an EU COST Action targeted at exploring radical solutions that could influence the design of future wireless networks. The project aims to address some of the challenges associated with optical wireless communication (OWC) and to establish it as a complementary technology to the radio frequency (RF)-based wireless systems in order to meet the demanding requirements of the fifth generation (5G) and the future sixth generation (6G) backhaul and access networks. Only 6G will be able to widely serve the exponential growth in connected devices (i.e., more than 500 billion) in 2030, real-time holographic communication, future virtual reality, etc. Space is emerging as the new frontier in 5G, 6G, and beyond communication networks, where it offers high-speed wireless coverage to remote areas both on land and at sea. This activity is supported by the recent development of low-altitude Earth orbit satellite mega-constellations. The focus of this 2nd White Paper is on the use of OWC as an enabling technology for medium- and long-range links for deployment in (i) smart-cities and intelligent transportation systems; (ii) first- and last-mile access and backhaul/fronthaul wireless networks; (iii) hybrid free-space optics/RF adaptive wireless connections; (iv) space-to-ground, inter-satellite, ground-to-air, and air-to-air communications; and (v) underwater communications.
Submitted 14 June, 2023;
originally announced November 2023.
-
Optimizing Logical Execution Time Model for Both Determinism and Low Latency
Authors:
Sen Wang,
Dong Li,
Ashrarul H. Sifat,
Shao-Yu Huang,
Xuanliang Deng,
Changhee Jung,
Ryan Williams,
Haibo Zeng
Abstract:
The Logical Execution Time (LET) programming model has recently received considerable attention, particularly because of its timing and dataflow determinism. In LET, task computation appears always to take the same amount of time (called the task's LET interval), and the task reads (resp. writes) at the beginning (resp. end) of the interval. Compared to other communication mechanisms, such as implicit communication and Dynamic Buffer Protocol (DBP), LET performs worse on many metrics, such as end-to-end latency (including reaction time and data age) and time disparity jitter. Compared with the default LET setting, the flexible LET (fLET) model shrinks the LET interval while still guaranteeing schedulability by introducing the virtual offset to defer the read operation and using the virtual deadline to move up the write operation. Therefore, fLET has the potential to significantly improve the end-to-end timing performance while keeping the benefits of deterministic behavior on timing and dataflow.
To fully realize the potential of fLET, we consider the problem of optimizing the assignments of its virtual offsets and deadlines. We propose new abstractions to describe the task communication pattern and new optimization algorithms to explore the solution space efficiently. The algorithms leverage the linearizability of communication patterns and utilize symbolic operations to achieve efficient optimization while providing a theoretical guarantee. The framework supports optimizing multiple performance metrics and guarantees bounded suboptimality when optimizing end-to-end latency. Experimental results show that our optimization algorithms improve upon the default LET and its existing extensions and significantly outperform implicit communication and DBP in terms of various metrics, such as end-to-end latency, time disparity, and its jitter.
Submitted 7 March, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
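The read and write instants described in the LET abstract above can be made concrete with a small timing sketch. It is only an illustration under stated assumptions: the class `LetTask`, its field names, and the choice of a default LET interval that spans the whole period are hypothetical and are not taken from the paper, and schedulability is not checked.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LetTask:
    """Hypothetical periodic task with LET / flexible LET (fLET) timing."""
    period: int                              # time units between job releases
    virtual_offset: int = 0                  # fLET: defers the read after release
    virtual_deadline: Optional[int] = None   # fLET: moves the write earlier

    def _deadline(self) -> int:
        # Assumption: default LET publishes at the end of the period.
        return self.period if self.virtual_deadline is None else self.virtual_deadline

    def read_time(self, job: int) -> int:
        """Instant at which job number `job` reads its inputs."""
        return job * self.period + self.virtual_offset

    def write_time(self, job: int) -> int:
        """Instant at which job number `job` publishes its outputs."""
        return job * self.period + self._deadline()

    def let_interval(self) -> int:
        """Length of the logical execution interval."""
        return self._deadline() - self.virtual_offset

default_let = LetTask(period=10)
flexible_let = LetTask(period=10, virtual_offset=2, virtual_deadline=6)

for task in (default_let, flexible_let):
    print(task.read_time(3), task.write_time(3), task.let_interval())
```

With a period of 10, the default task reads at 30 and writes at 40 for its fourth job, while the fLET variant reads at 32 and writes at 36, shrinking the interval from 10 to 4 time units.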
-
Channel-robust Automatic Modulation Classification Using Spectral Quotient Cumulants
Authors:
Sai Huang,
Yuting Chen,
Jiashuo He,
Shuo Chang,
Zhiyong Feng
Abstract:
Automatic modulation classification (AMC) aims to identify the modulation format of a received signal corrupted by channel effects and noise. Most existing works focus on the impact of noise while relatively little attention has been paid to the impact of channel effects. However, the instability posed by multipath fading channels leads to significant performance degradation. To mitigate the adverse effects of the multipath channel, we propose a channel-robust modulation classification framework named spectral quotient cumulant classification (SQCC) for orthogonal frequency division multiplexing (OFDM) systems. Specifically, we first transform the received signal to the spectral quotient (SQ) sequence by spectral circular shift division operations. Secondly, an outlier detector is proposed to filter the outliers in the SQ sequence. Finally, we extract spectral quotient cumulants (SQCs) from the filtered SQ sequence as the inputs to train the artificial neural network (ANN) classifier and use the trained ANN to make the final decisions. Simulation results show that our proposed SQCC method exhibits classification robustness and superiority under various unknown Rician multipath fading channels compared with other existing methods. Specifically, the SQCC method achieves nearly 90% classification accuracy at a signal-to-noise ratio (SNR) of 4 dB when trained under an AWGN channel and tested under multiple channels.
Submitted 11 October, 2023;
originally announced October 2023.
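The "spectral circular shift division" mentioned in the SQCC abstract above can be illustrated with a minimal sketch. It assumes the spectral quotient is the element-wise ratio between the received spectrum and a circularly shifted copy of itself, and that the channel gain is roughly equal on neighbouring subcarriers so that it cancels in the ratio. The function name `spectral_quotient`, the `shift` parameter, and the flat single-tap channel are illustrative assumptions; the paper's exact definition, its outlier detector, and its cumulant features are not reproduced here.

```python
import cmath


def spectral_quotient(spectrum, shift=1):
    """Divide each frequency bin by the bin `shift` positions ahead (circularly)."""
    n = len(spectrum)
    quotient = []
    for k in range(n):
        denominator = spectrum[(k + shift) % n]
        if denominator == 0:
            raise ZeroDivisionError(f"bin {(k + shift) % n} is zero")
        quotient.append(spectrum[k] / denominator)
    return quotient


# Toy check: QPSK symbols on four subcarriers through a flat (single-tap) channel.
# Noise is omitted and the flat channel is an assumption made for clarity.
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
channel_gain = 0.5 * cmath.exp(0.3j)  # unknown to the receiver
received = [channel_gain * s for s in symbols]

sq = spectral_quotient(received)
reference = spectral_quotient(symbols)

# The common channel gain cancels in every ratio, so the two sequences
# agree up to floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(sq, reference))
```

In this toy setting the quotient sequence depends only on the transmitted symbols, which is the intuition behind using it as a channel-robust feature.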
-
Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors
Authors:
Junren Chen,
Shuai Huang,
Michael K. Ng,
Zhaoqiang Liu
Abstract:
The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$. First, we consider a $k$-sparse $\boldsymbol{x}$ and introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$. TWF comprises two steps: the spectral initialization that identifies a point sufficiently close to $\boldsymbol{x}$ (up to a sign flip) when $m=O(k^2\log n)$, and the thresholded gradient descent (with a good initialization) that produces a sequence linearly converging to $\boldsymbol{x}$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol{x}$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. We develop the projected gradient descent (PGD) algorithm that also comprises two steps: the projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient descent that refines the $\ell_2$-error to $O(δ)$ at a geometric rate when $m=O(k\log\frac{Lrn}{δ^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery in the MNIST dataset from a small number of quadratic measurements.
Submitted 16 September, 2023;
originally announced September 2023.
-
Improving Lens Flare Removal with General Purpose Pipeline and Multiple Light Sources Recovery
Authors:
Yuyan Zhou,
Dong Liang,
Songcan Chen,
Sheng-Jun Huang,
Shuo Yang,
Chongyi Li
Abstract:
When taking images against strong light sources, the resulting images often contain heterogeneous flare artifacts. These artifacts can significantly affect image visual quality and downstream computer vision tasks. Because collecting real data pairs of flare-corrupted/flare-free images for training flare removal models is challenging, current methods utilize the direct-add approach to synthesize data. However, these methods do not consider automatic exposure and tone mapping in the image signal processing pipeline (ISP), which limits the generalization capability of deep models trained using such data. In addition, existing methods struggle to handle multiple light sources due to the different sizes, shapes and illuminance of various light sources. In this paper, we propose a solution to improve the performance of lens flare removal by revisiting the ISP and remodeling the principle of automatic exposure in the synthesis pipeline, and we design a more reliable light source recovery strategy. The new pipeline approaches realistic imaging by discriminating the local and global illumination through convex combination, avoiding global illumination shifting and local over-saturation. Our strategy for recovering multiple light sources convexly averages the input and output of the neural network based on illuminance levels, thereby avoiding the need for a hard threshold in identifying light sources. We also contribute a new flare removal testing dataset containing the flare-corrupted images captured by ten types of consumer electronics. The dataset facilitates the verification of the generalization capability of flare removal methods. Extensive experiments show that our solution can effectively improve the performance of lens flare removal and push the frontier toward more general situations.
Submitted 31 August, 2023;
originally announced August 2023.
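The light source recovery idea in the abstract above, a convex average of the network input and output weighted by illuminance instead of a hard threshold, can be sketched as follows. This is only an illustration under stated assumptions: pixels are grayscale values in [0, 1], the weight is a smoothstep ramp, and the names `blend_weight`, `recover_light_sources`, `low`, and `high` are hypothetical. The paper's actual weighting function is not given in the abstract.

```python
def blend_weight(luminance, low=0.9, high=1.0):
    """Smoothstep weight in [0, 1]: 0 below `low`, 1 above `high` (assumed ramp)."""
    t = (luminance - low) / (high - low)
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def recover_light_sources(input_pixels, output_pixels):
    """Convexly combine the flare-corrupted input and the network output per pixel.

    Bright input pixels (likely light sources) keep the input value,
    darker pixels keep the network's flare-removed value.
    """
    blended = []
    for x, y in zip(input_pixels, output_pixels):
        w = blend_weight(x)
        blended.append(w * x + (1.0 - w) * y)
    return blended


# A saturated pixel (1.0) is restored from the input; a dark pixel (0.2)
# takes the network output unchanged.
result = recover_light_sources([1.0, 0.2], [0.5, 0.1])
```

Because the weight varies smoothly between the two ends of the ramp, pixels near the boundary of a light source are mixed rather than switched abruptly.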
-
AATCT-IDS: A Benchmark Abdominal Adipose Tissue CT Image Dataset for Image Denoising, Semantic Segmentation, and Radiomics Evaluation
Authors:
Zhiyu Ma,
Chen Li,
Tianming Du,
Le Zhang,
Dechao Tang,
Deguo Ma,
Shanchuan Huang,
Yan Liu,
Yihao Sun,
Zhihao Chen,
Jin Yuan,
Qianqing Nie,
Marcin Grzegorzek,
Hongzan Sun
Abstract:
Methods: In this study, a benchmark \emph{Abdominal Adipose Tissue CT Image Dataset} (AATTCT-IDS) containing 300 subjects is prepared and published. AATTCT-IDS publishes 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3,213 of those slices that have the same slice distance to validate denoising methods, train semantic segmentation models, and study radiomics. For different tasks, this paper compares and analyzes the performance of various methods on AATTCT-IDS by combining the visualization results and evaluation data, thereby verifying the research potential of this dataset in the above three types of tasks.
Results: In the comparative study of image denoising, algorithms using a smoothing strategy suppress mixed noise at the expense of image details and obtain better evaluation data. Methods such as BM3D preserve the original image structure better, although the evaluation data are slightly lower. The results show significant differences among them. In the comparative study of semantic segmentation of abdominal adipose tissue, the segmentation results of adipose tissue by each model show different structural characteristics. Among them, BiSeNet obtains segmentation results only slightly inferior to U-Net with the shortest training time and effectively separates small and isolated adipose tissue. In addition, the radiomics study based on AATTCT-IDS reveals three adipose distributions in the subject population.
Conclusion: AATTCT-IDS contains the ground truth of adipose tissue regions in abdominal CT slices. This open-source dataset can attract researchers to explore the multi-dimensional characteristics of abdominal adipose tissue and thus help physicians and patients in clinical practice. AATCT-IDS is freely published for non-commercial purposes at: \url{https://figshare.com/articles/dataset/AATTCT-IDS/23807256}.
Submitted 16 August, 2023;
originally announced August 2023.
-
Tissue Segmentation of Thick-Slice Fetal Brain MR Scans with Guidance from High-Quality Isotropic Volumes
Authors:
Shijie Huang,
Xukun Zhang,
Zhiming Cui,
He Zhang,
Geng Chen,
Dinggang Shen
Abstract:
Accurate tissue segmentation of thick-slice fetal brain magnetic resonance (MR) scans is crucial for both reconstruction of isotropic brain MR volumes and the quantification of fetal brain development. However, this task is challenging due to the use of thick-slice scans in clinically acquired fetal brain data. To address this issue, we propose to leverage high-quality isotropic fetal brain MR volumes (and their corresponding annotations) as guidance for segmentation of thick-slice scans. Because of the significant domain gap between high-quality isotropic volumes (i.e., source data) and thick-slice scans (i.e., target data), we employ a domain adaptation technique to achieve the associated knowledge transfer (from high-quality source volumes to thick-slice target scans). Specifically, we first register the available high-quality isotropic fetal brain MR volumes across different gestational weeks to construct longitudinally-complete source data. To capture domain-invariant information, we then perform Fourier decomposition to extract image content and style codes. Finally, we propose a novel Cycle-Consistent Domain Adaptation Network (C2DA-Net) to efficiently transfer the knowledge learned from high-quality isotropic volumes for accurate tissue segmentation of thick-slice scans. Our C2DA-Net can fully utilize a small set of annotated isotropic volumes to guide tissue segmentation on unannotated thick-slice scans. Extensive experiments on a large-scale dataset of 372 clinically acquired thick-slice MR scans demonstrate that our C2DA-Net achieves much better performance than cutting-edge methods quantitatively and qualitatively.
Submitted 4 December, 2023; v1 submitted 13 August, 2023;
originally announced August 2023.
-
Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge
Authors:
Jun Ma,
Yao Zhang,
Song Gu,
Cheng Ge,
Shihao Ma,
Adamo Young,
Cheng Zhu,
Kangkang Meng,
Xin Yang,
Ziyan Huang,
Fan Zhang,
Wentao Liu,
YuanKe Pan,
Shoujin Huang,
Jiacheng Wang,
Mingze Sun,
Weixin Xu,
Dengqiang Jia,
Jae Won Choi,
Natália Alves,
Bram de Wilde,
Gregor Koehler,
Yajun Wu,
Manuel Wiesenfarth,
Qiongjie Zhu
, et al. (4 additional authors not shown)
Abstract:
Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations, we organized the FLARE 2022 Challenge, the largest abdominal organ analysis challenge to date, to benchmark fast, low-resource, accurate, annotation-efficient, and generalized AI algorithms. We constructed an intercontinental and multinational dataset from more than 50 medical groups, including Computed Tomography (CT) scans with different races, diseases, phases, and manufacturers. We independently validated that a set of AI algorithms achieved a median Dice Similarity Coefficient (DSC) of 90.0\% by using 50 labeled scans and 2000 unlabeled scans, which can significantly reduce annotation requirements. The best-performing algorithms successfully generalized to holdout external validation sets, achieving a median DSC of 89.5\%, 90.9\%, and 88.3\% on North American, European, and Asian cohorts, respectively. They also enabled automatic extraction of key organ biology features, which was labor-intensive with traditional manual measurements. This opens the potential to use unlabeled data to boost performance and alleviate annotation shortages for modern AI models.
Submitted 10 August, 2023;
originally announced August 2023.
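The Dice Similarity Coefficient (DSC) reported in the FLARE22 abstract above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal sketch on flattened binary masks follows; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the challenge's evaluation code.

```python
def dice_coefficient(predicted, reference):
    """Dice Similarity Coefficient between two flattened binary masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


# Prediction marks 3 voxels, reference marks 2, and they share 2:
# DSC = 2 * 2 / (3 + 2) = 0.8
predicted_mask = [1, 1, 0, 0, 1]
reference_mask = [1, 0, 0, 0, 1]
score = dice_coefficient(predicted_mask, reference_mask)
```

A median DSC of 90.0% in the abstract therefore means that, for the median case, twice the overlapping volume is 90% of the combined volume of the predicted and reference masks.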
-
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation
Authors:
Lizhao Liu,
Zhuangwei Zhuang,
Shangxin Huang,
Xunlong Xiao,
Tianhang Xiang,
Cen Chen,
Jingdong Wang,
Mingkui Tan
Abstract:
We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations (e.g., less than 0.1% points are labeled), aiming to reduce the expensive cost of dense annotations. Unfortunately, with extremely sparse annotated points, it is very difficult to extract both contextual and object information for scene understanding such as semantic segmentation. Motivated by masked modeling (e.g., MAE) in image and video representation learning, we seek to endow the power of masked modeling to learn contextual information from sparsely-annotated points. However, directly applying MAE to 3D point clouds with sparse annotations may fail to work. First, it is nontrivial to effectively mask out the informative visual context from 3D point clouds. Second, how to fully exploit the sparse annotations for context modeling remains an open question. In this paper, we propose a simple yet effective Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method. Specifically, RegionMask masks the point cloud continuously in geometric space to construct a meaningful masked prediction task for subsequent context learning. CMT disentangles the learning of supervised segmentation and unsupervised masked context prediction for effectively learning the very limited labeled points and mass unlabeled points, respectively. Extensive experiments on the widely-tested ScanNet V2 and S3DIS benchmarks demonstrate the superiority of CPCM over the state-of-the-art.
Submitted 19 July, 2023;
originally announced July 2023.
-
Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection
Authors:
Siwei Huang,
Jianfeng Chen,
Jisheng Bai,
Yafei Jia,
Dongzhe Zhang
Abstract:
DNN-based methods have shown high performance in sound event localization and detection (SELD). However, in real spatial sound scenes, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system for real spatial scenes. First, a dynamic kernel convolution module is introduced after the convolution blocks to adaptively model the channel-wise features with different receptive fields. Second, we incorporate the SELDnet and EINv2 frameworks into the proposed SELD system with multi-track ACCDOA. Moreover, two scene-dedicated strategies are introduced into the training stage to improve the generalization of the system in realistic spatial sound scenes. Finally, we apply data augmentation methods to extend the dataset using channel rotation and spatial data synthesis. Four joint metrics are used to evaluate the performance of the SELD system on the Sony-TAu Realistic Spatial Soundscapes 2022 dataset. Experimental results show that the proposed systems outperform the fixed-kernel convolution SELD systems. In addition, the proposed system achieved an SELD score of 0.348 in the DCASE SELD task and surpassed the SOTA methods.
Submitted 17 July, 2023;
originally announced July 2023.
-
Model-based T1, T2* and Proton Density Mapping Using a Bayesian Approach with Parameter Estimation and Complementary Undersampling Patterns
Authors:
Shuai Huang,
James J. Lah,
Jason W. Allen,
Deqiang Qiu
Abstract:
Purpose: To achieve automatic hyperparameter estimation for the joint recovery of quantitative MR images, we propose a Bayesian formulation of the reconstruction problem that incorporates the signal model. Additionally, we investigate the use of complementary undersampling patterns to determine optimal undersampling schemes for quantitative MRI.
Theory: We introduce a novel nonlinear approximate message passing framework, referred to as "AMP-PE", that enables the simultaneous recovery of distribution parameters and quantitative maps.
Methods: We employed the variable flip angle multi-echo (VFA-ME) method to acquire measurements. Both retrospective and prospective undersampling approaches were utilized to obtain Fourier measurements using variable-density and Poisson-disk patterns. Furthermore, we extensively explored various undersampling schemes, incorporating complementary patterns across different flip angles and/or echo times.
Results: AMP-PE, which adopts a model-based joint recovery strategy, outperforms the $l_1$-norm minimization approach that follows a decoupled recovery strategy. A comparison with an existing joint-recovery approach further demonstrates the advantageous outcomes of AMP-PE. For quantitative $T_1$ mapping using VFA-ME, employing identical k-space sampling patterns across different echo times produced the best performance, whereas for $T_2^*$ and proton density mappings, using complementary sampling patterns across different flip angles yielded the best performance.
Conclusion: AMP-PE is equipped with built-in parameter estimation, and works naturally in clinical settings with varying acquisition protocols and scanners. It also achieves improved performance by combining information from the MR signal model and the sparse prior on images.
Submitted 10 September, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
TransMRSR: Transformer-based Self-Distilled Generative Prior for Brain MRI Super-Resolution
Authors:
Shan Huang,
Xiaohong Liu,
Tao Tan,
Menghan Hu,
Xiaoer Wei,
Tingli Chen,
Bin Sheng
Abstract:
Magnetic resonance images (MRI) are often acquired with low through-plane resolution as a compromise on time and cost. The poor resolution in one orientation is insufficient to meet the requirement of high resolution for early diagnosis of brain disease and morphometric study. The common single-image super-resolution (SISR) solutions face two main challenges: (1) combining local detailed and global anatomical structural information; and (2) large-scale restoration when applied to reconstruct thick-slice MRI into high-resolution (HR) isotropic data. To address these problems, we propose a novel two-stage network for brain MRI SR named TransMRSR, based on convolutional blocks to extract local information and transformer blocks to capture long-range dependencies. TransMRSR consists of three modules: the shallow local feature extraction, the deep non-local feature capture, and the HR image reconstruction. In the first stage, we perform a generative task to encapsulate diverse priors into a generative network (GAN), which is the decoder sub-module of the deep non-local feature capture part. The pre-trained GAN is then used for the second stage, the SR task. We further eliminate the potential latent space shift caused by the two-stage training strategy through the self-distilled truncation trick. Extensive experiments show that our method achieves superior performance to other SISR methods on both public and private datasets. Code is released at https://github.com/goddesshs/TransMRSR.git .
Submitted 11 June, 2023;
originally announced June 2023.
-
Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement
Authors:
Han Yin,
Jisheng Bai,
Mou Wang,
Siwei Huang,
Yafei Jia,
Jianfeng Chen
Abstract:
3D speech enhancement can effectively improve the auditory experience and plays a crucial role in augmented reality technology. However, traditional convolution-based speech enhancement methods have limitations in extracting dynamic voice information. In this paper, we incorporate a dual-path recurrent neural network block into the U-Net to iteratively extract dynamic audio information in both the time and frequency domains. An attention mechanism is also proposed to fuse the original signal, reference signal, and generated masks. Moreover, we introduce a loss function to simultaneously optimize the network in the time-frequency and time domains. Experimental results show that our system outperforms the state-of-the-art systems on the dataset of the ICASSP L3DAS23 challenge.
Submitted 19 November, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus
Authors:
Lufei Gao,
Shan Huang,
Li Liu
Abstract:
Cued Speech (CS) is a multi-modal visual coding system combining lip reading with several hand cues at the phonetic level to make the spoken language visible to the hearing impaired. Previous studies solved asynchronous problems between lip and hand movements by a cuer-dependent piecewise linear model for English and French CS (a cuer is a person who performs Cued Speech). In this work, we propose three statistical measures on the lip stream to build an interpretable and generalizable model for predicting hand preceding time (HPT), which achieves cuer independence through a proper normalization. In particular, we build the first Mandarin CS corpus, comprising annotated videos from five speakers, including three normal-hearing and two hearing-impaired individuals. We show that the hand preceding phenomenon exists in Mandarin CS production, with significant differences between normal-hearing and hearing-impaired people. Extensive experiments demonstrate that our model outperforms the baseline and the previous state-of-the-art methods.
Submitted 5 June, 2023;
originally announced June 2023.
-
Average AoI Minimization for Energy Harvesting Relay-aided Status Update Network Using Deep Reinforcement Learning
Authors:
Sin-Yu Huang,
Kuang-Hao Liu
Abstract:
A dual-hop status update system aided by energy harvesting (EH) relays with finite data and energy buffers is studied in this work. To achieve timely status updates, the best relays should be selected to minimize the average age of information (AoI), which is a recently proposed metric to evaluate information freshness. The average AoI minimization can be formulated as a Markov decision process (MDP), but the state space for capturing channel and buffer evolution grows exponentially with the number of relays, leading to high solution complexity. We propose a relay selection (RS) scheme based on deep reinforcement learning (DRL) according to the instantaneous channel, packet freshness, and buffer information of each relay. Simulation results show a significant improvement of the proposed DRL-based RS scheme over state-of-the-art approaches.
Submitted 1 June, 2023;
originally announced June 2023.
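The average age of information (AoI) that the abstract above minimizes can be illustrated with a small discrete-time sketch. It assumes a slotted model in which the age at the destination grows by one each slot and, when an update is delivered, resets to the time elapsed since that update was generated. The function name `average_aoi`, the `deliveries` mapping, and the initial age of zero are illustrative assumptions; the relay, channel, and energy-buffer dynamics of the paper are not modelled.

```python
def average_aoi(horizon, deliveries, initial_age=0):
    """Time-average age of information over `horizon` slots.

    `deliveries` maps a delivery slot to the generation slot of the update
    that arrives at the destination in that slot.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    age = initial_age
    total = 0
    for t in range(horizon):
        if t in deliveries:
            # The freshest information is now as old as the delivered update.
            age = t - deliveries[t]
        total += age
        age += 1  # information ages by one slot until the next delivery
    return total / horizon


# No deliveries: ages are 0, 1, 2, 3, so the average is 1.5.
baseline = average_aoi(4, {})
# An update generated in slot 1 and delivered in slot 2: ages are 0, 1, 1, 2.
with_update = average_aoi(4, {2: 1})
```

In this toy example a single timely delivery lowers the average AoI from 1.5 to 1.0, which is the quantity a relay selection policy would try to reduce over a long horizon.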
-
Learning Music Sequence Representation from Text Supervision
Authors:
Tianyu Chen,
Yuan Xie,
Shuai Zhang,
Shaohan Huang,
Haoyi Zhou,
Jianxin Li
Abstract:
Music representation learning is notoriously difficult for its complex human-related concepts contained in the sequence of numerical signals. To excavate better MUsic SEquence Representation from labeled audio, we propose a novel text-supervision pre-training method, namely MUSER. MUSER adopts an audio-spectrum-text tri-modal contrastive learning framework, where the text input could be any form of meta-data with the help of text templates while the spectrum is derived from an audio sequence. Our experiments reveal that MUSER could be more flexibly adapted to downstream tasks compared with the current data-hungry pre-training method, and it only requires 0.056% of pre-training data to achieve the state-of-the-art performance.
Submitted 31 May, 2023;
originally announced May 2023.
-
A Reminder of its Brittleness: Language Reward Shaping May Hinder Learning for Instruction Following Agents
Authors:
Sukai Huang,
Nir Lipovetzky,
Trevor Cohn
Abstract:
Teaching agents to follow complex written instructions has been an important yet elusive goal. One technique for enhancing learning efficiency is language reward shaping (LRS). Within a reinforcement learning (RL) framework, LRS involves training a reward function that rewards behaviours precisely aligned with given language instructions. We argue that the apparent success of LRS is brittle, and prior positive findings can be attributed to weak RL baselines. Specifically, we identified suboptimal LRS designs that reward partially matched trajectories, and we characterised a novel reward perturbation to capture this issue using the concept of loosening task constraints. We provided theoretical and empirical evidence that agents trained using LRS rewards converge more slowly compared to pure RL agents. Our work highlights the brittleness of existing LRS methods, which has been overlooked in the previous studies.
Submitted 17 August, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
An image segmentation algorithm based on multi-scale feature pyramid network
Authors:
Yu Xiao,
Xin Yang,
Sijuan Huang,
Lihua Guo
Abstract:
Medical image segmentation is particularly critical as a prerequisite for relevant quantitative analysis in the treatment of clinical diseases. For example, in clinical cervical cancer radiotherapy, after acquiring subabdominal MRI images, a fast and accurate image segmentation of organs and tumors in MRI images can optimize the clinical radiotherapy process, whereas traditional approaches use manual annotation by specialist doctors, which is time-consuming and laborious. Therefore, automatic organ segmentation of subabdominal MRI images is a valuable research topic.
Submitted 28 June, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Single-Photon Counting Receivers for Optical Wireless Communications in Future 6G Networks
Authors:
Shenjie Huang,
Danial Chitnis,
Cheng Chen,
Harald Haas,
Mohammad-Ali Khalighi,
Robert K. Henderson,
Majid Safari
Abstract:
Optical wireless communication (OWC) offers several complementary advantages to radio-frequency wireless networks such as its massive available spectrum; hence, it is widely anticipated that OWC will assume a pivotal role in the forthcoming sixth generation wireless communication networks. Although significant progress has been achieved in OWC over the past decades, the outage induced by occasionally low received optical power continues to pose a key limiting factor for its deployment. In this work, we discuss the potential role of single-photon counting (SPC) receivers as a promising solution to overcome this limitation. We present an overview of the applications of SPC-based OWC systems in 6G networks, introduce their major performance-limiting factors, propose a performance enhancement framework to tackle these issues, and identify critical areas of open problems for future research.
Submitted 30 October, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising
Authors:
Haijin Zeng,
Jiezhang Cao,
Kai Feng,
Shaoguang Huang,
Hongyan Zhang,
Hiep Luong,
Wilfried Philips
Abstract:
Hyperspectral imaging (HI) has emerged as a powerful tool in diverse fields such as medical diagnosis, industrial inspection, and agriculture, owing to its ability to detect subtle differences in physical properties through high spectral resolution. However, hyperspectral images (HSIs) are often quite noisy because of narrow-band spectral filtering. To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed. However, model-based approaches rely on hand-crafted priors and hyperparameters, while learning-based methods are incapable of estimating the inherent degradation patterns and noise distributions in the imaging procedure, which could inform supervised learning. In addition, learning-based algorithms predominantly rely on CNNs and fail to capture long-range dependencies, resulting in limited interpretability. This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues. First, DNA-Net models sparse noise and Gaussian noise, and explicitly represents the image prior using a transformer. The model is then unfolded into an end-to-end network in which the hyperparameters are estimated from the noisy HSI and the degradation model and are used to control each iteration. Additionally, we introduce a novel U-Shaped Local-Non-local-Spectral Transformer (U-LNSA) that captures spectral correlation, local contents, and non-local dependencies simultaneously. By integrating U-LNSA into DNA-Net, we present the first Transformer-based deep unfolding HSI denoising method. Experimental results show that DNA-Net outperforms state-of-the-art methods, and the modeling of noise distributions helps in cases with heavy noise.
Submitted 6 May, 2023;
originally announced May 2023.