-
A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising
Authors:
Shuaiyu Yuan,
Tristan Whitmarsh,
Dimitri A Kessler,
Otso Arponen,
Mary A McLean,
Gabrielle Baxter,
Frank Riemer,
Aneurin J Kennerley,
William J Brackenbury,
Fiona J Gilbert,
Joshua D Kaggie
Abstract:
New multinuclear MRI techniques, such as sodium MRI, generally suffer from low image quality due to an inherently low signal. Postprocessing methods, such as image denoising, have been developed for image enhancement. However, assessing these enhanced images is challenging, especially when no high-resolution, high-signal reference images are available, as is the case in sodium MRI. No-reference Image Quality Assessment (NR-IQA) metrics are one approach to this problem. Existing learning-based NR-IQA metrics rely on labels derived from subjective human opinions or from metrics such as Signal-to-Noise Ratio (SNR). The former are time-consuming to collect and the latter lack accurate ground truths, which results in unreliable assessment. We note that deep learning (DL) models have a unique characteristic: they are specialized to their training set, so deviations of the test input from the training data reduce prediction accuracy. Therefore, we propose a novel DL-based NR-IQA metric, the Model Specialization Metric (MSM), which depends on neither ground-truth images nor labels. MSM evaluates the quality of an input image by measuring the difference between the input image and the model's prediction. Experiments on both simulated distorted proton T1-weighted MR images and denoised sodium MR images demonstrate that MSM achieves superior evaluation performance across various simulated noises and distortions. MSM also shows substantial agreement with expert evaluations, achieving an average Cohen's kappa coefficient of 0.6528 and outperforming existing NR-IQA metrics.
Submitted 29 August, 2024;
originally announced August 2024.
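The MSM abstract above reports agreement with experts as an average Cohen's kappa of 0.6528. To clarify what that statistic measures, the following is a minimal sketch of unweighted Cohen's kappa for two raters: κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement, and p_e is the chance agreement implied by each rater's label frequencies. The function name and the toy grades are illustrative assumptions. The abstract does not say whether a weighted variant or a particular library was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label everywhere.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

For example, `cohens_kappa([1, 2, 2, 3, 3], [1, 2, 3, 3, 3])` has p_o = 0.8 and p_e = 0.36, so it returns approximately 0.6875.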
-
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy
Authors:
Long Bai,
Tong Chen,
Qiaozhi Tan,
Wan Jun Nah,
Yanheng Li,
Zhicheng He,
Sishen Yuan,
Zhen Chen,
Jinlin Wu,
Mobarakol Islam,
Zhen Li,
Hongbin Liu,
Hongliang Ren
Abstract:
Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach. However, its effectiveness is compromised by uneven illumination arising from hardware constraints and complex internal dynamics, which leads to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a unified WCE illumination correction solution using an end-to-end promptable diffusion transformer (DiT) model. In our work, the illumination prompt module guides the model to adapt to different exposure levels and perform targeted image enhancement. The Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules further boost concurrent representation learning between the prompt parameters and features. In addition, the U-shaped restoration DiT model captures long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets show the effectiveness of our proposed method and its components in WCE illumination restoration. Additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.
Submitted 8 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models
Authors:
Kun Huang,
Xiao Ma,
Yuhan Zhang,
Na Su,
Songtao Yuan,
Yong Liu,
Qiang Chen,
Huazhu Fu
Abstract:
Optical coherence tomography (OCT) image analysis plays an important role in ophthalmology. Current successful analysis models rely on large datasets, which can be challenging to obtain for certain tasks. Using deep generative models to create realistic data is a promising approach. However, due to limited hardware resources, it is still difficult to synthesize high-resolution OCT volumes. In this paper, we introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way. First, we propose non-holistic autoencoders to efficiently build a bidirectional mapping between the high-resolution volume space and a low-resolution latent space. In tandem with the autoencoders, we propose cascaded diffusion processes that synthesize high-resolution OCT volumes through a global-to-local refinement process, amortizing the memory and computational demands. Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods. Moreover, performance gains on two downstream fine-grained segmentation tasks demonstrate the benefit of the proposed method for training deep learning models for medical imaging tasks. The code is publicly available at: https://github.com/nicetomeetu21/CA-LDM.
Submitted 26 May, 2024;
originally announced May 2024.
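CA-LDM builds on latent diffusion, where a denoiser learns to reverse a fixed forward noising process applied to autoencoder latents. The sketch below covers only the standard DDPM forward step, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, with a linear β schedule. The schedule endpoints, the step count and the latent shape are assumptions for illustration, not the paper's configuration. The cascaded global-to-local stages are not modelled.

```python
import numpy as np


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) and return the noise too.

    The returned noise is the target an epsilon-prediction denoiser is trained on.
    """
    eps = rng.standard_normal(np.shape(x0))
    abar = alpha_bars[t]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return x_t, eps
```

In a latent diffusion model, `x0` would be the autoencoder's low-resolution latent of a volume rather than the voxels themselves. This substitution is where the memory saving comes from.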
-
Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal Transitivity
Authors:
Sishen Yuan,
Guang Li,
Baijia Liang,
Lailu Li,
Qingzhuo Zheng,
Shuang Song,
Zhen Li,
Hongliang Ren
Abstract:
Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, even when lesions are detected, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex and potentially risky interventions. To surmount these limitations, this study introduces a chained flexible capsule endoscope (FCE) design concept, specifically conceived to navigate the inherent volume constraints of capsule endoscopes while augmenting their therapeutic functionalities. The FCE's distinctive flexibility originates from a conventional rotating joint design and the incision pattern in the flexible material. In vitro experiments validated the passive navigation ability of the FCE in rugged intestinal tracts. Further, the FCE demonstrates consistent reptile-like peristalsis under the influence of an external magnetic field. It is also capable of film expansion and disintegration under high-frequency electromagnetic stimulation. These findings illuminate a promising path toward amplifying the therapeutic capacities of capsule endoscopes without compromising their size.
Submitted 12 May, 2024;
originally announced May 2024.
-
Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach
Authors:
Sishen Yuan,
Baijia Liang,
Po Wa Wong,
Mingjing Xu,
Chi Hsuan Li,
Zhen Li,
Hongliang Ren
Abstract:
Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (PDT) emerges as a promising prospect in this context. This study presents the development and implementation of a magnetically-guided origami robot, incorporating flexible printed circuit units for sustained and stable phototherapy of Helicobacter pylori. Each integrated unit is equipped with wireless charging capabilities, producing an optimal power output that can concurrently illuminate up to 15 LEDs at their maximum intensity. Crucially, these units can be remotely manipulated via a magnetic field, facilitating both translational and rotational movements. We propose an open-loop manual control sequence that allows the formation of a stable, compliant triangular structure through the interaction of internal magnets. This adaptable configuration is uniquely designed to withstand the dynamic squeezing environment prevalent in real-world gastric applications. The research herein represents a significant stride in leveraging technology for innovative medical solutions, particularly in the management of antibiotic-resistant Helicobacter pylori infections.
Submitted 12 May, 2024;
originally announced May 2024.
-
Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
Authors:
Runmin Dong,
Shuai Yuan,
Bin Luo,
Mengxuan Chen,
Jinxiao Zhang,
Lixian Zhang,
Weijia Li,
Juepeng Zheng,
Haohuan Fu
Abstract:
Reference-based super-resolution (RefSR) has the potential to build bridges across spatial and temporal resolutions of remote sensing images. However, existing RefSR methods are limited by the faithfulness of content reconstruction and the effectiveness of texture transfer at large scaling factors. Conditional diffusion models have opened up new opportunities for generating realistic high-resolution images, but effectively utilizing reference images within these models remains an area for further exploration. Furthermore, content fidelity is difficult to guarantee in areas without relevant reference information. To solve these issues, we propose a change-aware diffusion model named Ref-Diff for RefSR, which uses land cover change priors to explicitly guide the denoising process. Specifically, we inject the priors into the denoising model to improve the utilization of reference information in unchanged areas and regulate the reconstruction of semantically relevant content in changed areas. With this powerful guidance, we decouple the semantics-guided denoising and reference texture-guided denoising processes to improve model performance. Extensive experiments demonstrate the superior effectiveness and robustness of the proposed method compared with state-of-the-art RefSR methods in both quantitative and qualitative evaluations. The code and data are available at https://github.com/dongrunmin/RefDiff.
Submitted 26 March, 2024;
originally announced March 2024.
-
Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures
Authors:
Andrei Cozma,
Landon Harris,
Hairong Qi,
Ping Ji,
Wenpeng Guo,
Song Yuan
Abstract:
This paper introduces a robust approach for automated defect detection in tire X-ray images by harnessing traditional feature extraction methods such as Local Binary Pattern (LBP) and Gray Level Co-Occurrence Matrix (GLCM) features, as well as Fourier and Wavelet-based features, complemented by advanced machine learning techniques. Recognizing the challenges inherent in the complex patterns and textures of tire X-ray images, the study emphasizes the significance of feature engineering to enhance the performance of defect detection systems. By meticulously integrating combinations of these features with a Random Forest (RF) classifier and comparing them against advanced models like YOLOv8, the research not only benchmarks the performance of traditional features in defect detection but also explores the synergy between classical and modern approaches. The experimental results demonstrate that these traditional features, when fine-tuned and combined with machine learning models, can significantly improve the accuracy and reliability of tire defect detection, aiming to set a new standard in automated quality assurance in tire manufacturing.
Submitted 28 February, 2024;
originally announced February 2024.
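The tire-inspection abstract above combines Local Binary Pattern (LBP) features with GLCM, Fourier and wavelet features, and feeds them to a Random Forest. The following is a minimal sketch of the basic LBP descriptor: 8 neighbours at radius 1 with no rotation-invariant or "uniform" mapping, computed on interior pixels only. The abstract does not specify the paper's LBP variant, radius or binning, so these choices are assumptions.

```python
import numpy as np


def lbp_codes(image):
    """Basic 8-neighbour, radius-1 LBP code for every interior pixel."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left; each sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # The bit is 1 where the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, usable as a texture feature vector."""
    codes = lbp_codes(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Histograms computed per image patch could be concatenated with other features and passed to a classifier such as scikit-learn's `RandomForestClassifier`. That pairing mirrors the abstract's RF setup only in spirit.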
-
DeepLight: Reconstructing High-Resolution Observations of Nighttime Light With Multi-Modal Remote Sensing Data
Authors:
Lixian Zhang,
Runmin Dong,
Shuai Yuan,
Jinxiao Zhang,
Mengxuan Chen,
Juepeng Zheng,
Haohuan Fu
Abstract:
Nighttime light (NTL) remote sensing observation serves as a unique proxy for quantitatively assessing progress toward a series of Sustainable Development Goals (SDGs), such as poverty estimation, sustainable urban development, and carbon emissions. However, existing NTL observations often suffer from pervasive degradation and inconsistency, limiting their utility for computing the indicators defined by the SDGs. In this study, we propose a novel approach to reconstruct high-resolution NTL images using multi-modal remote sensing data. To support this research, we introduce DeepLightMD, a comprehensive dataset comprising data from five heterogeneous sensors, offering fine spatial resolution and rich spectral information at a national scale. Additionally, we present DeepLightSR, a calibration-aware method for bridging spatially heterogeneous modality data in multi-modality super-resolution. DeepLightSR integrates three components: calibration-aware alignment, auxiliary-to-main multi-modality fusion, and auxiliary-embedded refinement. Together, these address spatial heterogeneity, fuse diversely representative features, and enhance performance in 8× super-resolution (SR) tasks. Extensive experiments demonstrate the superiority of DeepLightSR over 8 competing methods, as evidenced by improvements in PSNR (2.01 dB to 13.25 dB) and PIQE (0.49 to 9.32). Our findings underscore the practical significance of our proposed dataset and model in reconstructing high-resolution NTL data, supporting efficient, quantitative assessment of SDG progress.
Submitted 23 May, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
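DeepLight reports its gains in PSNR, measured in decibels. For reference, the following is a minimal sketch of the standard full-reference definition, PSNR = 10 log10(R² / MSE). The `data_range` argument R is an assumption the caller must supply, because the abstract does not say how NTL radiance values were normalised. PIQE, the other metric quoted, is a no-reference score and is not sketched here.

```python
import numpy as np


def psnr_db(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Because PSNR is logarithmic in MSE, a 2.01 dB gain at a fixed data range corresponds to roughly a 1.59× reduction in mean squared error (10^0.201 ≈ 1.59).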
-
3D high-resolution imaging algorithm using 1D MIMO array for autonomous driving application
Authors:
Sen Yuan,
Francesco Fioranelli,
Alexander Yarovoy
Abstract:
The problem of 3D high-resolution imaging in automotive multiple-input multiple-output (MIMO) side-looking radar using a 1D array is considered. The concept of motion-enhanced snapshots is introduced for generating larger apertures in the azimuth dimension. For the first time, 3D imaging capabilities can be achieved with high angular resolution using a 1D MIMO antenna array, which can alleviate the requirement for large radar systems in autonomous vehicles. The robustness to variations in the vehicle's movement trajectory is also considered and addressed with relevant compensations in the steering vector. The available degrees of freedom as well as the Signal to Noise Ratio (SNR) are shown to increase with the proposed method compared to conventional imaging approaches. The performance of the algorithm has been studied in simulations, and validated with experimental data collected in a realistic driving scenario.
Submitted 20 February, 2024;
originally announced February 2024.
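The radar abstract above describes motion-enhanced snapshots, in which vehicle motion extends the aperture in azimuth while the 1D MIMO array supplies the other aperture dimension. The following is a minimal geometric sketch of that idea under several stated assumptions:

- Far-field targets, with each Tx/Rx pair replaced by its phase centre (z_tx + z_rx) / 2.
- The array assumed to lie along elevation (z) while the vehicle moves along azimuth (x) at constant speed.
- No trajectory deviations. The paper compensates for these in the steering vector, which this sketch does not model.

The function names are hypothetical.

```python
import numpy as np


def virtual_phase_centres(tx_z, rx_z):
    """Far-field MIMO virtual array: one phase centre per Tx/Rx pair."""
    tx = np.asarray(tx_z, dtype=np.float64)
    rx = np.asarray(rx_z, dtype=np.float64)
    return ((tx[:, None] + rx[None, :]) / 2.0).ravel()


def motion_enhanced_aperture(tx_z, rx_z, speed, snapshot_times):
    """2D synthetic aperture: the virtual array replicated at each along-track position.

    Returns an (N, 2) array of (x, z) element positions in the units of the inputs.
    """
    z = virtual_phase_centres(tx_z, rx_z)
    x = speed * np.asarray(snapshot_times, dtype=np.float64)  # straight-line motion
    xx, zz = np.meshgrid(x, z, indexing="ij")
    return np.stack([xx.ravel(), zz.ravel()], axis=1)
```

With 2 Tx and 4 Rx elements and 3 snapshots, the sketch yields 8 virtual elements per snapshot and 24 aperture positions in total. This illustrates in the simplest terms how motion adds degrees of freedom.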
-
DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping
Authors:
Shiqi Yang,
Hanlin Qin,
Shuai Yuan,
Xiang Yan,
Hossein Rahmani
Abstract:
CycleGAN has proven to be an advanced approach for unsupervised image restoration. This framework consists of two generators: a denoising one for inference and an auxiliary one that models noise to fulfill cycle-consistency constraints. However, when applied to the infrared destriping task, the vanilla auxiliary generator struggles to consistently produce vertical noise under unsupervised constraints. This undermines the effectiveness of the cycle-consistency loss, leaving residual stripe noise in the denoised image. To address this issue, we present a novel framework for single-frame infrared image destriping, named DestripeCycleGAN. In this model, the conventional auxiliary generator is replaced with an a priori stripe generation model (SGM) that introduces vertical stripe noise into the clean data, and the gradient map is employed to re-establish cycle-consistency. Meanwhile, a Haar wavelet background guidance module (HBGM) is designed to minimize the divergence of background details between the different domains. To preserve vertical edges, a multi-level wavelet U-Net (MWUNet) is proposed as the denoising generator, which uses the Haar wavelet transform as the sampler to reduce directional information loss. Moreover, it incorporates the group fusion block (GFB) into skip connections to fuse multi-scale features and build the context of long-distance dependencies. Extensive experiments on real and synthetic data demonstrate that our DestripeCycleGAN surpasses state-of-the-art methods in terms of visual quality and quantitative evaluation. Our code will be made public at https://github.com/0wuji/DestripeCycleGAN.
Submitted 14 February, 2024;
originally announced February 2024.
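DestripeCycleGAN uses the Haar wavelet transform as the down-sampler in MWUNet and in its background guidance module. The following is a minimal sketch of one level of the orthonormal 2D Haar transform. It shows why vertical stripes are convenient in this domain: a pure column-wise offset pattern lands entirely in the approximation and column-difference subbands. This is the textbook transform, not the authors' module. The subband names are descriptive choices made here and do not follow any library's convention.

```python
import numpy as np


def haar_dwt2(image):
    """One level of the orthonormal 2D Haar transform (even height and width required).

    Returns (approx, row_diff, col_diff, diag):
      row_diff responds to changes between vertically adjacent pixels (horizontal edges),
      col_diff responds to changes between horizontally adjacent pixels (vertical edges,
      including vertical stripe noise).
    """
    x = np.asarray(image, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    row_diff = (a + b - c - d) / 2.0
    col_diff = (a - b + c - d) / 2.0
    diag = (a - b - c + d) / 2.0
    return approx, row_diff, col_diff, diag
```

Because the transform is orthonormal, the total energy of the four subbands equals that of the input. This is one reason it can down-sample without discarding information.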
-
Contingency Detection in Modern Power Systems: A Stochastic Hybrid System Method
Authors:
Shuo Yuan,
Le Yi Wang,
George Yin,
Masoud H. Nazari
Abstract:
This paper introduces a new stochastic hybrid system (SHS) framework for contingency detection in modern power systems (MPS). The framework uses stochastic hybrid system representations in state space models to expand and facilitate the capability of contingency detection. In typical microgrids (MGs), buses may contain various synchronous generators, renewable generators, controllable loads, battery systems, regular loads, etc. For the development of SHS models in power systems, this paper introduces the concept of dynamic and non-dynamic buses. The paper converts a physical power grid into a virtual linearized state space model and represents contingencies as random switching of system structures and parameters. It thereby formulates contingency detection as a joint estimation problem over discrete events and continuous states in stochastic hybrid systems. This method offers unique advantages, including using common measurement signals on voltage and current synchrophasors to detect different types and locations of contingencies, avoiding expensive local direct fault measurements, and detecting certain contingencies that cannot be directly measured. The method employs a small, suitably designed probing signal to sustain persistent contingency detection. Joint estimation algorithms are presented together with proofs of their convergence and reliability properties. Examples using an IEEE 5-bus system demonstrate the main ideas and derivation steps. Simulation case studies on an IEEE 33-bus system are used to detect transmission line faults and sensor interruptions.
Submitted 2 February, 2024;
originally announced February 2024.
-
Stochastic Hybrid System Modeling and State Estimation of Modern Power Systems under Contingency
Authors:
Shuo Yuan,
Le Yi Wang,
George Yin,
Masoud H. Nazari
Abstract:
This paper introduces a stochastic hybrid system (SHS) framework in state space form to capture sensor, communication, and system contingencies in modern power systems (MPS). Within this new framework, the paper concentrates on developing state estimation methods and algorithms that provide reliable state estimation under randomly intermittent and noisy sensor data. MPSs employ diversified measurement devices to monitor system operations. These devices are subject to random measurement errors, and they rely on communication networks whose channels encounter random packet loss and interruptions. The contingency and noise form two distinct and interacting stochastic processes that significantly affect state estimation accuracy and reliability. This paper formulates stochastic hybrid system models for MPSs, introduces coordinated observer design algorithms for state estimation, and establishes their convergence and reliability properties. A further study reveals a fundamental design tradeoff between convergence rates and steady-state error variances. Simulation studies on the IEEE 5-bus and IEEE 33-bus systems illustrate the modeling methods, observer design algorithms, convergence properties, performance evaluations, and the impact of sensor system selection.
Submitted 29 January, 2024;
originally announced January 2024.
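The abstract above concerns state estimation when sensor packets are randomly lost. A well-known baseline for that setting is a Kalman filter that performs the measurement update only when a sample actually arrives. The following is a minimal scalar sketch of that baseline. It is not the paper's coordinated observer design. The scalar model x_{k+1} = a x_k + w, y_k = c x_k + v and all parameter names are assumptions for illustration.

```python
import numpy as np


def kalman_intermittent(ys, received, a, c, q, r, x0, p0):
    """Scalar Kalman filter that skips the update step for lost measurements.

    ys: measurements (values at lost slots are ignored)
    received: booleans, True where the packet arrived
    q, r: process and measurement noise variances
    Returns arrays of posterior estimates and error variances.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for y, ok in zip(ys, received):
        # Time update (prediction) happens every step.
        x = a * x
        p = a * p * a + q
        # Measurement update only when the sample was actually delivered.
        if ok:
            k = p * c / (c * p * c + r)
            x = x + k * (y - c * x)
            p = (1.0 - k * c) * p
        estimates.append(x)
        variances.append(p)
    return np.array(estimates), np.array(variances)
```

When every packet is lost, the error variance grows by q each step. When samples arrive, the variance stays bounded. This is the qualitative effect of intermittent data that the abstract's framework models more generally.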
-
Joint Beam Direction Control and Radio Resource Allocation in Dynamic Multi-beam LEO Satellite Networks
Authors:
Shuo Yuan,
Yaohua Sun,
Mugen Peng,
Renzhi Yuan
Abstract:
Multi-beam low earth orbit (LEO) satellites are emerging as key components in beyond-5G and 6G networks to provide global coverage and high data rates. To fully unleash the potential of LEO satellite communication, resource management plays a key role. However, the uneven distribution of users, the coupling of multi-dimensional resources, complex inter-beam interference, and time-varying network topologies all pose significant challenges to effective communication resource management. In this paper, we study the joint optimization of beam direction and the allocation of spectrum, time, and power resources in a dynamic multi-beam LEO satellite network. The objective is to improve the long-term user sum data rate while taking user fairness into account. Since the resource management problem is a mixed-integer non-convex program, it is decomposed into three subproblems, namely beam direction control and time slot allocation, user subchannel assignment, and beam power allocation. These subproblems are then solved iteratively by leveraging matching with externalities and successive convex approximation, and the proposed algorithms are analyzed in terms of stability, convergence, and complexity. Extensive simulations are conducted. Compared to baseline schemes, the results show that our proposal can increase the number of served users by up to a factor of two and the sum user data rate by up to 68%.
Submitted 17 January, 2024;
originally announced January 2024.
-
Auto-ICell: An Accessible and Cost-Effective Integrative Droplet Microfluidic System for Real-Time Single-Cell Morphological and Apoptotic Analysis
Authors:
Yuanyuan Wei,
Meiai Lin,
Shanhang Luo,
Syed Muhammad Tariq Abbasi,
Liwei Tan,
Guangyao Cheng,
Bijie Bai,
Yi-Ping Ho,
Scott Wu Yuan,
Ho-Pui Ho
Abstract:
The Auto-ICell system, a novel and cost-effective integrated droplet microfluidic system, is introduced for real-time analysis of single-cell morphology and apoptosis. This system integrates a 3D-printed microfluidic chip with image analysis algorithms, enabling the generation of uniform droplet reactors and immediate image analysis. The system employs a color-based image analysis algorithm in the bright field for droplet content analysis. Meanwhile, in the fluorescence field, cell apoptosis is quantitatively measured through a combination of deep-learning-enabled multiple fluorescent channel analysis and a live/dead cell stain kit. Breast cancer cells are encapsulated within uniform droplets, with diameters ranging from 70 μm to 240 μm, generated at a high throughput of 1,500 droplets per minute. Real-time image analysis results are displayed within 2 seconds on a custom graphical user interface (GUI). The system automatically calculates the distribution and ratio of encapsulated dyes in the bright field, while in the fluorescence field, cell blebbing is observed and cell circularity is quantified. The Auto-ICell system is non-invasive and provides online detection, offering a robust, time-efficient, user-friendly, and cost-effective solution for single-cell analysis. It significantly enhances the detection throughput of droplet single-cell analysis by reducing setup costs and improving operational performance. This study highlights the potential of the Auto-ICell system in advancing biological research and personalized disease treatment, with promising applications in cell culture, biochemical microreactors, drug carriers, cell-based assays, synthetic biology, and point-of-care diagnostics.
Submitted 6 November, 2023;
originally announced November 2023.
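Auto-ICell quantifies cell circularity in the fluorescence field. A common definition is 4πA / P², which equals 1 for a perfect circle and decreases for irregular or blebbing outlines. The following is a minimal sketch that computes it from a closed polygonal contour, such as one traced from a segmentation mask. The abstract does not state the exact formula or how contours are extracted, so both are assumptions here.

```python
import math


def polygon_area_perimeter(points):
    """Area (shoelace formula) and perimeter of a closed polygon given as (x, y) vertices."""
    n = len(points)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def circularity(points):
    """4*pi*A / P**2: 1.0 for a circle, smaller for elongated or irregular shapes."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2
```

A unit square gives π/4 ≈ 0.785. In pixel-based measurements, the perimeter estimate depends on how the contour is traced, so values near 1 should be interpreted with that in mind.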
-
Joint Network Function Placement and Routing Optimization in Dynamic Software-defined Satellite-Terrestrial Integrated Networks
Authors:
Shuo Yuan,
Yaohua Sun,
Mugen Peng
Abstract:
Software-defined satellite-terrestrial integrated networks (SDSTNs) are seen as a promising paradigm for achieving high resource flexibility and global communication coverage. However, low-latency service provisioning remains challenging due to the fast variation of network topology and the limited onboard resources of low earth orbit satellites. To address this issue, we study service provisioning in SDSTNs via joint optimization of virtual network function (VNF) placement and routing planning, with network dynamics characterized by a time-evolving graph. Aiming to minimize average service latency, we formulate the corresponding problem as an integer nonlinear program under resource, VNF deployment, and time-slotted flow constraints. Since exhaustive search is intractable, we transform the primary problem into an integer linear program by introducing auxiliary variables and then propose a Benders decomposition based branch-and-cut (BDBC) algorithm. Toward practical use, a time expansion-based decoupled greedy (TEDG) algorithm is further designed with rigorous complexity analysis. Extensive experiments demonstrate the optimality of the BDBC algorithm and the low complexity of the TEDG algorithm. Moreover, the results indicate that the two algorithms can increase the number of completed services within a configuration period by up to 58% and reduce the average service latency by up to 17% compared to baseline schemes.
Submitted 21 October, 2023;
originally announced October 2023.
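The abstract models network dynamics with a time-evolving graph and builds TEDG on time expansion. The sketch below shows only the generic time-expansion idea, not BDBC or TEDG: each (node, slot) pair becomes a vertex, a flow may either wait at a node or cross a link that exists in the current slot, and breadth-first search returns the earliest arrival slot. One slot per hop, no VNF placement, and no resource limits are simplifying assumptions; the function name and node names are hypothetical.

```python
from collections import deque


def earliest_arrival(links_per_slot, src, dst, start_slot=0):
    """Earliest slot index at which `dst` is reached from `src`, or None.

    links_per_slot[t] is a set of undirected (u, v) links available during
    slot t. Crossing a link or waiting at a node each consume one slot.
    """
    horizon = len(links_per_slot)
    start = (src, start_slot)           # vertices are (node, slot) pairs
    seen = {start}
    queue = deque([start])
    while queue:
        node, t = queue.popleft()       # BFS pops vertices in slot order
        if node == dst:
            return t
        if t >= horizon:
            continue                    # no topology known beyond the horizon
        successors = [(node, t + 1)]    # holding edge: wait one slot
        for u, v in links_per_slot[t]:  # transmission edges for slot t
            if u == node:
                successors.append((v, t + 1))
            elif v == node:
                successors.append((u, t + 1))
        for state in successors:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return None
```

With `links = [{("GS1", "LEO1")}, set(), {("LEO1", "GS2")}]`, `earliest_arrival(links, "GS1", "GS2")` returns 3: the flow crosses GS1-LEO1 in slot 0, waits at LEO1 during slot 1, and crosses LEO1-GS2 in slot 2. Because every edge advances exactly one slot, plain BFS already yields the earliest arrival; per-link latencies or capacities would require a weighted search instead.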
-
Deep Learning Approach for Large-Scale, Real-Time Quantification of Green Fluorescent Protein-Labeled Biological Samples in Microreactors
Authors:
Yuanyuan Wei,
Sai Mu Dalike Abaxi,
Nawaz Mehmood,
Luoquan Li,
Fuyang Qu,
Guangyao Cheng,
Dehua Hu,
Yi-Ping Ho,
Scott Wu Yuan,
Ho-Pui Ho
Abstract:
Absolute quantification of biological samples entails determining expression levels in precise numerical copies, offering enhanced accuracy and superior performance for rare templates. However, existing methodologies suffer from significant limitations: flow cytometers are both costly and intricate, while fluorescence imaging relying on software tools or manual counting is time-consuming and prone to inaccuracies. In this study, we have devised a comprehensive deep-learning-enabled pipeline that enables the automated segmentation and classification of GFP (green fluorescent protein)-labeled microreactors, facilitating real-time absolute quantification. Our findings demonstrate the efficacy of this technique in accurately predicting the sizes and occupancy status of microreactors using standard laboratory fluorescence microscopes, thereby providing precise measurements of template concentrations. Notably, our approach quantifies over 2,000 microreactors (across 10 images) in only 2.5 seconds, with a dynamic range spanning from 56.52 to 1569.43 copies per microliter. Furthermore, our Deep-dGFP algorithm showcases remarkable generalization capabilities, as it can be directly applied to various GFP-labeling scenarios, including droplet-based, microwell-based, and agarose-based biological applications. To the best of our knowledge, this represents the first successful implementation of an all-in-one image analysis algorithm in droplet digital PCR (polymerase chain reaction), microwell digital PCR, droplet single-cell sequencing, agarose digital PCR, and bacterial quantification, without necessitating any transfer learning steps, modifications, or retraining procedures. We firmly believe that our Deep-dGFP technique will be readily embraced by biomedical laboratories and holds potential for further development in related clinical applications.
Submitted 4 September, 2023;
originally announced September 2023.
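The abstract turns predicted microreactor sizes and occupancy into template concentrations but does not give the formula. The usual digital-PCR conversion, assumed here, is the Poisson correction $\lambda = -\ln(1 - k/n)$ copies per partition for $k$ positive out of $n$ partitions, divided by the partition volume. This sketch assumes one uniform partition volume, whereas the per-reactor sizes predicted by Deep-dGFP could be used instead; the function name and the 1 nL example volume are hypothetical.

```python
import math


def dpcr_concentration(positive, total, partition_volume_ul):
    """Template concentration (copies per microliter) from partition counts.

    Poisson correction: with a fraction p of partitions positive, the mean
    number of copies per partition is lambda = -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need total > 0 and 0 <= positive <= total")
    if positive == total:
        raise ValueError("all partitions positive: above the dynamic range")
    p = positive / total
    mean_copies = -math.log1p(-p)       # log1p keeps precision for small p
    return mean_copies / partition_volume_ul
```

For example, 1,000 positive droplets out of 2,000 at 1 nL (0.001 µL) each give $\ln 2 / 0.001 \approx 693$ copies per microliter. The correction matters because a positive partition may hold more than one copy, so simply counting positives underestimates concentration as occupancy rises.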
-
Learn Single-horizon Disease Evolution for Predictive Generation of Post-therapeutic Neovascular Age-related Macular Degeneration
Authors:
Yuhan Zhang,
Kun Huang,
Mingchao Li,
Songtao Yuan,
Qiang Chen
Abstract:
Most of the existing disease prediction methods in the field of medical image processing fall into two classes, namely image-to-category predictions and image-to-parameter predictions. Few works have focused on image-to-image predictions. Unlike the multi-horizon predictions used in other fields, ophthalmologists place more confidence in single-horizon predictions because of their low tolerance for predictive risk. We propose a single-horizon disease evolution network (SHENet) to predictively generate post-therapeutic SD-OCT images from input pre-therapeutic SD-OCT images with neovascular age-related macular degeneration (nAMD). In SHENet, a feature encoder converts the input SD-OCT images to deep features, a graph evolution module then predicts the process of disease evolution in a high-dimensional latent space and outputs the predicted deep features, and lastly, a feature decoder recovers SD-OCT images from the predicted deep features. We further propose an evolution reinforcement module to ensure the effectiveness of disease evolution learning and obtain realistic SD-OCT images by adversarial training. SHENet is validated on 383 SD-OCT cubes from 22 nAMD patients using three well-designed schemes with quantitative and qualitative evaluations. Compared with other generative methods, the SD-OCT images generated by SHENet have the highest image quality. In addition, SHENet achieves the best structure protection and content prediction. Qualitative evaluations also demonstrate that SHENet has a better visual effect than other methods. SHENet can generate post-therapeutic SD-OCT images with both high prediction performance and good image quality, which has great potential to help ophthalmologists forecast the therapeutic effect of nAMD.
Submitted 11 August, 2023;
originally announced August 2023.
-
Exploring Effective Mask Sampling Modeling for Neural Image Compression
Authors:
Lin Liu,
Mingming Zhao,
Shanxin Yuan,
Wenlong Lyu,
Wengang Zhou,
Houqiang Li,
Yanfeng Wang,
Qi Tian
Abstract:
Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pre-training strategy for neural image compression. Specifically, a Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage. Moreover, to further reduce channel redundancy, we propose the Learnable Channel Mask Module (LCMM) and the Learnable Channel Completion Module (LCCM). Our plug-and-play CMSM, LCMM, and LCCM modules can be applied to both CNN-based and Transformer-based architectures, significantly reducing the computational cost and improving image quality. Experiments on the public Kodak and Tecnick datasets demonstrate that our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
Submitted 9 June, 2023;
originally announced June 2023.
-
MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing
Authors:
Jianfei Yang,
He Huang,
Yunjiao Zhou,
Xinyan Chen,
Yuecong Xu,
Shenghai Yuan,
Han Zou,
Chris Xiaoxuan Lu,
Lihua Xie
Abstract:
4D human perception plays an essential role in a myriad of applications, such as home automation and metaverse avatar simulation. However, existing solutions, which mainly rely on cameras and wearable devices, are either privacy-intrusive or inconvenient to use. To address these issues, wireless sensing has emerged as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals for device-free human sensing. In this paper, we propose MM-Fi, the first multi-modal non-intrusive 4D human dataset with 27 daily or rehabilitation action categories, to bridge the gap between wireless sensing and high-level human perception tasks. MM-Fi consists of over 320k synchronized frames of five modalities from 40 human subjects. Various annotations are provided to support potential sensing tasks, e.g., human pose estimation and action recognition. Extensive experiments have been conducted to compare the sensing capacity of individual modalities and their combinations across multiple tasks. We envision that MM-Fi can contribute to wireless sensing research with respect to action recognition, human pose estimation, multi-modal learning, cross-modal supervision, and interdisciplinary healthcare research.
Submitted 24 September, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
Segmentation of Aortic Vessel Tree in CT Scans with Deep Fully Convolutional Networks
Authors:
Shaofeng Yuan,
Feng Yang
Abstract:
Automatic and accurate segmentation of the aortic vessel tree (AVT) in computed tomography (CT) scans is crucial for early detection, diagnosis and prognosis of aortic diseases, such as aneurysms, dissections and stenosis. However, this task remains challenging due to the complexity of the aortic vessel tree and the amount of CT angiography (CTA) data. In this technical report, we use two-stage fully convolutional networks (FCNs) to automatically segment the AVT in CTA scans from multiple centers. Specifically, we first adopt a 3D FCN with a U-shaped architecture to segment the AVT, in order to produce topology attention and accelerate the medical image analysis pipeline. Another 3D FCN is then trained to segment the branches of the AVT along its pseudo-centerline. In the 2023 MICCAI Segmentation of the Aorta (SEG.A.) Challenge, the reported method was evaluated on the public dataset of 56 cases. The resulting Dice Similarity Coefficient (DSC) is 0.920, Jaccard Similarity Coefficient (JSC) is 0.861, Recall is 0.922, and Precision is 0.926 on a 5-fold random split of training and validation sets.
Submitted 16 May, 2023;
originally announced May 2023.
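The overlap scores quoted in this abstract follow standard definitions. A minimal sketch, assuming each mask is represented as a set of foreground voxel coordinates (a hypothetical, dependency-free representation; real pipelines would use arrays), computes all four from true-positive, false-positive, and false-negative counts:

```python
def overlap_metrics(pred, truth):
    """Dice, Jaccard, recall and precision for two binary masks.

    `pred` and `truth` are sets of foreground voxel coordinates, e.g. {(z, y, x)}.
    Empty-denominator cases (both masks empty) are scored as 1.0.
    """
    tp = len(pred & truth)    # voxels labelled foreground in both masks
    fp = len(pred - truth)    # predicted but not in the reference
    fn = len(truth - pred)    # in the reference but missed
    return {
        "dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        "jaccard": tp / (tp + fp + fn) if (tp + fp + fn) else 1.0,
        "recall": tp / (tp + fn) if (tp + fn) else 1.0,
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
    }
```

For `pred = {1, 2, 3, 4}` and `truth = {2, 3, 4, 5, 6}` this gives Dice 2/3, Jaccard 0.5, recall 0.6 and precision 0.75. Per case, Jaccard and Dice are linked by $J = D/(2-D)$; scores averaged over cases, as in challenge reports, need not satisfy this identity exactly.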
-
TelecomTM: A Fine-Grained and Ubiquitous Traffic Monitoring System Using Pre-Existing Telecommunication Fiber-Optic Cables as Sensors
Authors:
Jingxiao Liu,
Siyuan Yuan,
Yiwen Dong,
Biondo Biondi,
Hae Young Noh
Abstract:
We introduce the TelecomTM system, which uses pre-existing telecommunication fiber-optic cables as virtual strain sensors to sense vehicle-induced ground vibrations for fine-grained and ubiquitous traffic monitoring and characterization. We call it a virtual sensor because it is a software-based representation of a physical sensor. Because telecommunication fiber-optic cables are extensively installed along roadsides, our system, which uses redundant dark fibers, enables traffic monitoring at low cost and with low maintenance. Many existing traffic monitoring approaches use cameras, piezoelectric sensors, and smartphones, but they are limited by privacy concerns and/or deployment requirements. Previous studies attempted to use telecommunication cables for traffic monitoring, but they were only exploratory and limited to simple tasks at a coarse granularity, e.g., vehicle detection, due to their hardware constraints and real-world challenges. In particular, those challenges are 1) unknown and heterogeneous properties of virtual sensors and 2) large and complex noise conditions. To this end, our TelecomTM system first characterizes the geographic location and analyzes the signal pattern of each virtual sensor through driving tests. We then develop a spatial-domain Bayesian filtering and smoothing algorithm to detect, track, and characterize each vehicle. Our approach uses the spatial dependency of multiple virtual sensors and Newton's laws of motion to combine the distributed sensor data and reduce uncertainties in vehicle detection and tracking. In our real-world evaluation on a two-way traffic road with 1120 virtual sensors, TelecomTM achieved 90.18% vehicle detection accuracy, 27$\times$ and 5$\times$ error reduction for vehicle position and speed tracking compared to a baseline method, and $\pm$3.92% and $\pm$11.98% error for vehicle wheelbase and weight estimation, respectively.
Submitted 4 May, 2023;
originally announced May 2023.
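The abstract describes a spatial-domain Bayesian filter that combines neighbouring virtual sensors through Newton's laws of motion. A minimal sketch of that idea, assuming a single vehicle at roughly constant speed and a linear-Gaussian model, is a Kalman filter whose step index is the sensor position rather than time. The state layout (passage time and slowness), noise values, sensor spacing, and helper names are all hypothetical; this is not the TelecomTM implementation, which also performs smoothing, detection, and multi-vehicle tracking.

```python
import math
import random


def simulate_passage_times(speed, dx, n_sensors, noise_std, seed=0):
    """Hypothetical test data: noisy times a car passes equally spaced sensors."""
    rng = random.Random(seed)
    return [k * dx / speed + rng.gauss(0.0, noise_std) for k in range(n_sensors)]


def track_spatial_kf(times, dx, meas_var, slowness_var=1e-9):
    """Kalman filter that steps along the cable instead of along time.

    State at sensor k is [t_k, s_k]: passage time and slowness s = 1/speed
    (s/m). Constant speed gives t_{k+1} = t_k + s_k * dx, s_{k+1} = s_k plus
    small process noise. Returns (passage_time, speed) per sensor; the first
    speed is NaN because a single pick carries no speed information.
    """
    x = [times[0], 0.0]                  # state mean
    P = [[meas_var, 0.0], [0.0, 1.0]]    # covariance with a vague slowness prior
    estimates = []
    for k, z in enumerate(times):
        if k > 0:
            # Predict with F = [[1, dx], [0, 1]]: P <- F P F^T + Q.
            x = [x[0] + dx * x[1], x[1]]
            p00 = P[0][0] + dx * (P[0][1] + P[1][0]) + dx * dx * P[1][1]
            p01 = P[0][1] + dx * P[1][1]
            p10 = P[1][0] + dx * P[1][1]
            P = [[p00, p01], [p10, P[1][1] + slowness_var]]
        # Update with the picked passage time z (observation H = [1, 0]).
        S = P[0][0] + meas_var
        K = [P[0][0] / S, P[1][0] / S]
        innovation = z - x[0]
        x = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
        speed = 1.0 / x[1] if x[1] > 0 else math.nan
        estimates.append((x[0], speed))
    return estimates
```

Tracking slowness rather than speed keeps the propagation step linear in the state, which is why the plain Kalman equations suffice here; a smoother would add a backward pass over the same sensors.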
-
Tracking performance of PID for nonlinear stochastic systems
Authors:
Cheng Zhao,
Shuo Yuan
Abstract:
In this paper, we consider a class of continuous-time stochastic control systems with both unknown nonlinear structure and unknown disturbances, and investigate the capability of the classical proportional-integral-derivative (PID) controller in tracking time-varying reference signals. First, under some suitable conditions on system nonlinear functions, reference signals, and unknown disturbances, we show that PID controllers can be designed to globally stabilize such systems and ensure the boundedness of the tracking error. Analytic design formulae for PID gain matrices are also provided, which only involve some prior knowledge of the partial derivatives of system structural nonlinear functions. Besides, it is shown that the steady-state tracking error hinges on three critical factors: i) the change rate of reference signals and external disturbances; ii) the intensity of random noises; and iii) the selection of PID gains; moreover, this error can be made arbitrarily small by choosing suitably large PID gains. Finally, by introducing a desired transient process shaped from the reference signal, we present a new PID tuning rule that can guarantee both good steady-state and superior transient control performance.
Submitted 18 March, 2023;
originally announced March 2023.
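To illustrate the abstract's claim that the steady-state tracking error shrinks as the PID gains grow, the sketch below simulates a hypothetical scalar second-order stochastic plant with Euler-Maruyama integration. The plant nonlinearity, disturbance, noise level, reference signal, and both gain sets are assumptions chosen for illustration; they are not the paper's examples, and the gains do not come from its design formulae.

```python
import math
import random


def simulate_pid(kp, ki, kd, t_end=20.0, dt=1e-3, noise=0.05, seed=0):
    """Euler-Maruyama simulation of a PID loop on a noisy 2nd-order plant.

    Hypothetical plant: dx1 = x2 dt,
    dx2 = (f(x1, x2) + u + d(t)) dt + noise dW, with f = -sin(x1) - 0.5 x2
    (unknown to the controller) and disturbance d(t) = 0.5 cos(0.5 t).
    Reference r(t) = sin(t). Returns the RMS tracking error over the last 5 s.
    """
    rng = random.Random(seed)
    x1, x2, integral = 0.0, 0.0, 0.0
    sq_err, count = 0.0, 0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        r, r_dot = math.sin(t), math.cos(t)
        e = r - x1
        e_dot = r_dot - x2              # derivative term uses the velocity state
        integral += e * dt
        u = kp * e + ki * integral + kd * e_dot
        f = -math.sin(x1) - 0.5 * x2
        d = 0.5 * math.cos(0.5 * t)
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        x1, x2 = x1 + x2 * dt, x2 + (f + u + d) * dt + noise * dw
        if t >= t_end - 5.0:
            sq_err += e * e
            count += 1
    return math.sqrt(sq_err / count)
```

Comparing `simulate_pid(5.0, 2.0, 3.0)` with `simulate_pid(50.0, 20.0, 15.0)` shows the larger gains giving a markedly smaller RMS error against the same reference, disturbance, and noise, in line with factor iii) above. Very large gains would eventually need a smaller `dt` for the explicit integration to stay stable.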
-
From Audio to Symbolic Encoding
Authors:
Shenli Yuan,
Lingjie Kong,
Jiushuang Guo
Abstract:
Automatic music transcription (AMT) aims to convert raw audio to symbolic music representation. As a fundamental problem of music information retrieval (MIR), AMT is considered a difficult task even for trained human experts due to the overlap of multiple harmonics in the acoustic signal. On the other hand, speech recognition, as one of the most popular tasks in natural language processing, aims to translate human spoken language to text. Given the similar nature of AMT and speech recognition (both translate audio signals into symbolic encodings), this paper investigated whether a generic neural network architecture could work on both tasks. We introduced a new neural network architecture built on top of the current state-of-the-art Onsets and Frames, and compared the performance of several of its variations on the AMT task. We also tested our architecture on the task of speech recognition. For AMT, our models produced better results than the model trained using the state-of-the-art architecture; however, although a similar architecture could be trained on the speech recognition task, its results were not as good as those of other task-specific models.
Submitted 26 February, 2023;
originally announced February 2023.
-
CSDN: Combining Shallow and Deep Networks for Accurate Real-time Segmentation of High-definition Intravascular Ultrasound Images
Authors:
Shaofeng Yuan,
Feng Yang
Abstract:
Intravascular ultrasound (IVUS) is the preferred modality for capturing real-time, high-resolution cross-sectional images of the coronary arteries and for evaluating stenosis. Accurate, real-time segmentation of IVUS images involves delineating the lumen and external elastic membrane borders. In this paper, we propose a two-stream framework for efficient segmentation of 60 MHz high-resolution IVUS images that combines shallow and deep networks, namely, CSDN. The shallow network with thick channels focuses on extracting low-level details. The deep network with thin channels takes charge of learning high-level semantics. Treating these two kinds of information separately enables the model to achieve both high accuracy and high efficiency for real-time segmentation. To further improve the segmentation performance, a mutual guided fusion module is used to enhance and fuse the two different types of feature representation. The experimental results show that our CSDN accomplishes a good trade-off between analysis speed and segmentation accuracy.
Submitted 30 January, 2023;
originally announced January 2023.
-
Spatial Deep Deconvolution U-Net for Traffic Analyses with Distributed Acoustic Sensing
Authors:
Siyuan Yuan,
Martijn van den Ende,
Jingxiao Liu,
Hae Young Noh,
Robert Clapp,
Cédric Richard,
Biondo Biondi
Abstract:
Distributed Acoustic Sensing (DAS), which transforms city-wide fiber-optic cables into a large-scale strain sensing array, has shown the potential to revolutionize urban traffic monitoring by providing a fine-grained, scalable, and low-maintenance monitoring solution. However, the real-world application of DAS is hindered by challenges such as noise contamination and interference among closely traveling cars. In response, we introduce a self-supervised U-Net model that can suppress background noise and compress car-induced DAS signals into high-resolution pulses through spatial deconvolution. Our work extends recent research by introducing three key advancements. Firstly, we perform a comprehensive resolution analysis of DAS-recorded traffic signals, laying a theoretical foundation for our approach. Secondly, we incorporate space-domain vehicle wavelets into our U-Net model, enabling consistent high-resolution outputs regardless of vehicle speed variations. Finally, we employ L2-norm regularization in the loss function, enhancing our model's sensitivity to weaker signals from vehicles in remote traffic lanes. We evaluate the effectiveness and robustness of our method through field recordings under different traffic conditions and various driving speeds. Our results show that our method can enhance the spatial-temporal resolution and better resolve closely traveling cars. The spatial deconvolution U-Net model also enables the characterization of large vehicles to identify axle numbers and estimate vehicle length. Monitoring large vehicles also benefits imaging of the deep earth by leveraging the surface waves induced by the dynamic vehicle-road interaction.
Submitted 27 June, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Learning Visual Representation of Underwater Acoustic Imagery Using Transformer-Based Style Transfer Method
Authors:
Xiaoteng Zhou,
Changli Yu,
Shihao Yuan,
Xin Yuan,
Hangchi Yu,
Citong Luo
Abstract:
Underwater automatic target recognition (UATR) has been a challenging research topic in ocean engineering. Although deep learning brings opportunities for target recognition on land and in the air, underwater target recognition techniques based on deep learning have lagged due to sensor performance and the size of trainable data. This letter proposes a framework for learning the visual representation of underwater acoustic imagery, with a transformer-based style transfer model as its main body. It can replace the low-level texture features of optical images with the visual features of underwater acoustic imagery while preserving their raw high-level semantic content. The proposed framework can make full use of rich optical image datasets to generate a pseudo-acoustic image dataset and use it as the initial sample to train the underwater acoustic target recognition model. The experiments use the dual-frequency identification sonar (DIDSON) as the underwater acoustic data source and take fish, the most common marine creature, as the research subject. Experimental results show that the proposed method can generate high-quality and high-fidelity pseudo-acoustic samples, achieve acoustic data enhancement, and provide support for research on underwater acoustic-optical image domain transfer.
Submitted 10 November, 2022;
originally announced November 2022.
-
The Contribution of Human Body Capacitance/Body-Area Electric Field To Individual and Collaborative Activity Recognition
Authors:
Sizhen Bian,
Vitor Fortes Rey,
Siyu Yuan,
Paul Lukowicz
Abstract:
The currently dominant wearable body-motion sensor is the inertial measurement unit (IMU). This work presents an alternative wearable motion-sensing approach: human body capacitance (HBC, also commonly defined as body-area electric field). Although less robust in tracking posture and trajectory, HBC has two properties that make it attractive. First, HBC sensing does not require the sensing node to be deployed on the body part being tracked. Second, HBC is sensitive to the body's interaction with its surroundings, including both touching and being in the immediate proximity of people and objects. We first describe the sensing principle for HBC, the sensor architecture and implementation, and the evaluation methods. We then present two case studies demonstrating the usefulness of HBC as a complement/alternative to IMUs. First, we explored exercise recognition and repetition counting for seven machine-free leg-only exercises and eleven general gym workouts with HBC and IMU as signal sources. HBC sensing shows significant advantages over the IMU signals in classification (0.89 vs. 0.78 in F-score) and counting (0.982 vs. 0.938 in accuracy) of the leg-only exercises. For the general gym workouts, HBC improves recognition only for certain workouts, such as the adductor, where the legs alone complete the movement. It also gives better results than the IMU for workout counting (0.800 vs. 0.756 when the sensors are worn on the wrist). In the second case study, we aimed to recognize actions related to manipulating objects and physical collaboration between users by using a wrist-worn HBC sensing unit. We detected collaboration between the users with an F-score of 0.69 when receiving data from a single user and 0.78 when receiving data from both users. The capacitive sensor improves the F-score for recognizing collaborative activities by 16% over a single wrist-accelerometer approach.
Submitted 26 October, 2022;
originally announced October 2022.
-
Electromagnetic Effective-Degree-of-Freedom Limit of a MIMO System in 2-D Inhomogeneous Environment
Authors:
Shuai S. A. Yuan,
Zi He,
Sheng Sun,
Xiaoming Chen,
Chongwen Huang,
Wei E. I. Sha
Abstract:
Compared with a single-input-single-output (SISO) wireless communication system, the benefit of multiple-input-multiple-output (MIMO) technology originates from its extra degrees of freedom (DOF), also referred to as scattering channels or spatial electromagnetic (EM) modes, brought by spatial multiplexing. When the physical sizes of the transmitting and receiving arrays are fixed and there are sufficient antennas (typically with half-wavelength spacing), the DOF limit depends only on the propagation environment. Analytical methods can be used to estimate this limit in free space, while approximate models, such as Clarke's model and ray-tracing methods, are adopted in stochastic environments. However, the DOF limit in a specific inhomogeneous environment has not been well studied with rigorous full-wave numerical methods. In this work, the volume integral equation (VIE) method is implemented to investigate the limit of the MIMO effective degree of freedom (EDOF) in three representative two-dimensional (2-D) inhomogeneous environments. Moreover, we clarify the relation between the performance of a MIMO system and the scattering characteristics of its propagation environment.
Submitted 18 October, 2022;
originally announced October 2022.
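The abstract does not state how EDOF is computed. One definition used in the MIMO degree-of-freedom literature, assumed here, is $\mathrm{EDOF} = \left(\operatorname{tr}(R)/\lVert R\rVert_F\right)^2$ with $R = HH^{H}$, i.e. the squared sum of the eigenvalues of $R$ divided by the sum of their squares. The sketch evaluates it for a given channel matrix $H$ (the function name is hypothetical); obtaining $H$ itself, which the paper does with a VIE solver, is outside its scope.

```python
def effective_dof(H):
    """EDOF = (tr R)^2 / ||R||_F^2 with R = H H^H.

    H is a list of rows (receivers x transmitters) of real or complex numbers.
    Equals (sum of eigenvalues of R)^2 / (sum of squared eigenvalues), so it
    lies between 1 (a single dominant mode) and the rank of R (equal modes).
    """
    n_rx = len(H)
    # R = H H^H, computed without external libraries.
    R = [[sum(H[i][k] * H[j][k].conjugate() for k in range(len(H[i])))
          for j in range(n_rx)] for i in range(n_rx)]
    trace = sum(R[i][i].real for i in range(n_rx))
    frob_sq = sum(abs(R[i][j]) ** 2 for i in range(n_rx) for j in range(n_rx))
    return trace ** 2 / frob_sq
```

A 4 x 4 identity channel gives an EDOF of 4, whereas the rank-one channel `[[1, 1], [1, 1]]` gives 1, matching the intuition that a scattering environment raises EDOF by spreading power over more comparable spatial modes.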
-
Extraction of Pulmonary Airway in CT Scans Using Deep Fully Convolutional Networks
Authors:
Shaofeng Yuan
Abstract:
Accurate, automatic and complete extraction of the pulmonary airway in medical images plays an important role in analyzing thoracic CT volumes for tasks such as lung cancer detection, chronic obstructive pulmonary disease (COPD) analysis, and bronchoscopy-assisted surgery navigation. However, this task remains challenging due to the complex tree-like structure of the airways. In this technical report, we use two-stage fully convolutional networks (FCNs) to automatically segment the pulmonary airway in thoracic CT scans from multiple sites. Specifically, we first adopt a 3D FCN with a U-shaped architecture to segment the pulmonary airway at a coarse resolution in order to accelerate the medical image analysis pipeline. Another 3D FCN is then trained to segment the pulmonary airway at a fine resolution. In the 2022 MICCAI Multi-site Multi-domain Airway Tree Modeling (ATM) Challenge, the reported method was evaluated on the public training set of 300 cases and an independent private validation set of 50 cases. The resulting Dice Similarity Coefficient (DSC) is 0.914 $\pm$ 0.040, False Negative Error (FNE) is 0.079 $\pm$ 0.042, and False Positive Error (FPE) is 0.090 $\pm$ 0.066 on the independent private validation set.
Submitted 12 August, 2022;
originally announced August 2022.
-
SJ-HD^2R: Selective Joint High Dynamic Range and Denoising Imaging for Dynamic Scenes
Authors:
Wei Li,
Shuai Xiao,
Tianhong Dai,
Shanxin Yuan,
Tao Wang,
Cheng Li,
Fenglong Song
Abstract:
Ghosting artifacts, motion blur, and low fidelity in highlights are the main challenges in High Dynamic Range (HDR) imaging from multiple Low Dynamic Range (LDR) images. In previous methods, these issues stem from using the medium-exposed image as the reference frame. To avoid them, we propose to use the under-exposed image as the reference. However, the heavy noise in dark regions of the under-exposed image becomes a new problem. Therefore, we propose a joint HDR and denoising pipeline containing two sub-networks: (i) a pre-denoising network (PreDNNet) that adaptively denoises input LDRs by exploiting exposure priors; and (ii) a pyramid cascading fusion network (PCFNet) that introduces an attention mechanism and a cascading structure in a multi-scale manner. To further leverage these two paradigms, we propose a selective and joint HDR and denoising (SJ-HD$^2$R) imaging framework, which utilizes scenario-specific priors to conduct path selection with an accuracy of more than 93.3$\%$. We create the first joint HDR and denoising benchmark dataset, which contains a variety of challenging HDR and denoising scenes and supports switching of the reference image. Extensive experimental results show that our method achieves superior performance to previous methods.
Submitted 3 November, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
Vibration-Based Bridge Health Monitoring using Telecommunication Cables
Authors:
Jingxiao Liu,
Siyuan Yuan,
Bin Luo,
Biondo Biondi,
Hae Young Noh
Abstract:
Bridge Health Monitoring (BHM) enables early damage detection of bridges and is thus critical for avoiding more severe damage that might result in major financial and human losses. However, conventional BHM systems require dedicated sensors on bridges, which are costly to install and maintain and hard to scale up. To overcome this challenge, we introduce a new system that uses existing telecommunication cables for Distributed Acoustic Sensing (DAS) to collect bridge dynamic strain responses. In addition, we develop a two-module physics-guided system identification method to extract bridge damage-sensitive information (e.g., natural frequencies and mode shapes) from noisy DAS data by constraining strain and displacement mode shapes with bridge dynamics. This approach does not require installation and maintenance of dedicated sensors on bridges. We evaluate our system with field experiments on a concrete bridge with a fiber cable running in a conduit under the deck. Our system successfully identified modal frequencies and reconstructed meter-scale mode shapes.
Submitted 10 May, 2022;
originally announced May 2022.
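The abstract above mentions extracting natural frequencies from noisy DAS strain data. As a much simpler illustration, and not the authors' two-module physics-guided method, the Python sketch below estimates one modal frequency by picking the largest peak of a windowed FFT of a single strain channel. The function name `dominant_frequency`, the 100 Hz sampling rate, and the synthetic 2.5 Hz mode are hypothetical choices for the demo.

```python
import numpy as np


def dominant_frequency(signal: np.ndarray, fs: float) -> float:
    """Return the frequency (Hz) of the largest spectral peak, ignoring the 0 Hz bin.

    Hypothetical helper: plain FFT peak picking, not a physics-guided identification.
    """
    x = signal - signal.mean()                 # remove the static (DC) offset
    window = np.hanning(len(x))                # taper the record to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x * window))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(freqs[1:][np.argmax(spectrum[1:])])  # skip the DC bin


if __name__ == "__main__":
    fs = 100.0                                 # hypothetical sampling rate in Hz
    t = np.arange(0, 40.0, 1.0 / fs)           # 40 s record, i.e. 0.025 Hz frequency resolution
    rng = np.random.default_rng(0)
    strain = np.sin(2 * np.pi * 2.5 * t) + 0.5 * rng.standard_normal(t.size)  # synthetic mode + noise
    print(f"estimated modal frequency: {dominant_frequency(strain, fs):.3f} Hz")
```

Peak picking of this kind yields frequencies only; recovering mode shapes requires combining many channels along the cable.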
-
Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation
Authors:
Zui Chen,
Yansen Jing,
Shengcheng Yuan,
Yifei Xu,
Jian Wu,
Hang Zhao
Abstract:
A synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best reproduces a given sound timbre, i.e., the synthesizer parameter estimation problem, is important yet complicated. We propose a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem. Our method achieves not only state-of-the-art (SOTA) results but also the first real-world applicable results on Dexed, a popular FM synthesizer.
Submitted 28 July, 2022; v1 submitted 6 May, 2022;
originally announced May 2022.
-
Improved singing voice separation with chromagram-based pitch-aware remixing
Authors:
Siyuan Yuan,
Zhepei Wang,
Umut Isik,
Ritwik Giri,
Jean-Marc Valin,
Michael M. Goodwin,
Arvindh Krishnaswamy
Abstract:
Singing voice separation aims to separate music into vocals and accompaniment components. One of the major constraints for the task is the limited amount of training data with separated vocals. Data augmentation techniques such as random source mixing have been shown to make better use of existing data and mildly improve model performance. We propose a novel data augmentation technique, chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed. By performing controlled experiments in both supervised and semi-supervised settings, we demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).
Submitted 28 March, 2022;
originally announced March 2022.
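The abstract reports gains in signal-to-distortion ratio (SDR). The listing does not say which SDR variant was used; the sketch below assumes the plain energy-ratio form, 10·log10(‖s‖² / ‖s − ŝ‖²), for a reference vocal track s and a separated estimate ŝ. Evaluation toolkits based on BSS Eval additionally allow a short distortion filter before computing the ratio, so their numbers can differ. The function name `sdr_db` and the toy signals are illustrative assumptions.

```python
import numpy as np


def sdr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Plain signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Hypothetical helper; eps guards against division by zero for a perfect estimate.
    """
    signal_power = np.sum(reference ** 2)
    distortion_power = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_power + eps) / (distortion_power + eps)))


if __name__ == "__main__":
    vocals = np.array([3.0, 4.0])        # toy reference, energy 25
    separated = np.array([3.0, 3.0])     # toy estimate, error energy 1
    print(f"SDR = {sdr_db(vocals, separated):.2f} dB")  # equals 10*log10(25 / 1)
```

A higher SDR means the separated vocals carry less residual error energy relative to the reference.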
-
Research on Flexibility Margin of Electric-Hydrogen Coupling Energy Block Based on Model Predictive Control
Authors:
Zijiao Han,
Shun Yuan,
Yannan Dong,
Shaohua Ma,
Yudong Bian,
Xinyu Mao
Abstract:
Hydrogen energy plays an important role in the low-carbon energy transition, and electric-hydrogen coupling will become a typical energy scenario. Focusing on the operational flexibility of low-carbon electricity-hydrogen coupling systems with a high proportion of wind and photovoltaic power, this paper studies the flexibility margin of an electricity-hydrogen coupling energy block based on model predictive control (MPC). By analyzing the power exchange characteristics of heterogeneous energy, homogenized models of the various heterogeneous energy sources are established. Based on an analysis of power system flexibility margin, flexibility margin evaluation indexes are defined along three dimensions from the perspective of system operation, and a scheduling model for the electricity-hydrogen coupling energy block is established. The MPC algorithm is used to optimize the power balance operation of the energy block, and its flexibility margin is quantitatively analyzed and calculated. A case study verifies that the proposed calculation method not only realizes online power balance optimization of the electricity-hydrogen coupling energy block but also effectively quantifies its operational flexibility margin.
Submitted 25 March, 2022;
originally announced March 2022.
-
NTU VIRAL: A Visual-Inertial-Ranging-Lidar Dataset, From an Aerial Vehicle Viewpoint
Authors:
Thien-Minh Nguyen,
Shenghai Yuan,
Muqing Cao,
Yang Lyu,
Thien Hoang Nguyen,
Lihua Xie
Abstract:
In recent years, autonomous robots have become ubiquitous in research and daily life. Among many factors, public datasets play an important role in the progress of this field, as they remove the need for a large initial investment in hardware and manpower. However, for research on autonomous aerial systems, there appears to be a relative lack of public datasets on par with those used for autonomous driving and ground robots. To fill this gap, we conduct a data collection exercise on an aerial platform equipped with an extensive and unique set of sensors: two 3D lidars, two hardware-synchronized global-shutter cameras, multiple Inertial Measurement Units (IMUs), and, notably, multiple Ultra-wideband (UWB) ranging units. The comprehensive sensor suite resembles that of an autonomous car but exhibits the distinct and challenging characteristics of aerial operations. We record multiple datasets in several challenging indoor and outdoor conditions. Calibration results and ground truth from a high-accuracy laser tracker are also included in each package. All resources can be accessed via our webpage https://ntu-aris.github.io/ntu_viral_dataset.
Submitted 1 February, 2022;
originally announced February 2022.
-
Adversarial Jamming for a More Effective Constellation Attack
Authors:
Haidong Xie,
Yizhou Xu,
Yuanqing Chen,
Nan Ji,
Shuai Yuan,
Naijin Liu,
Xueshuang Xiang
Abstract:
The common jamming mode in wireless communication is band barrage jamming, which is controllable and difficult to resist. Although this method is simple to implement, it is clearly not the optimal jamming waveform. Therefore, based on the idea of adversarial examples, we propose the adversarial jamming waveform, which can be optimized independently to find the best jamming waveform. We attack QAM with adversarial jamming and find that the optimal jamming waveform is equivalent to the amplitude and phase between the nearest constellation points. Furthermore, jamming experiments on a hardware platform show that our method induces a significantly higher bit error rate than other methods.
Submitted 20 January, 2022;
originally announced January 2022.
-
Electromagnetic Effective Degree of Freedom of a MIMO System in Free Space
Authors:
Shuai S. A. Yuan,
Zi He,
Xiaoming Chen,
Chongwen Huang,
Wei E. I. Sha
Abstract:
The effective degree of freedom (EDOF) of a multiple-input-multiple-output (MIMO) system represents its equivalent number of independent single-input-single-output (SISO) systems, which directly characterizes communication performance. Traditional EDOF only considers a single polarization, in which the fully polarized components degrade into two independent transverse components under the far-field approximation. However, the traditional model is not applicable to complex scenarios, especially in the near-field region. Based on an electromagnetic (EM) channel model built from the dyadic Green's function, we first calculate the EM EDOF to estimate the performance of an arbitrary MIMO system with full polarizations in free space. Then, we clarify the relations between the limit of EDOF and the optimal number of sources/receivers. Finally, potential benefits of near-field MIMO communications are demonstrated with the EM EDOF, in which the contribution of the longitudinally polarized source is taken into account. This work establishes a fundamental EM framework for MIMO wireless communications.
Submitted 1 January, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
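To make "equivalent number of independent SISO systems" concrete, the sketch below assumes a widely used covariance-based definition, EDOF = (tr R / ‖R‖_F)² with R = H Hᴴ, which equals (Σσᵢ²)² / Σσᵢ⁴ over the singular values σᵢ of the channel matrix H. The paper builds H from the dyadic Green's function with full polarization; here H is simply given, and the function name `effective_dof` is a hypothetical helper.

```python
import numpy as np


def effective_dof(H: np.ndarray) -> float:
    """Covariance-based EDOF: (tr(R) / ||R||_F)^2 with R = H H^H.

    Assumed definition; equivalent to (sum sigma_i^2)^2 / sum sigma_i^4.
    """
    R = H @ H.conj().T                        # channel correlation matrix (Hermitian)
    return float((np.trace(R).real / np.linalg.norm(R, "fro")) ** 2)


if __name__ == "__main__":
    print(effective_dof(np.eye(4)))                         # four equally strong parallel channels
    print(effective_dof(np.outer(np.ones(4), np.ones(4))))  # rank-1 channel
    print(effective_dof(np.diag([1.0, 0.1])))               # one dominant and one weak channel
```

Unlike the matrix rank, this measure counts weak eigenchannels only fractionally, which is why the third example lands just above 1 rather than at 2.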
-
Joint Sensing, Communication, and Computation Resource Allocation for Cooperative Perception in Fog-Based Vehicular Networks
Authors:
Xinran Zhang,
Zhimin He,
Yaohua Sun,
Shuo Yuan,
Mugen Peng
Abstract:
To enlarge the perception range and improve the reliability of individual autonomous vehicles, cooperative perception has received much attention. However, given the high volume of shared messages, limited bandwidth and computation resources in vehicular networks become bottlenecks. In this paper, we investigate how to balance the volume of shared messages against constrained resources in fog-based vehicular networks. To this end, we first characterize the sum satisfaction of cooperative perception, taking into account its spatial-temporal value and latency performance. Next, sensing block messages, communication resource blocks, and computation resources are jointly allocated to maximize the sum satisfaction of cooperative perception while satisfying the maximum latency and sojourn time constraints of vehicles. Owing to the problem's non-convexity, we decouple the original problem into two separate sub-problems and devise corresponding solutions. Simulation results demonstrate that our proposed scheme can effectively boost the sum satisfaction of cooperative perception compared with existing baselines.
Submitted 12 December, 2021;
originally announced December 2021.
-
Wavelet-Based Network For High Dynamic Range Imaging
Authors:
Tianhong Dai,
Wei Li,
Xilei Cao,
Jianzhuang Liu,
Xu Jia,
Ales Leonardis,
Youliang Yan,
Shanxin Yuan
Abstract:
High dynamic range (HDR) imaging from multiple low dynamic range (LDR) images suffers from ghosting artifacts caused by scene and object motion. Existing methods, such as optical-flow-based and end-to-end deep-learning-based solutions, are error-prone in either detail restoration or ghosting artifact removal. Comprehensive empirical evidence shows that ghosting artifacts caused by large foreground motion are mainly low-frequency signals, whereas details are mainly high-frequency signals. In this work, we propose a novel frequency-guided end-to-end deep neural network (FHDRNet) that conducts HDR fusion in the frequency domain, using the Discrete Wavelet Transform (DWT) to decompose inputs into different frequency bands. The low-frequency signals are used to avoid specific ghosting artifacts, while the high-frequency signals are used to preserve details. Using a U-Net as the backbone, we propose two novel modules: a merging module and a frequency-guided upsampling module. The merging module applies the attention mechanism to the low-frequency components to deal with ghosting caused by large foreground motion. The frequency-guided upsampling module reconstructs details from multiple frequency-specific components with rich details. In addition, a new RAW dataset is created for training and evaluating multi-frame HDR imaging algorithms in the RAW domain. Extensive experiments on public datasets and our RAW dataset show that the proposed FHDRNet achieves state-of-the-art performance.
Submitted 7 November, 2023; v1 submitted 3 August, 2021;
originally announced August 2021.
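FHDRNet splits its inputs into low- and high-frequency bands with a DWT. The abstract does not name the wavelet, so the sketch below assumes a single-level orthonormal 2D Haar transform, written directly in NumPy, to show what those bands are: one low-frequency approximation at half resolution plus three detail bands. Library implementations such as PyWavelets' `pywt.dwt2` cover other wavelets; `haar_dwt2` and `haar_idwt2` here are hypothetical helpers, not the authors' code.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One-level orthonormal 2D Haar DWT of an image with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]    # top-left, top-right pixel of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]    # bottom-left, bottom-right pixel
    low = (a + b + c + d) / 2              # approximation (low-frequency) band
    det_h = (a - b + c - d) / 2            # left-minus-right differences
    det_v = (a + b - c - d) / 2            # top-minus-bottom differences
    det_d = (a - b - c + d) / 2            # diagonal differences
    return low, (det_h, det_v, det_d)


def haar_idwt2(low: np.ndarray, details) -> np.ndarray:
    """Exact inverse of haar_dwt2."""
    det_h, det_v, det_d = details
    out = np.empty((2 * low.shape[0], 2 * low.shape[1]), dtype=np.result_type(low, float))
    out[0::2, 0::2] = (low + det_h + det_v + det_d) / 2
    out[0::2, 1::2] = (low - det_h + det_v - det_d) / 2
    out[1::2, 0::2] = (low + det_h - det_v - det_d) / 2
    out[1::2, 1::2] = (low - det_h - det_v + det_d) / 2
    return out


if __name__ == "__main__":
    img = np.random.default_rng(0).random((8, 8))   # stand-in for one LDR channel
    low, details = haar_dwt2(img)
    print(low.shape, np.allclose(haar_idwt2(low, details), img))
```

Because the transform is invertible, a network can process the bands separately (for example, attention on `low`, detail reconstruction from the other three) and still map the result back to the image domain.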
-
Multi-Contrast MRI Super-Resolution via a Multi-Stage Integration Network
Authors:
Chun-Mei Feng,
Huazhu Fu,
Shuhao Yuan,
Yong Xu
Abstract:
Super-resolution (SR) plays a crucial role in improving the image quality of magnetic resonance imaging (MRI). MRI produces multi-contrast images and can provide a clear display of soft tissues. However, current super-resolution methods either employ only a single contrast or use a simple multi-contrast fusion mechanism, ignoring the rich relations among different contrasts, which are valuable for improving SR. In this work, we propose a multi-stage integration network (MINet) for multi-contrast MRI SR, which explicitly models the dependencies between multi-contrast images at different stages to guide image SR. In particular, our MINet first learns a hierarchical feature representation from multiple convolutional stages for each contrast image. Subsequently, we introduce a multi-stage integration module to mine the comprehensive relations between the representations of the multi-contrast images. Specifically, the module matches each representation with all other features, which are integrated according to their similarities to obtain an enriched representation. Extensive experiments on fastMRI and real-world clinical datasets demonstrate that 1) our MINet outperforms state-of-the-art multi-contrast SR methods in terms of various metrics and 2) our multi-stage integration module is able to capture complex interactions among multi-contrast features at different stages, leading to improved target-image quality.
Submitted 5 July, 2021; v1 submitted 19 May, 2021;
originally announced May 2021.
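The abstract says the integration module matches each representation with all other features and integrates them according to their similarities, but it does not give the exact operator. Purely as an illustration under stated assumptions, the sketch below uses softmax-normalized dot-product similarities between pooled per-stage feature vectors plus a residual connection. The real module operates on convolutional feature maps, and `similarity_integration` is a hypothetical name.

```python
import numpy as np


def similarity_integration(features: np.ndarray) -> np.ndarray:
    """Fuse each stage's feature vector with all others, weighted by similarity.

    Assumptions (not from the paper): dot-product similarity, softmax weights,
    and a residual connection. features has shape (num_stages, dim), one pooled
    vector per stage/contrast; the output has the same shape.
    """
    scores = features @ features.T                       # pairwise dot-product similarities
    scores = scores - scores.max(axis=1, keepdims=True)  # shift rows for a stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # each row of weights sums to 1
    return features + weights @ features                 # residual + similarity-weighted sum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stage_features = rng.standard_normal((4, 16))         # e.g. 4 stages, 16-dim features
    print(similarity_integration(stage_features).shape)
```

Similar representations receive larger weights, so each output vector is enriched mostly by the stages it resembles.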
-
RadioNet: Transformer based Radio Map Prediction Model For Dense Urban Environments
Authors:
Yu Tian,
Shuai Yuan,
Weisheng Chen,
Naijin Liu
Abstract:
Radio Map Prediction (RMP), which aims to estimate radio wave coverage, has been widely recognized as an enabling technology for improving radio spectrum efficiency. However, fast and reliable radio map prediction can be very challenging due to the complicated interaction between radio waves and the environment. In this paper, a novel Transformer-based deep learning model termed RadioNet is proposed for radio map prediction in urban scenarios. In addition, a novel Grid Embedding technique is proposed to replace the original Position Embedding in the Transformer to better anchor the relative positions of the radiation source, destination, and environment. The effectiveness of the proposed method is verified on an urban radio wave propagation dataset. Compared with the SOTA model on the RMP task, RadioNet reduces the validation loss by 27.3% and improves the prediction reliability from 90.9% to 98.9%. The prediction speed is increased by 4 orders of magnitude compared with a ray-tracing-based method. We believe that the proposed method will benefit high-efficiency wireless communication, real-time radio visualization, and even high-speed image rendering.
Submitted 15 May, 2021;
originally announced May 2021.
-
Mean Field MARL Based Bandwidth Negotiation Method for Massive Devices Spectrum Sharing
Authors:
Tianhao Li,
Yu Tian,
Shuai Yuan,
Naijin Liu
Abstract:
In this paper, a novel bandwidth negotiation mechanism is proposed for wireless spectrum sharing among massive numbers of devices, in which each device locally negotiates bandwidth usage with its neighboring devices and globally optimal spectrum utilization is achieved through distributed decision-making. Since only sparse feedback is needed, the proposed mechanism can greatly reduce signaling overhead. To solve the distributed optimization problem when massive numbers of devices coexist, a mean field multi-agent reinforcement learning (MF-MARL) based bandwidth decision algorithm is proposed, which allows each device to make globally optimal decisions using only neighborhood observations. In simulation, distributed bandwidth negotiation among 1000 devices is demonstrated, with a spectrum utilization rate above 95%. The proposed method helps reduce spectrum conflicts and increase spectrum utilization in massive-device spectrum sharing.
Submitted 30 April, 2021;
originally announced April 2021.
-
MILIOM: Tightly Coupled Multi-Input Lidar-Inertia Odometry and Mapping
Authors:
Thien-Minh Nguyen,
Shenghai Yuan,
Muqing Cao,
Yang Lyu,
Thien Hoang Nguyen,
Lihua Xie
Abstract:
In this letter, we investigate a tightly coupled Lidar-Inertia Odometry and Mapping (LIOM) scheme capable of incorporating multiple lidars with complementary fields of view (FOV). In essence, we devise a time-synchronized scheme to combine features extracted from separate lidars into a single pointcloud, which is then used to construct a local map and compute the feature-map matching (FMM) coefficients. These coefficients, along with the IMU preintegration observations, are then used to construct a factor graph that is optimized to produce an estimate of the sliding-window trajectory. We also propose a keyframe-based map management strategy that marginalizes certain poses and pointclouds in the sliding window to grow a global map, which is used to assemble the local map in later stages. The use of multiple lidars with complementary FOV and the global map ensures that our estimate has low drift and sustains good localization in situations where a single lidar gives poor results or even fails. Multi-threaded computation is also adopted to reduce computation time and ensure real-time performance. We demonstrate the efficacy of our system via a series of experiments on public datasets collected from an aerial vehicle.
Submitted 5 July, 2021; v1 submitted 24 April, 2021;
originally announced April 2021.
-
Noise Attention based Spectrum Anomaly Detection Method for Unauthorized Bands
Authors:
Jing Xu,
Yu Tian,
Shuai Yuan,
Naijin Liu
Abstract:
Spectrum anomaly detection is of great importance in wireless communication to ensure safety and improve spectrum efficiency. However, spectrum anomaly detection faces many difficulties, especially in unauthorized frequency bands. For example, the composition of unauthorized frequency bands is very complex, and abnormal usage patterns are unknown a priori. In this paper, a noise attention method is proposed for unsupervised spectrum anomaly detection in unauthorized bands. First, we theoretically prove that anomalies in unauthorized bands raise the noise floor of the spectrogram after VAE reconstruction. Then, we introduce a novel anomaly metric, the noise attention score, to capture spectrum anomalies more effectively. The effectiveness of the proposed method is experimentally verified in the 2.4 GHz ISM band. Using the noise attention score, the AUC of anomaly detection increases by 0.193. The proposed method helps detect abnormal spectrum usage reliably while keeping a low false alarm rate.
Submitted 17 April, 2021;
originally announced April 2021.
-
Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning
Authors:
Siyang Yuan,
Pengyu Cheng,
Ruiyi Zhang,
Weituo Hao,
Zhe Gan,
Lawrence Carin
Abstract:
Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and speakers known in advance. However, zero-shot voice style transfer, which learns from non-parallel data and generates voices for previously unseen speakers, remains a challenging problem. We propose a novel zero-shot voice transfer method via disentangled representation learning. The proposed method first encodes the speaker-related style and voice content of each input voice into separate low-dimensional embedding spaces, and then transfers to a new voice by combining the source content embedding and target style embedding through a decoder. With information-theoretic guidance, the style and content embedding spaces are representative and (ideally) independent of each other. On the real-world VCTK dataset, our method outperforms other baselines and obtains state-of-the-art results in terms of transfer accuracy and voice naturalness for voice style transfer experiments under both many-to-many and zero-shot setups.
Submitted 16 March, 2021;
originally announced March 2021.
-
Extremum seeking control of a class of constrained nonlinear systems
Authors:
Shuai Yuan,
Filippo Fabiani,
Simone Baldi
Abstract:
This paper studies the extremum seeking control (ESC) problem for a class of constrained nonlinear systems. Specifically, we focus on a family of constraints that allow the original nonlinear system to be reformulated in the so-called input-output normal form. To steer the system so as to optimize a performance function without knowing its explicit form, we propose a novel numerical optimization-based extremum seeking control (NOESC) design consisting of a constrained numerical optimization method and an inversion-based feedforward controller. In particular, a projected gradient descent algorithm is exploited to produce the state sequence that optimizes the performance function, whereas a suitable boundary value problem accommodates the finite-time state transition between each pair of consecutive points of the state sequence. Compared to available NOESC methods, the proposed approach i) can explicitly deal with output constraints; ii) allows the performance function to depend directly on the states of the internal dynamics; and iii) does not require the internal dynamics to be stable. The effectiveness of the proposed ESC scheme is shown through extensive numerical simulations.
Submitted 23 March, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
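The abstract describes a projected gradient descent that produces the state sequence, with a boundary value problem and an inversion-based feedforward controller moving the system between consecutive points. The sketch below shows only the projected-gradient part under simplifying assumptions: the gradient is known analytically (in ESC it would have to be estimated from measurements), the constraint set is a box so that `np.clip` is the exact Euclidean projection, and the quadratic cost and the name `projected_gradient_descent` are hypothetical.

```python
import numpy as np


def projected_gradient_descent(grad, x0, lower, upper, step=0.1, iters=200):
    """Minimize a function over the box [lower, upper] given its gradient.

    Returns the final point and the full sequence of iterates, which plays the
    role of the state sequence to be tracked by a separate controller.
    """
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)  # start from a feasible point
    trajectory = [x.copy()]
    for _ in range(iters):
        x = np.clip(x - step * grad(x), lower, upper)       # gradient step, then project onto box
        trajectory.append(x.copy())
    return x, trajectory


if __name__ == "__main__":
    target = np.array([2.0, -3.0])                   # unconstrained optimum lies outside the box

    def grad(x):
        return 2.0 * (x - target)                    # gradient of ||x - target||^2

    x_opt, seq = projected_gradient_descent(grad, [0.0, 0.0], lower=-1.0, upper=1.0)
    print(x_opt, len(seq))
```

Every iterate stays feasible, which is the property that lets each consecutive pair of points be handed to a finite-time transition controller without violating the constraints.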
-
Uncovering Interpretable Internal States of Merging Tasks at Highway On-Ramps for Autonomous Driving Decision-Making
Authors:
Huanjie Wang,
Wenshuo Wang,
Shihua Yuan,
Xueyuan Li
Abstract:
Humans make daily routine decisions based on their internal states in intricate interaction scenarios. This paper presents a probabilistically reconstructive learning approach to identify the internal states of multi-vehicle sequential interactions when merging at highway on-ramps. We treated the merging task's sequential decision as a dynamic, stochastic process and then integrated the internal states into an HMM-GMR model, a probabilistic combination of an extended Gaussian mixture regression (GMR) and hidden Markov models (HMM). We also developed a variant of the expectation-maximization (EM) algorithm to estimate the model parameters and verified it on a real-world dataset. Experimental results reveal that three interpretable internal states can semantically describe the interactive merge procedure at highway on-ramps. This finding provides a basis for developing an efficient model-based decision-making algorithm for autonomous vehicles (AVs) in a partially observable environment.
Submitted 13 May, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Vision Based Autonomous UAV Plane Estimation And Following for Building Inspection
Authors:
Yang Lyu,
Muqing Cao,
Shenghai Yuan,
Lihua Xie
Abstract:
Unmanned Aerial Vehicles (UAVs) have already demonstrated their potential in many civilian applications, and façade inspection is among the most promising ones. In this paper, we focus on enabling the autonomous perception and control of a small UAV for a façade inspection task. Specifically, we formulate perception as a planar object pose estimation problem by simplifying the building structure as a concatenation of planes, and control as an optimal reference tracking control problem. First, a vision-based adaptive observer is proposed that can realize stable plane pose estimation under very mild observation conditions. Second, a model predictive controller is designed to achieve stable tracking and smooth transitions in a multi-plane scenario, while satisfying the persistent excitation (PE) condition of the observer and the maneuver constraints of the UAV. The proposed autonomous plane pose estimation and plane tracking methods are tested in both simulation and practical building façade inspection scenarios, which demonstrate their effectiveness and practicability.
Submitted 2 February, 2021;
originally announced February 2021.
-
OCTA-500: A Retinal Dataset for Optical Coherence Tomography Angiography Study
Authors:
Mingchao Li,
Kun Huang,
Qiuzhuo Xu,
Jiadong Yang,
Yuhan Zhang,
Zexuan Ji,
Keren Xie,
Songtao Yuan,
Qinghuai Liu,
Qiang Chen
Abstract:
Optical coherence tomography angiography (OCTA) is a novel imaging modality that has been widely utilized in ophthalmology and neuroscience studies to observe retinal vessels and microvascular systems. However, publicly available OCTA datasets remain scarce. In this paper, we introduce the largest and most comprehensive OCTA dataset, dubbed OCTA-500, which contains OCTA imaging under two fields of view (FOVs) from 500 subjects. The dataset provides rich images and annotations, including two modalities (OCT/OCTA volumes), six types of projections, four types of text labels (age / gender / eye / disease), and seven types of segmentation labels (large vessel/capillary/artery/vein/2D FAZ/3D FAZ/retinal layers). Then, we propose a multi-object segmentation task called CAVF, which integrates capillary segmentation, artery segmentation, vein segmentation, and FAZ segmentation under a unified framework. In addition, we improve the 3D-to-2D image projection network (IPN) into IPN-V2 to serve as one of the segmentation baselines. Experimental results demonstrate that IPN-V2 achieves about a 10% mIoU improvement over IPN on the CAVF task. Finally, we further study the impact of several dataset characteristics: the training set size, the model input (OCT/OCTA, 3D volume/2D projection), the baseline networks, and the diseases. The dataset and code are publicly available at: https://ieee-dataport.org/open-access/octa-500.
Submitted 25 December, 2022; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Remote Configuration of the ProASIC3 on the ALICE Inner Tracking System Readout Unit
Authors:
Shiming Yuan,
Johan Alme,
Dieter Röhrich,
Matthias Richter,
Magnus Rentsch Ersdal,
Piero Giubilato,
Gianluca Aglieri Rinella,
Arild Velure,
Matteo Lupi,
Johann Joachim Schambach
Abstract:
A Large Ion Collider Experiment (ALICE) is one of the four major experiments conducted at the CERN Large Hadron Collider (LHC). The ALICE detector is currently undergoing an upgrade for the upcoming Run 3 at the LHC. The new Inner Tracking System (ITS) sub-detector is part of this upgrade. The front-end electronics of the ITS consist of 192 Readout Units (RUs), installed in a radiation environment. Single Event Upsets (SEUs) in the SRAM-based Xilinx Kintex UltraScale FPGAs used in the ITS readout are a real concern. To clear SEUs affecting the Kintex configuration memory, a secondary Flash-based Microsemi ProASIC3E (PA3) FPGA is used. This device configures and continuously scrubs the Xilinx FPGA while data-taking is ongoing, which prevents the accumulation of SEUs. The communication path to the RUs is via the radiation-hard Gigabit Transceiver (GBT) system over 100 m long optical links. The PA3 is reachable via the GBT Slow Control Adapter (GBT-SCA) ASIC using a dedicated JTAG bus driving channel. During the course of Run 3, it is foreseeable that the FPGA design of the PA3 will require upgrades to correct possible issues and add new functionality. It is therefore mandatory that the PA3 itself can be configured remotely, for which a dedicated software tool is needed. This paper presents the design and implementation of the distributed tools used to remotely reconfigure the PA3 FPGAs.
Submitted 7 November, 2020;
originally announced November 2020.