
Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

Authors: Runduo Han, Weiming Xu, Zihan Zhang, Mingshuai Liu, Lei Xie

Abstract: The deep complex convolution recurrent network (DCCRN) achieves excellent speech enhancement performance by utilizing the audio spectrum's complex features. However, it has a large number of model parameters. We propose a smaller model, Distil-DCCRN, which has only 30% of the parameters compared to the DCCRN. To ensure that the performance of Distil-DCCRN matches that of the DCCRN, we employ the knowledge distillation (KD) method to use a larger teacher model to help train a smaller student model. We design a knowledge distillation (KD) method, integrating attention transfer and Kullback-Leibler divergence (AT-KL) to train the student model Distil-DCCRN. Additionally, we use a model with better performance and a more complicated structure, Uformer, as the teacher model. Unlike previous KD approaches that mainly focus on model outputs, our method also leverages the intermediate features from the models' middle layers, facilitating rich knowledge transfer across different structured models despite variations in layer configurations and discrepancies in the channel and time dimensions of intermediate features. Employing our AT-KL approach, Distil-DCCRN outperforms DCCRN as well as several other competitive models in both PESQ and SI-SNR metrics on the DNS test set and achieves comparable results to DCCRN in DNSMOS.

Submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted by IEEE Signal Processing Letters

arXiv:2404.15958 [pdf, other]

Platooning of Heterogeneous Vehicles with Actuation Delays: Theoretical and Experimental Results

Authors: Redmer de Haan, Lorenzo Redi, Tom van der Sande, Erjen Lefeber

Abstract: In this paper we present a prediction-based Cooperative Adaptive Cruise Controller for vehicles with actuation delay, applicable within heterogeneous platoons. We provide a stability analysis for the discrete-time implementation of this controller, which shows the effect of the used sampling times and can be used for selecting appropriate controller gains. The theoretical results are validated by means of experiments using full scale vehicles. This is an extended version of a paper with the same title (submitted to IFAC TDS 2024). Additional mathematical details are provided in this extended version.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2403.01513 [pdf]

CDSE-UNet: Enhancing COVID-19 CT Image Segmentation with Canny Edge Detection and Dual-Path SENet Feature Fusion

Authors: Jiao Ding, Jie Chang, Renrui Han, Li Yang

Abstract: Accurate segmentation of COVID-19 CT images is crucial for reducing the severity and mortality rates associated with COVID-19 infections. In response to blurred boundaries and high variability characteristic of lesion areas in COVID-19 CT images, we introduce CDSE-UNet: a novel UNet-based segmentation model that integrates Canny operator edge detection and a dual-path SENet feature fusion mechanism. This model enhances the standard UNet architecture by employing the Canny operator for edge detection in sample images, paralleling this with a similar network structure for semantic feature extraction. A key innovation is the Double SENet Feature Fusion Block, applied across corresponding network layers to effectively combine features from both image paths. Moreover, we have developed a Multiscale Convolution approach, replacing the standard Convolution in UNet, to adapt to the varied lesion sizes and shapes. This addition not only aids in accurately classifying lesion edge pixels but also significantly improves channel differentiation and expands the capacity of the model. Our evaluations on public datasets demonstrate CDSE-UNet's superior performance over other leading models, particularly in segmenting large and small lesion areas, accurately delineating lesion edges, and effectively suppressing noise.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.01808 [pdf, other]

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a Word Accuracy Rate (WAcc) of 0.78 for the real-time track, as well as an overall P.804 MOS of 3.43 and a WAcc of 0.78 for the non-real-time track, ranking 1st in both tracks.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024; Rank 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

arXiv:2401.03697 [pdf, other]

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

Authors: Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

Abstract: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. Specifically, our approach adopts different extraction strategies based on the audio quality, striking a balance between interference removal and speech preservation, which benefits the back-end automatic speech recognition (ASR) systems. Experiments show that our approach achieves a character error rate (CER) of 24.2% and 33.2% on the Dev and Eval set, respectively, obtaining the second place in the challenge.

Submitted 6 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.13722 [pdf, other]

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

Authors: Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

Abstract: Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE research primarily focuses on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose a novel streaming adaptive bandwidth extension solution dubbed BAE-Net, which is suitable for handling low-resolution speech with unknown and varying effective bandwidth. To address the challenges of recovering both the high-frequency magnitude and phase speech content blindly, we devise a dual-stream architecture that incorporates magnitude inpainting and phase refinement. For potential applications on edge devices, this paper also introduces BAE-NET-lite, which is a lightweight, streaming and efficient framework. Quantitative results demonstrate the superiority of BAE-Net in terms of both performance and computational efficiency when compared with existing state-of-the-art BWE methods.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2310.04369 [pdf, other]

MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

Authors: Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

Abstract: A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to the model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which particularly removes background music, noise and even backing vocals from singing recordings. MBTFNet combines inter and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2303.07621 [pdf, other]

Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

Authors: Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie

Abstract: In ICASSP 2023 speech signal improvement challenge, we developed a dual-stage neural model which improves speech signal quality induced by different distortions in a stage-wise divide-and-conquer fashion. Specifically, in the first stage, the speech improvement network focuses on recovering the missing components of the spectrum, while in the second stage, our model aims to further suppress noise, reverberation, and artifacts introduced by the first-stage model. Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2207.13881 [pdf, other]

Feature Extraction, Modulation and Recognition of Mixed Signal Based on SVM

Authors: Rong Han, Zihuai Lin

Abstract: This paper introduces likelihood-based and feature-based modulation recognition methods. In the feature-based modulation simulation part, instantaneous feature, cyclic spectrum, high-order cumulants, and wavelet transform features are used as the entry point, and six digital signals including 2ASK, 4ASK, BPSK, QPSK, 2FSK and 4FSK are simulated, showing the difference of signals in multiple dimensions.

Submitted 28 July, 2022; originally announced July 2022.

arXiv:2206.07649 [pdf, ps, other]

Atrial Fibrillation Detection Using Weight-Pruned, Log-Quantised Convolutional Neural Networks

Authors: Xiu Qi Chang, Ann Feng Chew, Benjamin Chen Ming Choong, Shuhui Wang, Rui Han, Wang He, Li Xiaolin, Rajesh C. Panicker, Deepu John

Abstract: Deep neural networks (DNN) are a promising tool in medical applications. However, the implementation of complex DNNs on battery-powered devices is challenging due to high energy costs for communication. In this work, a convolutional neural network model is developed for detecting atrial fibrillation from electrocardiogram (ECG) signals. The model demonstrates high performance despite being trained on limited, variable-length input data. Weight pruning and logarithmic quantisation are combined to introduce sparsity and reduce model size, which can be exploited for reduced data movement and lower computational complexity. The final model achieved a 91.1% model compression ratio while maintaining high model accuracy of 91.7% and less than 1% loss.

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2201.12702 [pdf, ps, other]

Robotic Wireless Energy Transfer in Dynamic Environments: System Design and Experimental Validation

Authors: Shuai Wang, Ruihua Han, Yuncong Hong, Qi Hao, Miaowen Wen, Leila Musavian, Shahid Mumtaz, Derrick Wing Kwan Ng

Abstract: Wireless energy transfer (WET) is a ground-breaking technology for cutting the last wire between mobile sensors and power grids in smart cities. Yet, WET only offers effective transmission of energy over a short distance. Robotic WET is an emerging paradigm that mounts the energy transmitter on a mobile robot and navigates the robot through different regions in a large area to charge remote energy harvesters. However, it is challenging to determine the robotic charging strategy in an unknown and dynamic environment due to the uncertainty of obstacles. This paper proposes a hardware-in-the-loop joint optimization framework that offers three distinctive features: 1) efficient model updates and re-optimization based on the last-round experimental data; 2) iterative refinement of the anchor list for adaptation to different environments; 3) verification of algorithms in a high-fidelity Gazebo simulator and a multi-robot testbed. Experimental results show that the proposed framework significantly saves the WET mission completion time while satisfying collision avoidance and energy harvesting constraints.

Submitted 10 February, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

Comments: single column, 18 pages, 6 figures, to appear in IEEE Communications Magazine

Journal ref: IEEE Communications Magazine, Mar. 2022

arXiv:2112.03511 [pdf, other]

Control Parameters Considered Harmful: Detecting Range Specification Bugs in Drone Configuration Modules via Learning-Guided Search

Authors: Ruidong Han, Chao Yang, Siqi Ma, JiangFeng Ma, Cong Sun, Juanru Li, Elisa Bertino

Abstract: In order to support a variety of missions and deal with different flight environments, drone control programs typically provide configurable control parameters. However, such flexibility introduces vulnerabilities. One such vulnerability, referred to as range specification bugs, has been recently identified. The vulnerability originates from the fact that even though each individual parameter receives a value in the recommended value range, certain combinations of parameter values may affect the drone's physical stability. In this paper we develop a novel learning-guided search system to find such combinations, which we refer to as incorrect configurations. Our system applies metaheuristic search algorithms mutating configurations to detect the configuration parameters that have values driving the drone to unstable physical states. To guide the mutations, our system leverages a machine learning predictor as the fitness evaluator. Finally, by utilizing multi-objective optimization, our system returns the feasible ranges based on the mutation search results. Because in our system the mutations are guided by a predictor, evaluating the parameter configurations does not require realistic/simulation executions. Therefore, our system supports a comprehensive and yet efficient detection of incorrect configurations. We have carried out an experimental evaluation of our system. The evaluation results show that the system successfully reports potentially incorrect configurations, of which over 85% lead to actual unstable physical states.

Submitted 7 December, 2021; originally announced December 2021.

Comments: Accepted to ICSE2022 Technical Track

arXiv:2111.09251 [pdf, other]

A fast solver for the pseudo-two-dimensional model of lithium-ion batteries

Authors: Rachel Han, Colin Macdonald, Brian Wetton

Abstract: The pseudo-two-dimensional (P2D) model is a complex mathematical model that can capture the electrochemical processes in Li-ion batteries. However, the model also brings a heavy computational burden. Many simplifications to the model have been introduced in the literature to reduce the complexity. We present a method for fast computation of the P2D model which can be used when simplifications are not accurate enough. By rearranging the calculations, we reduce the complexity of the linear algebra problem. We also employ automatic differentiation, using an open source package JAX for robustness, while also allowing easy implementation of changes to coefficient expressions. The method alleviates the computational bottleneck in P2D models without compromising accuracy.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2108.09229 [pdf]

Using Uncertainty in Deep Learning Reconstruction for Cone-Beam CT of the Brain

Authors: Pengwei Wu, Alejandro Sisniega, Ali Uneri, Runze Han, Craig Jones, Prasad Vagdargi, Xiaoxuan Zhang, Mark Luciano, William Anderson, Jeffrey Siewerdsen

Abstract: Contrast resolution beyond the limits of conventional cone-beam CT (CBCT) systems is essential to high-quality imaging of the brain. We present a deep learning reconstruction method (dubbed DL-Recon) that integrates physically principled reconstruction models with DL-based image synthesis based on the statistical uncertainty in the synthesis image. A synthesis network was developed to generate a synthesized CBCT image (DL-Synthesis) from an uncorrected filtered back-projection (FBP) image. To improve generalizability (including accurate representation of lesions not seen in training), voxel-wise epistemic uncertainty of DL-Synthesis was computed using a Bayesian inference technique (Monte-Carlo dropout). In regions of high uncertainty, the DL-Recon method incorporates information from a physics-based reconstruction model and artifact-corrected projection data. Two forms of the DL-Recon method are proposed: (i) image-domain fusion of DL-Synthesis and FBP (DL-FBP) weighted by DL uncertainty; and (ii) a model-based iterative image reconstruction (MBIR) optimization using DL-Synthesis to compute a spatially varying regularization term based on DL uncertainty (DL-MBIR). The error in DL-Synthesis images was correlated with the uncertainty in the synthesis estimate. Compared to FBP and PWLS, the DL-Recon methods (both DL-FBP and DL-MBIR) showed ~50% reduction in noise (at matched spatial resolution) and ~40-70% improvement in image uniformity. Conventional DL-Synthesis alone exhibited ~10-60% under-estimation of lesion contrast and ~5-40% reduction in lesion segmentation accuracy (Dice coefficient) in simulated and real brain lesions, suggesting a lack of reliability / generalizability for structures unseen in the training data. DL-FBP and DL-MBIR improved the accuracy of reconstruction by directly incorporating information from the measurements in regions of high uncertainty.

Submitted 20 August, 2021; originally announced August 2021.

Comments: This work was presented at the 16th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine (Fully3D), July 19-23, 2021, Leuven, Belgium

arXiv:2108.08498 [pdf, other]

Blind Identification of State-Space Models in Physical Coordinates

Authors: Runzhe Han, Christian Bohn, Georg Bauer

Abstract: Blind identification is popular for modeling a system without the input information, such as in the research areas of structural health monitoring and audio signal processing. Existing blind identification methods have both advantages and disadvantages. In this paper, we briefly outline current methods and propose a novel blind identification method for identifying state-space models in physical coordinates. The idea behind this proposed method is first to regard the collected input data of a state-space model as a part of a periodic signal sequence, and then transform the state-space model with input and output into a model without input by augmenting the state-space model with the input model (which is a periodic signal model), and afterwards use merely the output information to identify a state-space model up to a similarity transformation, and finally derive the state-space model in physical coordinates by using a unique similarity transformation. With the above idea, physical parameters and modal parameters of a state-space system can be obtained. Both numerical and practical examples were used to validate the proposed method. The result showed the effectiveness of the novel blind identification method.

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2106.04452 [pdf, other]

3KG: Contrastive Learning of 12-Lead Electrocardiograms using Physiologically-Inspired Augmentations

Authors: Bryan Gopal, Ryan W. Han, Gautham Raghupathi, Andrew Y. Ng, Geoffrey H. Tison, Pranav Rajpurkar

Abstract: We propose 3KG, a physiologically-inspired contrastive learning approach that generates views using 3D augmentations of the 12-lead electrocardiogram. We evaluate representation quality by fine-tuning a linear layer for the downstream task of 23-class diagnosis on the PhysioNet 2020 challenge training data and find that 3KG achieves a 9.1% increase in mean AUC over the best self-supervised baseline when trained on 1% of labeled data. Our empirical analysis shows that combining spatial and temporal augmentations produces the strongest representations. In addition, we investigate the effect of this physiologically-inspired pretraining on downstream performance on different disease subgroups and find that 3KG makes the greatest gains for conduction and rhythm abnormalities. Our method allows for flexibility in incorporating other self-supervised strategies and highlights the potential for similar modality-specific augmentations for other biomedical signals.

Submitted 20 September, 2021; v1 submitted 21 April, 2021; originally announced June 2021.

Comments: 11 pages, 3 figures, paper revision with new set of experiments and comparison to previous methods

arXiv:2010.00175 [pdf]

C-Arm Non-Circular Orbits: Geometric Calibration, Image Quality, and Avoidance of Metal Artifacts

Authors: Pengwei Wu, Niral Sheth, Alejandro Sisniega, Tongyu Wang, Ali Uneri, Runze Han, Rohan Vijayan, Prasad Vagdargi, Bjoern Kreher, Holger Kunze, Gerhard Kleinszig, Sebastian Vogt, Sheng-Fu Larry Lo, Nicholas Theodore, Jeffrey Siewerdsen

Abstract: Metal artifacts present a frequent challenge to cone-beam CT (CBCT) in image-guided surgery, obscuring visualization of metal instruments and adjacent anatomy. Recent advances in mobile C-arm systems have enabled 3D imaging capacity with non-circular orbits. We extend a previously proposed metal artifacts avoidance (MAA) method to reduce the influence of metal artifacts by prospectively defining a non-circular orbit that avoids metal-induced biases in projection domain. Accurate geometric calibration is an important challenge to accurate 3D image reconstruction for such orbits. We investigate the performance of interpolation-based calibration from a library of circular orbits for any non-circular orbit. We apply the method to non-circular scans acquired for MAA, which involves: (i) coarse 3D localization of metal objects via only two scout views using an end-to-end trained neural network; (ii) calculation of the metal-induced x-ray spectral shift for all possible views; and (iii) identification of the non-circular orbit that minimizes the variations in spectral shift. Non-circular orbits with interpolation-based geometric calibration yielded reasonably accurate 3D image reconstruction. The end-to-end neural network accurately localized metal implants with just two scout views even in complex anatomical scenes, improving Dice coefficient by ~42% compared to a more conventional cascade of separately trained U-nets. In a spine phantom with pedicle screw instrumentation, non-circular orbits identified by the MAA method reduced the magnitude of metal "blooming" artifacts (apparent width of the screw shaft) in CBCT reconstructions by ~70%. The proposed imaging and calibration methods present a practical means to improve image quality in mobile C-arm CBCT by identifying non-circular scan protocols that improve sampling and reduce metal-induced biases in the projection data.

Submitted 30 September, 2020; originally announced October 2020.

Comments: This work was presented at the 6th International Conference on Image Formation in X-Ray Computed Tomography, August, 2020, Regensburg, Germany

arXiv:1910.04918 [pdf]

Deep Learning for Prostate Pathology

Authors: Okyaz Eminaga, Yuri Tolkach, Christian Kunder, Mahmood Abbas, Ryan Han, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Sebastian Huss, Andreas Loening, Robert West, Geoffrey Sonn, Richard Fan, Olaf Bettendorf, James Brook, Daniel Rubin

Abstract: The current study detects different morphologies related to prostate pathology using deep learning models; these models were evaluated on 2,121 hematoxylin and eosin (H&E) stain histology images captured using bright field microscopy, which spanned a variety of image qualities, origins (whole slide, tissue micro array, whole mount, Internet), scanning machines, timestamps, H&E staining protocols, and institutions. For case usage, these models were applied for the annotation tasks in clinician-oriented pathology reports for prostatectomy specimens. The true positive rate (TPR) for slides with prostate cancer was 99.7% at a false positive rate of 0.785%. The F1-scores of Gleason patterns reported in pathology reports ranged from 0.795 to 1.0 at the case level. TPR was 93.6% for the cribriform morphology and 72.6% for the ductal morphology. The correlation between the ground truth and the prediction for the relative tumor volume was 0.987 n. Our models cover the major components of prostate pathology and successfully accomplish the annotation tasks.

Submitted 15 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

arXiv:1907.11458 [pdf, other]

Multiple Human Association between Top and Horizontal Views by Matching Subjects' Spatial Distributions

Authors: Ruize Han, Yujun Zhang, Wei Feng, Chenxing Gong, Xiaoyu Zhang, Jiewen Zhao, Liang Wan, Song Wang

Abstract: Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of different-view data can facilitate various kinds of applications, such as human tracking, person identification, and human activity recognition. However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views. This is a very challenging problem due to large human-appearance difference between top and horizontal views. In this paper, we present a new approach to address this problem by exploring and matching the subjects' spatial distributions between the two views. More specifically, on the top-view image, we model and match subjects' relative positions to the horizontal-view camera in both views and define a matching cost to decide the actual location of horizontal-view camera and its view angle in the top-view image. We collect a new dataset consisting of top-view and horizontal-view image pairs for performance evaluation and the experimental results show the effectiveness of the proposed method.

Submitted 26 July, 2019; originally announced July 2019.

Showing 1–19 of 19 results for author: Han, R