-
Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems
Authors:
Haozhe Xu,
Cong Wu,
Yangyang Gu,
Xingcan Shang,
Jing Chen,
Kun He,
Ruiying Du
Abstract:
The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This lack of comprehensive analysis presents a challenge for VCS designers in fully understanding and mitigating the security issues within these systems.
Addressing this gap, our study introduces a hierarchical model structure for VCS, providing a novel lens for categorizing and analyzing existing literature in a systematic manner. We classify attacks based on their technical principles and thoroughly evaluate various attributes, such as their methods, targets, vectors, and behaviors. Furthermore, we consolidate and assess the defense mechanisms proposed in current research, offering actionable recommendations for enhancing VCS security. Our work makes a significant contribution by simplifying the complexity inherent in VCS security, aiding designers in effectively identifying and countering potential threats, and setting a foundation for future advancements in VCS security research.
Submitted 27 May, 2024;
originally announced May 2024.
-
UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation
Authors:
Runzhao Yang,
Yinda Chen,
Zhihong Zhang,
Xiaoyu Liu,
Zongren Li,
Kunlun He,
Zhiwei Xiong,
Jinli Suo,
Qionghai Dai
Abstract:
In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our novel method, "UniCompress", innovatively extends the compression capabilities of INR by being the first to compress multiple medical data blocks using a single INR network. By employing wavelet transforms and quantization, we introduce a codebook containing frequency domain information as a prior input to the INR network. This enhances the representational power of INR and provides distinctive conditioning for different image blocks. Furthermore, our research introduces a new technique for the knowledge distillation of implicit representations, simplifying complex model knowledge into more manageable formats to improve compression ratios. Extensive testing on CT and electron microscopy (EM) datasets has demonstrated that UniCompress outperforms traditional INR methods and commercial compression solutions like HEVC, especially in complex and high compression scenarios. Notably, compared to existing INR techniques, UniCompress achieves a 4 to 5 times increase in compression speed, marking a significant advancement in the field of medical image compression. Codes will be publicly available.
Submitted 27 May, 2024;
originally announced May 2024.
-
Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement
Authors:
Igor Morawski,
Kai He,
Shusil Dangi,
Winston H. Hsu
Abstract:
Currently, low-light conditions present a significant challenge for machine cognition. In this paper, rather than optimizing models by assuming that human and machine cognition are correlated, we use zero-reference low-light enhancement to improve the performance of downstream task models. We propose to improve the zero-reference low-light enhancement method by leveraging the rich visual-linguistic CLIP prior without any need for paired or unpaired normal-light data, which is laborious and difficult to collect. We propose a simple but effective strategy to learn prompts that help guide the enhancement method and experimentally show that the prompts learned without any need for normal-light data improve image contrast, reduce over-enhancement, and reduce noise over-amplification. Next, we propose to reuse the CLIP model for semantic guidance via zero-shot open vocabulary classification to optimize low-light enhancement for task-based performance rather than human visual perception. We conduct extensive experiments showing that the proposed method leads to consistent improvements in task-based performance across various datasets, and we compare our method against state-of-the-art methods, showing favorable results across various low-light datasets.
Submitted 19 May, 2024;
originally announced May 2024.
-
CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Authors:
Haolin Wu,
Jing Chen,
Ruiying Du,
Cong Wu,
Kun He,
Xingcan Shang,
Hao Ren,
Guowen Xu
Abstract:
The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).
Submitted 24 April, 2024;
originally announced April 2024.
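The false acceptance rate (FAR) quoted in the CLAD abstract can be read as the share of fake clips that a detector accepts as real. The following is a minimal sketch of that metric only, assuming a 0 = fake / 1 = real label encoding and a hypothetical helper name `false_acceptance_rate`; it is not taken from the CLAD repository.

```python
def false_acceptance_rate(labels, predictions):
    """Fraction of fake samples (label 0) that the detector accepts as real (prediction 1).

    The 0 = fake / 1 = real encoding is an assumption made for this sketch.
    """
    # Keep only the predictions made on samples whose true label is "fake".
    fake_predictions = [p for y, p in zip(labels, predictions) if y == 0]
    if not fake_predictions:
        raise ValueError("FAR is undefined without at least one fake sample")
    accepted = sum(1 for p in fake_predictions if p == 1)
    return accepted / len(fake_predictions)


# Toy data: four fake clips and two real clips; one fake clip slips through.
labels = [0, 0, 0, 0, 1, 1]
predictions = [1, 0, 0, 0, 1, 0]
far = false_acceptance_rate(labels, predictions)
```

In this toy example one of the four fake clips is accepted, so the FAR is 0.25. The rejected real clip does not affect FAR; it would count toward a false rejection rate instead.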
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of the three sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. Together, these submissions gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
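The PSNR threshold mentioned in the NTIRE abstract follows the standard definition PSNR = 10 · log10(MAX² / MSE). The sketch below illustrates only that formula on flat lists of pixel values, assuming an 8-bit peak of 255; the helper name `psnr` is hypothetical, and the challenge's own evaluation script may differ in details such as color space or border cropping.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    max_value=255.0 assumes 8-bit images; this is an assumption of the sketch.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two pixels, one of which is off by 10 grey levels (MSE = 50).
example_db = psnr([10.0, 10.0], [10.0, 20.0])
```

Higher values mean the super-resolved image is closer to the ground truth, which is why the challenge fixes a PSNR floor and then ranks submissions on efficiency.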
-
Rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model
Authors:
Maoxuan Zhou,
Wei Kang,
Kun He
Abstract:
To address the problems that current convolutional neural networks cannot effectively capture the correlation features between the time-domain signals of rolling bearings and that model accuracy is limited by the number and quality of samples, a rolling bearing fault diagnosis method based on a generative adversarial enhanced multi-scale convolutional neural network model is proposed. Firstly, the Gramian angular field coding technique is used to encode the time-domain signal of the rolling bearing and generate a feature map that retains the complete information of the vibration signal. Then, the resulting data is divided into a training set, a validation set, and a test set. The training set is fed into a Wasserstein generative adversarial network with gradient penalty; once training is complete, new samples with features similar to the training samples are generated and used to expand the original training set. Next, multi-scale convolution is used to extract the fault features of the expanded training set, and the feature maps are normalized per instance to overcome the influence of differences in feature distribution. Finally, an attention mechanism is applied for adaptive weighting of the normalized features and extraction of deep features, and fault diagnosis is completed by a softmax classifier. Experimental results show that, compared with the ResNet method, the proposed method has better generalization and anti-noise performance.
Submitted 21 March, 2024;
originally announced March 2024.
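The Gramian angular field encoding named in the abstract above turns a 1-D signal into a 2-D image. The sketch below shows the summation variant (rescale to [-1, 1], take the arccosine, then fill the matrix with cos(φ_i + φ_j)); the abstract does not say whether the summation or the difference variant is used, so that choice and the helper name `gramian_angular_summation_field` are assumptions.

```python
import math


def gramian_angular_summation_field(signal):
    """Encode a 1-D signal as a Gramian angular summation field (list of rows)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled to [-1, 1]")
    # Min-max rescale to [-1, 1], clamping to guard against rounding error.
    scaled = [max(-1.0, min(1.0, 2.0 * (x - lo) / (hi - lo) - 1.0)) for x in signal]
    # Polar encoding: each value becomes an angle in [0, pi].
    angles = [math.acos(s) for s in scaled]
    # Entry (i, j) is the cosine of the summed angles of samples i and j.
    return [[math.cos(a + b) for b in angles] for a in angles]


# A three-sample ramp rescales to [-1, 0, 1], i.e. angles [pi, pi/2, 0].
field = gramian_angular_summation_field([0.0, 1.0, 2.0])
```

The resulting matrix is symmetric and its size grows quadratically with the signal length, so long vibration records are typically windowed or downsampled before encoding.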
-
State-action control barrier functions: Imposing safety on learning-based control with low online computational costs
Authors:
Kanghui He,
Shengling Shi,
Ton van den Boom,
Bart De Schutter
Abstract:
Learning-based control with safety guarantees usually requires real-time safety certification and modifications of possibly unsafe learning-based policies. The control barrier function (CBF) method uses a safety filter containing a constrained optimization problem to produce safe policies. However, finding a valid CBF for a general nonlinear system requires a complex function parameterization, which in general, makes the policy optimization problem difficult to solve in real time. For nonlinear systems with nonlinear state constraints, this paper proposes the novel concept of state-action CBFs, which not only characterize the safety at each state but also evaluate the control inputs taken at each state. State-action CBFs, in contrast to CBFs, enable a flexible parameterization, resulting in a safety filter that involves a convex quadratic optimization problem. This, in turn, significantly alleviates the online computational burden. To synthesize state-action CBFs, we propose a learning-based approach exploiting Hamilton-Jacobi reachability. The effect of learning errors on the effectiveness of state-action CBFs is addressed by constraint tightening and introducing a new concept called contractive CBFs. These contributions ensure formal safety guarantees for learned CBFs and control policies, enhancing the applicability of learning-based control in real-time scenarios. Simulation results on an inverted pendulum with elastic walls validate the proposed CBFs in terms of constraint satisfaction and CPU time.
Submitted 18 December, 2023;
originally announced December 2023.
-
Approximate Dynamic Programming for Constrained Piecewise Affine Systems with Stability and Safety Guarantees
Authors:
Kanghui He,
Shengling Shi,
Ton van den Boom,
Bart De Schutter
Abstract:
Infinite-horizon optimal control of constrained piecewise affine (PWA) systems has been approximately addressed by hybrid model predictive control (MPC), which, however, has computational limitations, both in offline design and online implementation. In this paper, we consider an alternative approach based on approximate dynamic programming (ADP), an important class of methods in reinforcement learning. We accommodate non-convex union-of-polyhedra state constraints and linear input constraints into ADP by designing PWA penalty functions. PWA function approximation is used, which allows for a mixed-integer encoding to implement ADP. The main advantage of the proposed ADP method is its online computational efficiency. Particularly, we propose two control policies, which lead to solving a smaller-scale mixed-integer linear program than conventional hybrid MPC, or a single convex quadratic program, depending on whether the policy is implicitly determined online or explicitly computed offline. We characterize the stability and safety properties of the closed-loop systems, as well as the sub-optimality of the proposed policies, by quantifying the approximation errors of value functions and policies. We also develop an offline mixed-integer linear programming-based method to certify the reliability of the proposed method. Simulation results on an inverted pendulum with elastic walls and on an adaptive cruise control problem validate the control performance in terms of constraint satisfaction and CPU time.
Submitted 6 January, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Two Independent Teachers are Better Role Model
Authors:
Afifa Khaled,
Ahmed A. Mubarak,
Kun He
Abstract:
Recent deep learning models have attracted substantial attention in infant brain analysis. These models, such as semi-supervised techniques (e.g., Temporal Ensembling, mean teacher), have achieved state-of-the-art performance. However, these models depend on an encoder-decoder structure with stacked local operators to gather long-range information, and the local operators limit the efficiency and effectiveness. Besides, the MRI data contain different tissue properties (TPs) such as T1 and T2. One major limitation of these models is that they use both types of data as inputs to the segmentation process, i.e., the models are trained on the dataset once, and inference requires substantial computation and memory. In this work, we address the above limitations by designing a new deep-learning model, called 3D-DenseUNet, which works as adaptable global aggregation blocks in down-sampling to solve the issue of spatial information loss. The self-attention module connects the down-sampling blocks to up-sampling blocks, and integrates the feature maps in three dimensions of spatial and channel, effectively improving the representation potential and discriminating ability of the model. Additionally, we propose a new method called Two Independent Teachers (2IT) that summarizes the model weights instead of label predictions. Each teacher model is trained on a different type of brain data, T1 and T2, respectively. Then, a fusion model is added to improve test accuracy and enable training with fewer parameters and labels compared to the Temporal Ensembling method without modifying the network architecture. Empirical results demonstrate the effectiveness of the proposed method. The code is available at https://github.com/AfifaKhaled/Two-Independent-Teachers-are-Better-Role-Model.
Submitted 21 December, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis
Authors:
Yonghao Li,
Tao Zhou,
Kelei He,
Yi Zhou,
Dinggang Shen
Abstract:
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large amount of paired multi-modal data to train an effective synthesis model. However, it is often challenging to obtain sufficient paired data for supervised training. In reality, we often have only a small amount of paired data but a large amount of unpaired data. To take advantage of both paired and unpaired data, in this paper, we propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis. Specifically, an Edge-preserving Masked AutoEncoder (Edge-MAE) is first pre-trained in a self-supervised manner to simultaneously perform 1) image imputation for randomly masked patches in each image and 2) whole edge map estimation, which effectively learns both contextual and structural information. Besides, a novel patch-wise loss is proposed to enhance the performance of Edge-MAE by treating different masked patches differently according to the difficulties of their respective imputations. Based on this proposed pre-training, in the subsequent fine-tuning stage, a Dual-scale Selective Fusion (DSF) module is designed (in our MT-Net) to synthesize missing-modality images by integrating multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Further, this pre-trained encoder is also employed to extract high-level features from the synthesized image and corresponding ground-truth image, which are required to be similar (consistent) in the training. Experimental results show that our MT-Net achieves comparable performance to the competing methods even using 70% of all available paired data. Our code will be publicly available at https://github.com/lyhkevin/MT-Net.
Submitted 18 June, 2023; v1 submitted 2 December, 2022;
originally announced December 2022.
-
Approximate Dynamic Programming for Constrained Linear Systems: A Piecewise Quadratic Approximation Approach
Authors:
Kanghui He,
Shengling Shi,
Ton van den Boom,
Bart De Schutter
Abstract:
Approximate dynamic programming (ADP) faces challenges in dealing with constraints in control problems. Model predictive control (MPC) is, in comparison, well-known for its accommodation of constraints and stability guarantees, although its computation is sometimes prohibitive. This paper introduces an approach combining the two methodologies to overcome their individual limitations. The predictive control law for constrained linear quadratic regulation (CLQR) problems has been proven to be piecewise affine (PWA) while the value function is piecewise quadratic. We exploit these formal results from MPC to design an ADP method for CLQR problems. A novel convex and piecewise quadratic neural network with a local-global architecture is proposed to provide an accurate approximation of the value function, which is used as the cost-to-go function in the online dynamic programming problem. An efficient decomposition algorithm is developed to speed up the online computation. Rigorous stability analysis of the closed-loop system is conducted for the proposed control scheme under the condition that a good approximation of the value function is achieved. Comparative simulations are carried out to demonstrate the potential of the proposed method in terms of online computation and optimality.
Submitted 6 April, 2023; v1 submitted 20 May, 2022;
originally announced May 2022.
-
NetRCA: An Effective Network Fault Cause Localization Algorithm
Authors:
Chaoli Zhang,
Zhiqiang Zhou,
Yingying Zhang,
Linxiao Yang,
Kai He,
Qingsong Wen,
Liang Sun
Abstract:
Localizing the root cause of network faults is crucial to network operation and maintenance. However, due to the complicated network architectures and wireless environments, as well as limited labeled data, accurately localizing the true root cause is challenging. In this paper, we propose a novel algorithm named NetRCA to deal with this problem. Firstly, we extract effective derived features from the original raw data by considering temporal, directional, attribution, and interaction characteristics. Secondly, we adopt multivariate time series similarity and label propagation to generate new training data from both labeled and unlabeled data to overcome the lack of labeled samples. Thirdly, we design an ensemble model which combines XGBoost, rule set learning, attribution model, and graph algorithm, to fully utilize all data information and enhance performance. Finally, experiments and analysis are conducted on the real-world dataset from ICASSP 2022 AIOps Challenge to demonstrate the superiority and effectiveness of our approach.
Submitted 6 March, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
A New Entity Extraction Method Based on Machine Reading Comprehension
Authors:
Xiaobo Jiang,
Kun He,
Jiajun He,
Guangyu Yan
Abstract:
Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between them does not meet the standards of human reading comprehension, thus limiting the understanding of the model and also causing the omission or misjudgment of the answer (i.e., the target entity) due to the reasoning question. We propose an effective MRC-based entity extraction model, MRC-I2DP, which uses the proposed gated attention-attracting mechanism to adjust the restoration of each part of the text pair, creating problems and thinking for multi-level interactive attention calculations to increase the target entity. It also uses the proposed 2D probability coding module, TALU function and mask mechanism to strengthen the detection of all possible targets of the target, thereby improving the probability and accuracy of prediction. Experiments have proved that MRC-I2DP represents an overall state-of-the-art model in 7 from the scientific and public domains, achieving a performance improvement of up to compared to the model in F1.
Submitted 20 August, 2021; v1 submitted 13 August, 2021;
originally announced August 2021.
-
The Detection of Thoracic Abnormalities ChestX-Det10 Challenge Results
Authors:
Jie Lian,
Jingyu Liu,
Yizhou Yu,
Mengyuan Ding,
Yaoci Lu,
Yi Lu,
Jie Cai,
Deshou Lin,
Miao Zhang,
Zhe Wang,
Kai He,
Yijie Yu
Abstract:
The Detection of Thoracic Abnormalities challenge is organized by the Deepwise AI Lab. The challenge is divided into two rounds. In this paper, we present the results of the 6 teams that reached the second round. The challenge adopts the ChestX-Det10 dataset proposed by the Deepwise AI Lab. ChestX-Det10 is the first chest X-ray dataset with instance-level annotations, covering 10 categories of disease/abnormality across 3,543 images. The annotations are located at https://github.com/Deepwise-AILab/ChestX-Det10-Dataset. In the challenge, we randomly split all data into 3,001 images for training and 542 images for testing.
Submitted 21 October, 2020; v1 submitted 19 October, 2020;
originally announced October 2020.
-
HF-UNet: Learning Hierarchically Inter-Task Relevance in Multi-Task U-Net for Accurate Prostate Segmentation
Authors:
Kelei He,
Chunfeng Lian,
Bing Zhang,
Xin Zhang,
Xiaohuan Cao,
Dong Nie,
Yang Gao,
Junfeng Zhang,
Dinggang Shen
Abstract:
Accurate segmentation of the prostate is a key step in external beam radiation therapy treatments. In this paper, we tackle the challenging task of prostate segmentation in CT images with a two-stage network, in which 1) the first stage quickly localizes and 2) the second stage accurately segments the prostate. To precisely segment the prostate in the second stage, we formulate prostate segmentation as a multi-task learning framework, which includes a main task to segment the prostate and an auxiliary task to delineate the prostate boundary. Here, the second task provides additional guidance for the unclear prostate boundary in CT images. Besides, conventional multi-task deep networks typically share most of the parameters (i.e., feature representations) across all tasks, which may limit their data fitting ability, as the specificities of different tasks are inevitably ignored. By contrast, we address these issues with a hierarchically-fused U-Net structure, namely HF-UNet. The HF-UNet has two complementary branches for the two tasks, with a newly proposed attention-based task consistency learning block to communicate at each level between the two decoding branches. Therefore, HF-UNet is able to hierarchically learn the shared representations for different tasks while simultaneously preserving the specificities of the learned representations for each task. We extensively evaluated the proposed method on a large planning CT image dataset, including images acquired from 339 patients. The experimental results show that HF-UNet outperforms the conventional multi-task network architectures and the state-of-the-art methods.
Submitted 23 May, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling
Authors:
Kelei He,
Chunfeng Lian,
Ehsan Adeli,
Jing Huo,
Yang Gao,
Bing Zhang,
Junfeng Zhang,
Dinggang Shen
Abstract:
Fully convolutional networks (FCNs), including UNet and VNet, are widely used network architectures for semantic segmentation in recent studies. However, conventional FCNs are typically trained with the cross-entropy or Dice loss, which only calculate the error between predictions and ground-truth labels for pixels individually. This often results in non-smooth neighborhoods in the predicted segmentation. To address this problem, we propose a two-stage framework, with the first stage quickly localizing the prostate region and the second stage precisely segmenting the prostate with a multi-task UNet architecture. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. The proposed network therefore has a dual-branch architecture that tackles two tasks: 1) a segmentation sub-network that generates the prostate segmentation, and 2) a voxel-metric learning sub-network that improves the quality of the learned feature space under the supervision of a metric loss. Specifically, the voxel-metric learning sub-network samples tuples (including triplets and pairs) at the voxel level from the intermediate feature maps. Unlike conventional deep metric learning methods that generate triplets or pairs at the image level before the training phase, our proposed voxel-wise tuples are sampled in an online manner and operated on in an end-to-end fashion via multi-task learning. To evaluate the proposed method, we conduct extensive experiments on a real CT image dataset consisting of 339 patients. The ablation studies show that our method can effectively learn more representative voxel-level features compared with conventional learning methods using the cross-entropy or Dice loss. The comparisons also show that the proposed method outperforms the state-of-the-art methods by a reasonable margin.
Submitted 23 January, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.
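The Dice loss named in the abstract above is commonly defined as one minus the Dice coefficient between a predicted mask and a ground-truth mask. The following is a minimal sketch of that standard definition for binary masks, not the authors' training code; the function names and the flat-list mask representation are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def dice_loss(pred, target):
    """Loss that decreases as the overlap between the masks grows."""
    return 1.0 - dice_coefficient(pred, target)


if __name__ == "__main__":
    # One overlapping voxel, two predicted, one labelled: 2 * 1 / (2 + 1).
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice the same formula is usually applied to soft (probabilistic) predictions with a small smoothing constant; that variant is omitted here to keep the sketch short.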
-
Synergistic Learning of Lung Lobe Segmentation and Hierarchical Multi-Instance Classification for Automated Severity Assessment of COVID-19 in CT Images
Authors:
Kelei He,
Wei Zhao,
Xingzhi Xie,
Wen Ji,
Mingxia Liu,
Zhenyu Tang,
Feng Shi,
Yang Gao,
Jun Liu,
Junfeng Zhang,
Dinggang Shen
Abstract:
Understanding chest CT imaging of the coronavirus disease 2019 (COVID-19) will help detect infections early and assess the disease progression. In particular, automated severity assessment of COVID-19 in CT images plays an essential role in identifying cases that are in great need of intensive clinical care. However, it is often challenging to accurately assess the severity of this disease in CT images, due to variable infection regions in the lungs, similar imaging biomarkers, and large inter-case variations. To this end, we propose a synergistic learning framework for automated severity assessment of COVID-19 in 3D CT images, by jointly performing lung lobe segmentation and multi-instance classification. Considering that only a few infection regions in a CT image are related to the severity assessment, we first represent each input image by a bag that contains a set of 2D image patches (with each cropped from a specific slice). A multi-task multi-instance deep network (called M$^2$UNet) is then developed to assess the severity of COVID-19 patients and segment the lung lobe simultaneously. Our M$^2$UNet consists of a patch-level encoder, a segmentation sub-network for lung lobe segmentation, and a classification sub-network for severity assessment (with a unique hierarchical multi-instance learning strategy). Here, the context information provided by segmentation can be implicitly employed to improve the performance of severity assessment. Extensive experiments were performed on a real COVID-19 CT image dataset consisting of 666 chest CT images, with results suggesting the effectiveness of our proposed method compared to several state-of-the-art methods.
Submitted 24 May, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19
Authors:
Feng Shi,
Jun Wang,
Jun Shi,
Ziyan Wu,
Qian Wang,
Zhenyu Tang,
Kelei He,
Yinghuan Shi,
Dinggang Shen
Abstract:
(This paper was submitted as an invited paper to IEEE Reviews in Biomedical Engineering on April 6, 2020.) The pandemic of coronavirus disease 2019 (COVID-19) is spreading all over the world. Medical imaging such as X-ray and computed tomography (CT) plays an essential role in the global fight against COVID-19, whereas the recently emerging artificial intelligence (AI) technologies further strengthen the power of the imaging tools and help medical specialists. We hereby review the rapid responses in the community of medical imaging (empowered by AI) toward COVID-19. For example, AI-empowered image acquisition can significantly help automate the scanning procedure and also reshape the workflow with minimal contact with patients, providing the best protection to the imaging technicians. Also, AI can improve work efficiency by accurate delineation of infections in X-ray and CT images, facilitating subsequent quantification. Moreover, the computer-aided platforms help radiologists make clinical decisions, i.e., for disease diagnosis, tracking, and prognosis. In this review paper, we thus cover the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis, and follow-up. We particularly focus on the integration of AI with X-ray and CT, both of which are widely used in the frontline hospitals, in order to depict the latest progress of medical imaging and radiology fighting against COVID-19.
Submitted 7 April, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation
Authors:
Tiexin Qin,
Ziyuan Wang,
Kelei He,
Yinghuan Shi,
Yang Gao,
Dinggang Shen
Abstract:
Conventional data augmentation, realized by performing simple pre-processing operations (e.g., rotation, crop, etc.), has been shown to enhance the performance of medical image segmentation. However, the data generated by these conventional augmentation methods are random and sometimes harmful to the subsequent segmentation. In this paper, we develop a novel automatic learning-based data augmentation method for medical image segmentation, which models the augmentation task as a trial-and-error procedure using deep reinforcement learning (DRL). In our method, we combine the data augmentation module and the subsequent segmentation module in an end-to-end training manner with a consistent loss. Specifically, the best sequential combination of different basic operations is automatically learned by directly maximizing the performance improvement (i.e., Dice ratio) on the available validation set. We extensively evaluated our method on CT kidney tumor segmentation and obtained promising results.
Submitted 22 February, 2020;
originally announced February 2020.
-
RobustPeriod: Time-Frequency Mining for Robust Multiple Periodicity Detection
Authors:
Qingsong Wen,
Kai He,
Liang Sun,
Yingying Zhang,
Min Ke,
Huan Xu
Abstract:
Periodicity detection is a crucial step in time series tasks, including monitoring and forecasting of metrics in many areas, such as IoT applications and self-driving database management systems. In many of these applications, multiple periodic components exist and are often interlaced with each other. Such dynamic and complicated periodic patterns make accurate periodicity detection difficult. In addition, other components in the time series, such as trend, outliers and noise, pose additional challenges for accurate periodicity detection. In this paper, we propose a robust and general framework for multiple periodicity detection. Our algorithm applies the maximal overlap discrete wavelet transform to transform the time series into multiple temporal-frequency scales such that different periodic components can be isolated. We rank them by wavelet variance, and then at each scale robustly detect a single periodicity with our proposed Huber-periodogram and Huber-ACF. We rigorously prove the theoretical properties of the Huber-periodogram and justify the use of Fisher's test on the Huber-periodogram for periodicity detection. To further refine the detected periods, we compute the unbiased autocorrelation function from the Huber-periodogram based on the Wiener-Khinchin theorem for improved robustness and efficiency. Experiments on synthetic and real-world datasets show that our algorithm outperforms other popular ones for both single and multiple periodicity detection.
Submitted 7 March, 2021; v1 submitted 21 February, 2020;
originally announced February 2020.
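As a point of reference for the periodogram-based detection step described in the RobustPeriod abstract above, the sketch below computes the ordinary (non-robust) periodogram of a series, picks the dominant frequency, and reports Fisher's g statistic, the ratio of the largest periodogram ordinate to their sum. It is not the Huber-periodogram from the paper, it omits the wavelet decomposition and the p-value of Fisher's test, and the function names are assumptions made for illustration.

```python
import cmath
import math


def periodogram(x):
    """Ordinary periodogram I_k for k = 1 .. floor((n - 1) / 2)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    values = {}
    for k in range(1, (n - 1) // 2 + 1):
        # Discrete Fourier coefficient at frequency index k.
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(centered))
        values[k] = abs(s) ** 2 / n
    return values


def dominant_period(x):
    """Return (period in samples, Fisher's g statistic) for the series x."""
    pgram = periodogram(x)
    k_star = max(pgram, key=pgram.get)
    g = pgram[k_star] / sum(pgram.values())
    return len(x) / k_star, g


if __name__ == "__main__":
    # Hypothetical test signal: a clean sinusoid with a period of 8 samples.
    series = [math.sin(2 * math.pi * t / 8) for t in range(64)]
    print(dominant_period(series))
```

A value of g close to 1 means a single frequency carries almost all of the energy; outliers can distort the ordinary periodogram, which is the motivation the abstract gives for a robust variant.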
-
Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
Authors:
Ke-Xin He,
Yu-Han Shen,
Wei-Qiang Zhang
Abstract:
Sound event detection with weakly labeled data is considered a multi-instance learning problem, and the choice of pooling function is the key to solving it. In this paper, we propose a hierarchical pooling structure to improve the performance of weakly labeled sound event detection systems. The proposed pooling structure yields remarkable improvements on three types of pooling function without adding any parameters. Moreover, our system achieves competitive performance on Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge using the hierarchical pooling structure.
Submitted 27 November, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Through-the-Wall Imaging Exploiting 2.4GHz Commodity Wi-Fi
Authors:
Wei Zhong,
Kai He,
Lianlin Li
Abstract:
In this letter, we experimentally investigate low-cost through-the-wall imaging that exploits Wi-Fi signals in an indoor environment from the perspective of holographic imaging. In our experiments, a pair of antennas in a synthetic aperture mode is used to acquire signals produced by commodity Wi-Fi devices and reflected from the scene. The classical filtered back propagation (FBP) algorithm is then employed to form the image based on these signals. We use an IEEE 802.11n wireless router working at 2.4 GHz with a bandwidth of 20 MHz. Selected experimental results are provided to demonstrate the performance of the proposed Wi-Fi based imaging scheme.
Submitted 9 March, 2019;
originally announced March 2019.
-
Undirected graphs: is the shift-enabled condition trivial or necessary?
Authors:
Liyan Chen,
Samuel Cheng,
Kanghang He,
Lina Stankovic,
Vladimir Stankovic
Abstract:
It has recently been shown that, contrary to the widely held belief that a shift-enabled condition (necessary for any shift-invariant filter to be representable by a graph shift matrix) can be ignored because any non-shift-enabled matrix can be converted to a shift-enabled matrix, such a conversion in general may not hold for a directed graph with a non-symmetric shift matrix. This letter extends this prior work, focusing on undirected graphs where the shift matrix is generally symmetric. We show that while, in this case, the shift matrix can be converted to satisfy the original shift-enabled condition, the converted matrix is not associated with the original graph, that is, it no longer captures the structure of the graph signal. We show via a counterexample that a non-shift-enabled matrix cannot be converted to a shift-enabled one and still maintain the topological structure of the underlying graph, which is necessary to facilitate localized signal processing.
Submitted 22 May, 2019; v1 submitted 30 October, 2018;
originally announced October 2018.
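The shift-enabled condition discussed in the abstract above can be checked numerically. The sketch below assumes the usual definition from the graph signal processing literature, namely that an N x N shift matrix S is shift-enabled when its characteristic and minimal polynomials coincide, which is equivalent to I, S, ..., S^(N-1) being linearly independent. The abstract itself does not spell out this definition, so treat it as an assumption; the function names are likewise illustrative.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_shift_enabled(s):
    """True if I, S, ..., S^(N-1) are linearly independent (integer entries)."""
    n = len(s)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # S^0 = I
    flattened = []
    for _ in range(n):
        flattened.append([v for row in power for v in row])
        power = mat_mul(power, s)
    return rank(flattened) == n


if __name__ == "__main__":
    path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # path graph on 3 nodes
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # complete graph on 3 nodes
    print(is_shift_enabled(path3), is_shift_enabled(triangle))
```

For the triangle, the adjacency matrix A satisfies A^2 = A + 2I, so the powers are linearly dependent and the adjacency matrix is not shift-enabled; the path graph has three distinct eigenvalues and passes the check.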
-
Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection
Authors:
Yu-Han Shen,
Ke-Xin He,
Wei-Qiang Zhang
Abstract:
In this paper, we propose a temporal-frequential attention model for sound event detection (SED). Our network learns how to listen with two attention models: a temporal attention model and a frequential attention model. The proposed system learns when to listen using the temporal attention model, while it learns where to listen on the frequency axis using the frequential attention model. With these two models, we attempt to make our system pay more attention to important frames or segments and important frequency components for sound event detection. Our proposed method is demonstrated on Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge and achieves competitive performance.
Submitted 28 October, 2018;
originally announced October 2018.
-
SAM-GCNN: A Gated Convolutional Neural Network with Segment-Level Attention Mechanism for Home Activity Monitoring
Authors:
Yu-Han Shen,
Ke-Xin He,
Wei-Qiang Zhang
Abstract:
In this paper, we propose a method for home activity monitoring. We demonstrate our model on the dataset of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Challenge Task 5. This task aims to classify multi-channel audio recordings into one of the provided pre-defined classes. All of these classes are daily activities performed in a home environment. To tackle this task, we propose a gated convolutional neural network with segment-level attention mechanism (SAM-GCNN). The proposed framework is a convolutional model with two auxiliary modules: a gated convolutional neural network and a segment-level attention mechanism. Furthermore, we adopt model ensembling to enhance the generalization capability of our model. We evaluated our work on the development dataset of DCASE 2018 Task 5 and achieved competitive performance, with the macro-averaged F-1 score increasing from 83.76% to 89.33% compared with the convolutional baseline system.
Submitted 14 November, 2018; v1 submitted 3 October, 2018;
originally announced October 2018.
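The macro-averaged F-1 score reported in the abstract above is the unweighted mean of the per-class F-1 scores. The sketch below shows that standard definition; it is not the DCASE evaluation code, the class labels in the usage example are hypothetical, and averaging over the union of true and predicted labels is an assumption of this sketch.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F-1 scores over all observed labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F-1 = 2TP / (2TP + FP + FN); the denominator is never zero for a
        # label that appears in y_true or y_pred, but guard anyway.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = ["cooking", "cooking", "eating", "eating"]
    guess = ["cooking", "eating", "eating", "eating"]
    print(macro_f1(truth, guess))
```

Because every class contributes equally, a rare class that is classified poorly lowers the macro average as much as a frequent one would.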
-
The Third Evolution Equation for Optimal Control Computation
Authors:
Sheng Zhang,
Fei Liao,
Kai-Feng He
Abstract:
The Variation Evolving Method (VEM) that originates from the continuous-time dynamics stability theory seeks the optimal solutions with variation evolution principle. After establishing the first and the second evolution equations within its frame, the third evolution equation is developed. This equation only solves the control variables along the variation time to get the optimal solution, and its definite conditions may be arbitrary since the equation can eliminate possible infeasibilities. With this equation, the dimension of the resulting Initial-value Problem (IVP), transformed via the semi-discrete method, is greatly reduced. Therefore it might relieve the computation burden in seeking solutions. Illustrative examples are solved and it is shown that the proposed equation may produce more precise numerical solutions than the second evolution equation, and its computation time may be shorter for the dense discretization.
Submitted 11 February, 2018;
originally announced February 2018.
-
Computation of Optimal Control Problems with Terminal Constraint via Modified Evolution Partial Differential Equation
Authors:
Sheng Zhang,
Kai-Feng He,
Fei Liao
Abstract:
The Variation Evolving Method (VEM), which seeks the optimal solutions with the variation evolution principle, is further developed to be more flexible in solving the Optimal Control Problems (OCPs) with terminal constraint. With the first-order stable dynamics to eliminate the infeasibilities, the Modified Evolution Partial Differential Equation (MEPDE) that is valid in the infeasible solution domain is proposed, and a Lyapunov functional is constructed to theoretically ensure its validity. In particular, it is proved that even with the infinite-time convergence dynamics, the violated terminal inequality constraints, which are inactive for the optimal solution, will enter the feasible domain in finite time. Through transforming the MEPDE to the finite-dimensional Initial-value Problem (IVP) with the semi-discrete method, the OCPs may be solved with common Ordinary Differential Equation (ODE) numerical integration methods. Illustrative examples are presented to show the effectiveness of the proposed method.
Submitted 29 January, 2018;
originally announced January 2018.
-
Aircraft trajectory control with feedback linearization for general nonlinear system
Authors:
Sheng Zhang,
Fei Liao,
Yanqing Chen,
Kaifeng He
Abstract:
The feedback linearization method is further developed for controller design on general nonlinear systems. Through the Lyapunov stability theory, the intractable nonlinear implicit algebraic control equations are effectively solved, and asymptotic tracking performance is guaranteed. Moreover, it is proved that the controller may be used in an inverse-free version for set-point control. With this method, a nonlinear aircraft outer-loop trajectory controller is developed. To address the controller's robustness, the integral control technique is incorporated to counteract the adverse effect of modeling errors. Simulation results verify the good performance of the proposed controller.
Submitted 28 December, 2017;
originally announced December 2017.
-
Variation Evolving for Optimal Control Computation, a Compact Way
Authors:
Sheng Zhang,
Jiang-Tao Huang,
Kai-Feng He,
Fei Liao
Abstract:
A compact version of the variation evolving method (VEM) is developed in the primal variable space for optimal control computation. Following the idea that originates from the Lyapunov continuous-time dynamics stability theory in the control field, the optimal solution is analogized to the stable equilibrium point of a dynamic system and obtained asymptotically through the variation motion. With the introduction of a virtual dimension, namely the variation time, the evolution partial differential equation (EPDE), which seeks the optimal solution with a theoretical guarantee, is developed for the optimal control problem (OCP) with free terminal states, and the equivalent optimality conditions with no employment of costates are established in the primal space. These conditions show that the optimal feedback control law is generally not analytically available because the optimal control is related to the future states. Since the derived EPDE is suitable to be computed with the semi-discrete method in the field of PDE numerical calculation, the optimal solution may be obtained by solving the resulting finite-dimensional initial-value problem (IVP).
Submitted 21 November, 2020; v1 submitted 5 September, 2017;
originally announced September 2017.
-
A Variation Evolving Method for Optimal Control
Authors:
Sheng Zhang,
En-Mi Yong,
Wei-Qi Qian,
Kai-Feng He
Abstract:
A new method for obtaining optimal solutions is proposed. Originating from the continuous-time dynamics stability theory in the control field, the optimal solution is anticipated to be obtained in an asymptotically evolving way. By introducing a virtual dimension, the variation time, a dynamic system that describes the variation motion is deduced from the Optimal Control Problem (OCP), and the optimal solution is its equilibrium point. Through this method, the intractable OCP is transformed into an Initial-value Problem (IVP), which may be solved with mature Ordinary Differential Equation (ODE) numerical integration methods. In particular, the deduced dynamic system is globally stable, so any initial value will evolve to the extremal solution ultimately.
Submitted 8 April, 2017; v1 submitted 29 March, 2017;
originally announced March 2017.
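The idea in the abstract above, that an optimum can be reached as the stable equilibrium of a dynamic system integrated from an arbitrary initial value, has a simple finite-dimensional analogue: the gradient flow dx/dtau = -grad f(x). The sketch below integrates that flow with forward Euler for a toy quadratic cost. It is only an analogy under stated assumptions, not the evolution equation derived in the paper; the cost function, step size, and function names are all hypothetical.

```python
def gradient(x):
    """Gradient of the toy cost f(x) = (x0 - 1)^2 + 2 * (x1 + 0.5)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


def evolve(x0, step=0.05, n_steps=2000):
    """Forward-Euler integration of dx/dtau = -grad f(x) starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        g = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Two different initial values evolve toward the same equilibrium,
    # which is the minimizer (1, -0.5) of the toy cost.
    print(evolve([5.0, -3.0]))
    print(evolve([-10.0, 7.0]))
```

Here f itself acts as a Lyapunov function, since it decreases along the flow, which mirrors the stability argument the abstract alludes to. In the optimal control setting the unknown is a function of time, so the corresponding evolution equation is a PDE that must first be discretized.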