
MNeRV: A Multilayer Neural Representation for Videos

Authors: Qingling Chang, Haohui Yu, Shuxuan Fu, Zhiqiang Zeng, Chuangquan Chen

Abstract: As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HNeRV, etc.). However, this small number of decoding layers can easily lead to the problem of redundant model parameters due to the large proportion of parameters in a single decoding layer, which greatly restricts the video regression ability of neural network models. In this paper, we propose a multilayer neural representation for videos (MNeRV) and design a new decoder M-Decoder and its matching encoder M-Encoder. MNeRV has more encoding and decoding layers, which effectively alleviates the problem of redundant model parameters caused by too few layers. In addition, we design MNeRV blocks to perform more uniform and effective parameter allocation between decoding layers. In the field of video regression reconstruction, we achieve better reconstruction quality (+4.06 PSNR) with fewer parameters. Finally, we showcase MNeRV performance in downstream tasks such as video restoration and video interpolation. The source code of MNeRV is available at https://github.com/Aaronbtb/MNeRV.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 14 pages, 12 figures, 8 tables

arXiv:2403.20035 [pdf, other]

UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation

Authors: Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

Abstract: Traditionally, to improve the segmentation performance of models, most approaches prefer to add more complex modules. This is not suitable for the medical field, especially for mobile medical devices, where computationally heavy models are not suitable for real clinical environments due to computational resource constraints. Recently, state-space models (SSMs), represented by Mamba, have become a strong competitor to traditional CNNs and Transformers. In this paper, we deeply explore the key elements of parameter influence in Mamba and propose an UltraLight Vision Mamba UNet (UltraLight VM-UNet) based on this. Specifically, we propose a method for processing features in parallel Vision Mamba, named PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processing channels constant. We conducted comparisons and ablation experiments with several state-of-the-art lightweight models on three public skin lesion datasets and demonstrated that the UltraLight VM-UNet remains strongly competitive with only 0.049M parameters and 0.060 GFLOPs. In addition, this study deeply explores the key elements of parameter influence in Mamba, which will lay a theoretical foundation for Mamba to possibly become a new mainstream module for lightweighting in the future. The code is available from https://github.com/wurenkai/UltraLight-VM-UNet.

Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2303.14357 [pdf, other]

Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation

Authors: Xiaoxiao He, Chaowei Tan, Bo Liu, Liping Si, Weiwu Yao, Liang Zhao, Di Liu, Qilong Zhangli, Qi Chang, Kang Li, Dimitris N. Metaxas

Abstract: Federated Learning has gained popularity among medical institutions since it enables collaborative training between clients (e.g., hospitals) without aggregating data. However, due to the high cost associated with creating annotations, especially for large 3D image datasets, clinical institutions do not have enough supervised data for training locally. Thus, the performance of the collaborative model is subpar under limited supervision. On the other hand, large institutions have the resources to compile data repositories with high-resolution images and labels. Therefore, individual clients can utilize the knowledge acquired in the public data repositories to mitigate the shortage of private annotated images. In this paper, we propose a federated few-shot learning method with dual knowledge distillation. This method allows joint training with limited annotations across clients without jeopardizing privacy. The supervised learning of the proposed method extracts features from limited labeled data in each client, while the unsupervised data is used to distill both feature and response-based knowledge from a national data repository to further improve the accuracy of the collaborative model and reduce the communication cost. Extensive evaluations are conducted on 3D magnetic resonance knee images from a private clinical dataset. Our proposed method shows superior performance and less training time than other semi-supervised federated learning methods. Codes and additional visualization results are available at https://github.com/hexiaoxiao-cs/fedml-knee.

Submitted 17 April, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

arXiv:2303.12198 [pdf, other]

doi 10.1109/EMBC44109.2020.9176007

Autofluorescence Bronchoscopy Video Analysis for Lesion Frame Detection

Authors: Qi Chang, Rebecca Bascom, Jennifer Toth, Danish Ahmad, William E. Higgins

Abstract: Because of the significance of bronchial lesions as indicators of early lung cancer and squamous cell carcinoma, a critical need exists for early detection of bronchial lesions. Autofluorescence bronchoscopy (AFB) is a primary modality used for bronchial lesion detection, as it shows high sensitivity to suspicious lesions. The physician, however, must interactively browse a long video stream to locate lesions, making the search exceedingly tedious and error prone. Unfortunately, limited research has explored the use of automated AFB video analysis for efficient lesion detection. We propose a robust automatic AFB analysis approach that distinguishes informative and uninformative AFB video frames in a video. In addition, for the informative frames, we determine the frames containing potential lesions and delineate candidate lesion regions. Our approach draws upon a combination of computer-based image analysis, machine learning, and deep learning. Thus, the analysis of an AFB video stream becomes more tractable. Tests with patient AFB video indicate that ≥97% of frames were correctly labeled as informative or uninformative. In addition, ≥97% of lesion frames were correctly identified, with false positive and false negative rates ≤3%.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.11258 [pdf, other]

doi 10.1117/12.2579931

Bronchoscopic video synchronization for interactive multimodal inspection of bronchial lesions

Authors: Qi Chang, Patrick D. Byrnes, Danish Ahmad, Jennifer Toth, Rebecca Bascom, William E. Higgins

Abstract: With lung cancer being the most fatal cancer worldwide, it is important to detect the disease early. A potentially effective way of detecting early cancer lesions developing along the airway walls (epithelium) is bronchoscopy. To this end, developments in bronchoscopy offer three promising noninvasive modalities for imaging bronchial lesions: white-light bronchoscopy (WLB), autofluorescence bronchoscopy (AFB), and narrow-band imaging (NBI). While these modalities give complementary views of the airway epithelium, the physician must manually inspect each video stream produced by a given modality to locate the suspect cancer lesions. Unfortunately, no effort has been made to rectify this situation by providing efficient quantitative and visual tools for analyzing these video streams. This makes the lesion search process extremely time-consuming and error-prone, thereby making it impractical to utilize these rich data sources effectively. We propose a framework for synchronizing multiple bronchoscopic videos to enable an interactive multimodal analysis of bronchial lesions. Our methods first register the video streams to a reference 3D chest computed-tomography (CT) scan to produce multimodal linkages to the airway tree. Our methods then temporally correlate the videos to one another to enable synchronous visualization of the resulting multimodal data set. Pictorial and quantitative results illustrate the potential of the methods.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2207.07759 [pdf, ps, other]

doi 10.1117/12.2647897

ESFPNet: efficient deep learning architecture for real-time lesion segmentation in autofluorescence bronchoscopic video

Authors: Qi Chang, Danish Ahmad, Jennifer Toth, Rebecca Bascom, William E. Higgins

Abstract: Lung cancer tends to be detected at an advanced stage, resulting in a high patient mortality rate. Thus, much recent research has focused on early disease detection. Bronchoscopy is the procedure of choice for an effective noninvasive way of detecting early manifestations (bronchial lesions) of lung cancer. In particular, autofluorescence bronchoscopy (AFB) discriminates the autofluorescence properties of normal (green) and diseased tissue (reddish brown) with different colors. Because recent studies show AFB's high sensitivity in searching lesions, it has become a potentially pivotal method in bronchoscopic airway exams. Unfortunately, manual inspection of AFB video is extremely tedious and error prone, while limited effort has been expended toward potentially more robust automatic AFB lesion analysis. We propose a real-time (processing throughput of 27 frames/sec) deep-learning architecture dubbed ESFPNet for accurate segmentation and robust detection of bronchial lesions in AFB video streams. The architecture features an encoder structure that exploits pretrained Mix Transformer (MiT) encoders and an efficient stage-wise feature pyramid (ESFP) decoder structure. Segmentation results from the AFB airway-exam videos of 20 lung cancer patients indicate that our approach gives a mean Dice index = 0.756 and an average Intersection over Union = 0.624, results that are superior to those generated by other recent architectures. Thus, ESFPNet gives the physician a potential tool for confident real-time lesion segmentation and detection during a live bronchoscopic airway exam. Moreover, our model shows promising potential applicability to other domains, as evidenced by its state-of-the-art (SOTA) performance on the CVC-ClinicDB and ETIS-LaribPolypDB datasets, and superior performance on the Kvasir and CVC-ColonDB datasets.

Submitted 8 December, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: SPIE 2023 drafts update

arXiv:2207.00492 [pdf]

Reinforcement Learning Based User-Guided Motion Planning for Human-Robot Collaboration

Authors: Tian Yu, Qing Chang

Abstract: Robots are good at performing repetitive tasks in modern manufacturing industries. However, robot motions are mostly planned and preprogrammed with a notable lack of adaptivity to task changes. Even for slightly changed tasks, the whole system must be reprogrammed by robotics experts. Therefore, it is highly desirable to have a flexible motion planning method, with which robots can adapt to specific task changes in unstructured environments, such as production systems or warehouses, with little or no intervention from non-expert personnel. In this paper, we propose a user-guided motion planning algorithm in combination with the reinforcement learning (RL) method to enable robots to automatically generate their motion plans for new tasks by learning from a few kinesthetic human demonstrations. To achieve adaptive motion plans for a specific application environment, e.g., desk assembly or warehouse loading/unloading, a library is built by abstracting features of common human demonstrated tasks. The definition of semantic similarity between features in the library and features of a new task is proposed and further used to construct the reward function in RL. The RL policy can automatically generate motion plans for a new task if it determines that new task constraints can be satisfied with the current library and request additional human demonstrations. Multiple experiments conducted on common tasks and scenarios demonstrate that the proposed user-guided RL-assisted motion planning method is effective.

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2206.11204 [pdf]

Paint shop vehicle sequencing based on quantum computing considering color changeover and painting quality

Authors: Jing Huang, Hua-Tzu Fan, Guoxian Xiao, Qing Chang

Abstract: As customer demands become increasingly diverse, the colors and styles of vehicles offered by automotive companies have also grown substantially. This poses great challenges to the design and management of automotive manufacturing systems, among which is the proper sequencing of vehicles in everyday operation of the paint shop. With typically hundreds of vehicles in one shift, the paint shop sequencing problem is intractable in classical computing. In this paper, we propose to solve a general paint shop sequencing problem using state-of-the-art quantum computing algorithms. Most existing works are solely focused on reducing color changeover costs, i.e., costs incurred by different colors between consecutive vehicles. This work reveals that different sequencing of vehicles also significantly affects the quality performance of the painting process. We use a machine learning model pretrained on historical data to predict the probability of painting defects. The problem is formulated as a combinatorial optimization problem with two cost components, i.e., color changeover cost and repair cost. The problem is further converted to a quantum optimization problem and solved with the Quantum Approximate Optimization Algorithm (QAOA). Current quantum computers are still limited in accuracy and scalability. However, with a simplified case study, we demonstrate how the classic sequencing problem in the paint shop can be formulated and solved using quantum computing, and we demonstrate the potential of quantum computing in solving real problems in manufacturing systems.

Submitted 22 June, 2022; originally announced June 2022.

arXiv:2206.07649 [pdf, ps, other]

Atrial Fibrillation Detection Using Weight-Pruned, Log-Quantised Convolutional Neural Networks

Authors: Xiu Qi Chang, Ann Feng Chew, Benjamin Chen Ming Choong, Shuhui Wang, Rui Han, Wang He, Li Xiaolin, Rajesh C. Panicker, Deepu John

Abstract: Deep neural networks (DNN) are a promising tool in medical applications. However, the implementation of complex DNNs on battery-powered devices is challenging due to high energy costs for communication. In this work, a convolutional neural network model is developed for detecting atrial fibrillation from electrocardiogram (ECG) signals. The model demonstrates high performance despite being trained on limited, variable-length input data. Weight pruning and logarithmic quantisation are combined to introduce sparsity and reduce model size, which can be exploited for reduced data movement and lower computational complexity. The final model achieved a 91.1% model compression ratio while maintaining high model accuracy of 91.7% and less than 1% loss.

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2206.07163 [pdf, other]

DeepRecon: Joint 2D Cardiac Segmentation and 3D Volume Reconstruction via A Structure-Specific Generative Method

Authors: Qi Chang, Zhennan Yan, Mu Zhou, Di Liu, Khalid Sawalha, Meng Ye, Qilong Zhangli, Mikael Kanski, Subhi Al Aref, Leon Axel, Dimitris Metaxas

Abstract: Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, synthetic high-resolution 3D image, and 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation by the different latent-space manipulation strategies. The simultaneously generated high-resolution images present a high interpretable value to assess the cardiac shape and motion. Experimental results demonstrate the effectiveness of our approach on multiple fronts, including 2D segmentation, 3D reconstruction, and downstream 4D motion pattern adaptation performance.

Submitted 14 June, 2022; originally announced June 2022.

Comments: MICCAI2022

arXiv:2203.10726 [pdf, other]

TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers

Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Ligong Han, Xiaoxiao He, Zhaoyang Xia, Song Wen, Qi Chang, Zhennan Yan, Mu Zhou, Dimitris Metaxas

Abstract: Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose the Multi-Scale Attention (MSA) to collect global correspondence of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading performance against the state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation.

Submitted 5 September, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2201.08955 [pdf, other]

doi 10.1109/EMBC48229.2022.9871529

Modality Bank: Learn multi-modality images across data centers without sharing medical data

Authors: Qi Chang, Hui Qu, Zhennan Yan, Yunhe Gao, Lohendran Baskaran, Dimitris Metaxas

Abstract: Multi-modality images have been widely used and provide comprehensive information for medical image analysis. However, acquiring all modalities among all institutes is costly and often impossible in clinical settings. To leverage more comprehensive multi-modality information, we propose a privacy-secured decentralized multi-modality adaptive learning architecture named ModalityBank. Our method could learn a set of effective domain-specific modulation parameters plugged into a common domain-agnostic network. We demonstrate that, by switching between different sets of configurations, the generator could output high-quality images for a specific modality. Our method could also complete the missing modalities across all data centers, and thus could be used for modality completion purposes. The downstream task trained from the synthesized multi-modality samples could achieve higher performance than learning from one real data center and achieve close-to-real performance compared with all real images.

Submitted 21 January, 2022; originally announced January 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2012.08604

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2022, pp. 4758-4763

arXiv:2007.09221 [pdf, other]

Learn distributed GAN with Temporary Discriminators

Authors: Hui Qu, Yikai Zhang, Qi Chang, Zhennan Yan, Chao Chen, Dimitris Metaxas

Abstract: In this work, we propose a method for training distributed GAN with sequential temporary discriminators. Our proposed method tackles the challenge of training GAN in the federated learning manner: How to update the generator with a flow of temporary discriminators? We apply our proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We show our design of loss function indeed learns the correct distribution with provable guarantees. The empirical experiments show that our approach is capable of generating synthetic data which is practical for real-world applications such as training a segmentation model.

Submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted by ECCV2020. Code: https://github.com/huiqu18/TDGAN-PyTorch

arXiv:2007.04140 [pdf, other]

Mastering the working sequence in human-robot collaborative assembly based on reinforcement learning

Authors: Tian Yu, Jing Huang, Qing Chang

Abstract: A long-standing goal of the Human-Robot Collaboration (HRC) in manufacturing systems is to increase the collaborative working efficiency. In line with the trend of Industry 4.0 to build up the smart manufacturing system, the Co-robot in the HRC system deserves better designing to be more self-organized and to find the superhuman proficiency by self-learning. Inspired by the impressive machine learning algorithms developed by Google DeepMind, such as AlphaGo Zero, in this paper the human-robot collaborative assembly working process is formatted into a chessboard, and the selection of moves on the chessboard is used as an analogy for the decision making by both human and robot in the HRC assembly working process. To obtain the optimal policy of working sequence to maximize the working efficiency, the robot is trained with a self-play algorithm based on reinforcement learning, without guidance or domain knowledge beyond game rules. A neural network is also trained to predict the distribution of the priority of move selections and whether a working sequence is the one resulting in the maximum of the HRC efficiency. An adjustable desk assembly is used to demonstrate the proposed HRC assembly algorithm and its efficiency.

Submitted 8 July, 2020; originally announced July 2020.

Comments: 11 pages, 6 figures

arXiv:2006.00080 [pdf, other]

Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data

Authors: Qi Chang, Hui Qu, Yikai Zhang, Mert Sabuncu, Chao Chen, Tong Zhang, Dimitris Metaxas

Abstract: In this paper, we propose a data privacy-preserving and communication efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). Our proposed framework aims to train a central generator that learns from distributed discriminators, and to use the generated synthetic images solely to train the segmentation model. We validate the proposed framework on the application of health entities learning problem which is known to be privacy sensitive. Our experiments show that our approach: 1) could learn the real image's distribution from multiple datasets without sharing the patient's raw data. 2) is more efficient and requires lower bandwidth than other distributed deep learning methods. 3) achieves higher performance compared to the model trained by one real dataset, and almost the same performance compared to the model trained by all real datasets. 4) has provable guarantees that the generator could learn the distributed distribution in an all important fashion thus is unbiased.

Submitted 14 June, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 13856-13866

arXiv:1904.10597 [pdf]

Autonomous Voltage Control for Grid Operation Using Deep Reinforcement Learning

Authors: Ruisheng Diao, Zhiwei Wang, Di Shi, Qianyun Chang, Jiajun Duan, Xiaohu Zhang

Abstract: Modern power grids are experiencing grand challenges caused by the stochastic and dynamic nature of growing renewable energy and demand response. Traditional theoretical assumptions and operational rules may be violated, which are difficult to be adapted by existing control systems due to the lack of computational power and accurate grid models for use in real time, leading to growing concerns in the secure and economic operation of the power grid. Existing operational control actions are typically determined offline, which are less optimized. This paper presents a novel paradigm, Grid Mind, for autonomous grid operational controls using deep reinforcement learning. The proposed AI agent for voltage control can learn its control policy through interactions with massive offline simulations, and adapts its behavior to new changes including not only load/generation variations but also topological changes. A properly trained agent is tested on the IEEE 14-bus system with tens of thousands of scenarios, and promising performance is demonstrated in applying autonomous voltage controls for secure grid operation.

Submitted 23 April, 2019; originally announced April 2019.

Comments: To be published (Accepted) in: Proceedings of the Power and Energy Society General Meeting (PESGM), Atlanta, GA, 2019

arXiv:1802.06935 [pdf, other]

Non-Local Graph-Based Prediction For Reversible Data Hiding In Images

Authors: Qi Chang, Gene Cheung, Yao Zhao, Xiaolong Li, Rongrong Ni

Abstract: Reversible data hiding (RDH) is desirable in applications where both the hidden message and the cover medium need to be recovered without loss. Among many RDH approaches is prediction-error expansion (PEE), containing two steps: i) prediction of a target pixel value, and ii) embedding according to the value of prediction-error. In general, higher prediction performance leads to larger embedding capacity and/or lower signal distortion. Leveraging on recent advances in graph signal processing (GSP), we pose pixel prediction as a graph-signal restoration problem, where the appropriate edge weights of the underlying graph are computed using a similar patch searched in a semi-local neighborhood. Specifically, for each candidate patch, we first examine eigenvalues of its structure tensor to estimate its local smoothness. If sufficiently smooth, we pose a maximum a posteriori (MAP) problem using either a quadratic Laplacian regularizer or a graph total variation (GTV) term as signal prior. While the MAP problem using the first prior has a closed-form solution, we design an efficient algorithm for the second prior using alternating direction method of multipliers (ADMM) with nested proximal gradient descent. Experimental results show that with better quality GSP-based prediction, at low capacity the visual quality of the embedded image exceeds state-of-the-art methods noticeably.

Submitted 19 February, 2018; originally announced February 2018.

Showing 1–17 of 17 results for author: Chang, Q