Search | arXiv e-print repository

Heterogeneity-Aware Memory Efficient Federated Learning via Progressive Layer Freezing

Authors: Wu Yebo, Li Li, Tian Chunlin, Chang Tao, Lin Chi, Wang Cong, Xu Cheng-Zhong

Abstract: In this paper, we propose SmartFreeze, a framework that effectively reduces the memory footprint by conducting the training in a progressive manner. Instead of updating the full model in each training round, SmartFreeze divides the shared model into blocks consisting of a specified number of layers. It first trains the front block with a well-designed output module, safely freezes it after converg… ▽ More In this paper, we propose SmartFreeze, a framework that effectively reduces the memory footprint by conducting the training in a progressive manner. Instead of updating the full model in each training round, SmartFreeze divides the shared model into blocks consisting of a specified number of layers. It first trains the front block with a well-designed output module, safely freezes it after convergence, and then triggers the training of the next one. This process iterates until the whole model has been successfully trained. In this way, the backward computation of the frozen blocks and the corresponding memory space for storing the intermediate outputs and gradients are effectively saved. Except for the progressive training framework, SmartFreeze consists of the following two core components: a pace controller and a participant selector. The pace controller is designed to effectively monitor the training progress of each block at runtime and safely freezes them after convergence while the participant selector selects the right devices to participate in the training for each block by jointly considering the memory capacity, the statistical and system heterogeneity. Extensive experiments are conducted to evaluate the effectiveness of SmartFreeze on both simulation and hardware testbeds. The results demonstrate that SmartFreeze effectively reduces average memory usage by up to 82%. Moreover, it simultaneously improves the model accuracy by up to 83.1% and accelerates the training process up to 2.02X. △ Less

Submitted 17 August, 2024; originally announced August 2024.

Comments: Published as a conference paper at IWQoS 2024

arXiv:2407.09047 [pdf, other]

Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation

Authors: Wei Cong, Yang Cong, Yuyang Liu, Gan Sun

Abstract: Incremental semantic segmentation endeavors to segment newly encountered classes while maintaining knowledge of old classes. However, existing methods either 1) lack guidance from class-specific knowledge (i.e., old class prototypes), leading to a bias towards new classes, or 2) constrain class-shared knowledge (i.e., old model weights) excessively without discrimination, resulting in a preference… ▽ More Incremental semantic segmentation endeavors to segment newly encountered classes while maintaining knowledge of old classes. However, existing methods either 1) lack guidance from class-specific knowledge (i.e., old class prototypes), leading to a bias towards new classes, or 2) constrain class-shared knowledge (i.e., old model weights) excessively without discrimination, resulting in a preference for old classes. In this paper, to trade off model performance, we propose the Class-specific and Class-shared Knowledge (Cs2K) guidance for incremental semantic segmentation. Specifically, from the class-specific knowledge aspect, we design a prototype-guided pseudo labeling that exploits feature proximity from prototypes to correct pseudo labels, thereby overcoming catastrophic forgetting. Meanwhile, we develop a prototype-guided class adaptation that aligns class distribution across datasets via learning old augmented prototypes. Moreover, from the class-shared knowledge aspect, we propose a weight-guided selective consolidation to strengthen old memory while maintaining new memory by integrating old and new model weights based on weight importance relative to old classes. Experiments on public datasets demonstrate that our proposed Cs2K significantly improves segmentation performance and is plug-and-play. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2403.20309 [pdf, other]

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds

Authors: Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang

Abstract: While novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision, it relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM). For instance, the recently developed Gaussian Splatting depends heavily on the accuracy of SfM-derived points and poses. However, SfM processes are time-consuming and often prove unreliable in… ▽ More While novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision, it relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM). For instance, the recently developed Gaussian Splatting depends heavily on the accuracy of SfM-derived points and poses. However, SfM processes are time-consuming and often prove unreliable in sparse-view scenarios, where matched features are scarce, leading to accumulated errors and limited generalization capability across datasets. In this study, we introduce a novel and efficient framework to enhance robust NVS from sparse-view images. Our framework, InstantSplat, integrates multi-view stereo(MVS) predictions with point-based representations to construct 3D Gaussians of large-scale scenes from sparse-view data within seconds, addressing the aforementioned performance and efficiency issues by SfM. Specifically, InstantSplat generates densely populated surface points across all training views and determines the initial camera parameters using pixel-alignment. Nonetheless, the MVS points are not globally accurate, and the pixel-wise prediction from all views results in an excessive Gaussian number, yielding a overparameterized scene representation that compromises both training speed and accuracy. To address this issue, we employ a grid-based, confidence-aware Farthest Point Sampling to strategically position point primitives at representative locations in parallel. Next, we enhance pose accuracy and tune scene parameters through a gradient-based joint optimization framework from self-supervision. By employing this simplified framework, InstantSplat achieves a substantial reduction in training time, from hours to mere seconds, and demonstrates robust performance across various numbers of views in diverse datasets. △ Less

Submitted 20 August, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: Project Page: https://instantsplat.github.io/

arXiv:2403.11234 [pdf, other]

Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias

Authors: Wenyu Zhang, Qingmu Liu, Felix Ong Wei Cong, Mohamed Ragab, Chuan-Sheng Foo

Abstract: Domain adaptation is a critical task in machine learning that aims to improve model performance on a target domain by leveraging knowledge from a related source domain. In this work, we introduce Universal Semi-Supervised Domain Adaptation (UniSSDA), a practical yet challenging setting where the target domain is partially labeled, and the source and target label space may not strictly match. UniSS… ▽ More Domain adaptation is a critical task in machine learning that aims to improve model performance on a target domain by leveraging knowledge from a related source domain. In this work, we introduce Universal Semi-Supervised Domain Adaptation (UniSSDA), a practical yet challenging setting where the target domain is partially labeled, and the source and target label space may not strictly match. UniSSDA is at the intersection of Universal Domain Adaptation (UniDA) and Semi-Supervised Domain Adaptation (SSDA): the UniDA setting does not allow for fine-grained categorization of target private classes not represented in the source domain, while SSDA focuses on the restricted closed-set setting where source and target label spaces match exactly. Existing UniDA and SSDA methods are susceptible to common-class bias in UniSSDA settings, where models overfit to data distributions of classes common to both domains at the expense of private classes. We propose a new prior-guided pseudo-label refinement strategy to reduce the reinforcement of common-class bias due to pseudo-labeling, a common label propagation strategy in domain adaptation. We demonstrate the effectiveness of the proposed strategy on benchmark datasets Office-Home, DomainNet, and VisDA. The proposed strategy attains the best performance across UniSSDA adaptation settings and establishes a new baseline for UniSSDA. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2402.16387 [pdf, other]

On the Generalization Capability of Temporal Graph Learning Algorithms: Theoretical Insights and a Simpler Method

Authors: Weilin Cong, Jian Kang, Hanghang Tong, Mehrdad Mahdavi

Abstract: Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications, especially in domains where data can be represented as a graph and evolves over time. Although TGL has recently seen notable progress in algorithmic solutions, its theoretical foundations remain largely unexplored. This paper aims at bridging this gap by investigating the generalization ability o… ▽ More Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications, especially in domains where data can be represented as a graph and evolves over time. Although TGL has recently seen notable progress in algorithmic solutions, its theoretical foundations remain largely unexplored. This paper aims at bridging this gap by investigating the generalization ability of different TGL algorithms (e.g., GNN-based, RNN-based, and memory-based methods) under the finite-wide over-parameterized regime. We establish the connection between the generalization error of TGL algorithms and "the number of layers/steps" in the GNN-/RNN-based TGL methods and "the feature-label alignment (FLA) score", where FLA can be used as a proxy for the expressive power and explains the performance of memory-based methods. Guided by our theoretical analysis, we propose Simplified-Temporal-Graph-Network, which enjoys a small generalization error, improved overall performance, and lower model complexity. Extensive experiments on real-world datasets demonstrate the effectiveness of our method. Our theoretical findings and proposed algorithm offer essential insights into TGL from a theoretical standpoint, laying the groundwork for the designing practical TGL algorithms in future studies. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.04572 [pdf]

Harnessing LSTM for Nonlinear Ship Deck Motion Prediction in UAV Autonomous Landing amidst High Sea States

Authors: Feifan Yu, Wenyuan Cong, Xinmin Chen, Yue Lin, Jiqiang Wang

Abstract: Autonomous landing of UAVs in high sea states requires the UAV to land exclusively during the ship deck's "rest period," coinciding with minimal movement. Given this scenario, determining the ship's "rest period" based on its movement patterns becomes a fundamental prerequisite for addressing this challenge. This study employs the Long Short-Term Memory (LSTM) neural network to predict the ship's… ▽ More Autonomous landing of UAVs in high sea states requires the UAV to land exclusively during the ship deck's "rest period," coinciding with minimal movement. Given this scenario, determining the ship's "rest period" based on its movement patterns becomes a fundamental prerequisite for addressing this challenge. This study employs the Long Short-Term Memory (LSTM) neural network to predict the ship's motion across three dimensions: longi-tudinal, transverse, and vertical waves. In the absence of actual ship data under high sea states, this paper employs a composite sine wave model to simulate ship deck motion. Through this approach, a highly accurate model is established, exhibiting promising outcomes within various stochastic sine wave combination models. △ Less

Submitted 15 November, 2023; originally announced December 2023.

Comments: 11 pages, 7 figures, accept by ICANDVC2023

arXiv:2312.02660 [pdf, other]

Uniswap Daily Transaction Indices by Network

Authors: Nir Chemaya, Lin William Cong, Emma Jorgensen, Dingyue Liu, Luyao Zhang

Abstract: DeFi is transforming financial services by removing intermediaries and producing a wealth of open-source data. This transformation is propelled by Layer 2 (L2) solutions, aimed at boosting network efficiency and scalability beyond current Layer 1 (L1) capabilities. This study addresses the lack of detailed L2 impact analysis by examining over 50 million transactions from Uniswap. Our dataset, feat… ▽ More DeFi is transforming financial services by removing intermediaries and producing a wealth of open-source data. This transformation is propelled by Layer 2 (L2) solutions, aimed at boosting network efficiency and scalability beyond current Layer 1 (L1) capabilities. This study addresses the lack of detailed L2 impact analysis by examining over 50 million transactions from Uniswap. Our dataset, featuring transactions from L1 and L2 across networks like Ethereum and Polygon, provides daily indices revealing adoption, scalability, and decentralization within the DeFi space. These indices help to elucidate the complex relationship between DeFi and L2 technologies, advancing our understanding of the ecosystem. The dataset is enhanced by an open-source Python framework for computing decentralization indices, adaptable for various research needs. This positions the dataset as a vital resource for machine learning endeavors, particularly deep learning, contributing significantly to the development of Blockchain as Web3's infrastructure. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2310.14541 [pdf, other]

Continual Named Entity Recognition without Catastrophic Forgetting

Authors: Duzhen Zhang, Wei Cong, Jiahua Dong, Yahan Yu, Xiuyi Chen, Yonggang Zhang, Zhen Fang

Abstract: Continual Named Entity Recognition (CNER) is a burgeoning area, which involves updating an existing model by incorporating new entity types sequentially. Nevertheless, continual learning approaches are often severely afflicted by catastrophic forgetting. This issue is intensified in CNER due to the consolidation of old entity types from previous steps into the non-entity type at each step, leading… ▽ More Continual Named Entity Recognition (CNER) is a burgeoning area, which involves updating an existing model by incorporating new entity types sequentially. Nevertheless, continual learning approaches are often severely afflicted by catastrophic forgetting. This issue is intensified in CNER due to the consolidation of old entity types from previous steps into the non-entity type at each step, leading to what is known as the semantic shift problem of the non-entity type. In this paper, we introduce a pooled feature distillation loss that skillfully navigates the trade-off between retaining knowledge of old entity types and acquiring new ones, thereby more effectively mitigating the problem of catastrophic forgetting. Additionally, we develop a confidence-based pseudo-labeling for the non-entity type, \emph{i.e.,} predicting entity types using the old model to handle the semantic shift of the non-entity type. Following the pseudo-labeling process, we suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution. We carried out comprehensive experiments on ten CNER settings using three different datasets. The results illustrate that our method significantly outperforms prior state-of-the-art approaches, registering an average improvement of $6.3$\% and $8.0$\% in Micro and Macro F1 scores, respectively. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP2023 main conference as a long paper

arXiv:2310.06949 [pdf, other]

Diffusion Prior Regularized Iterative Reconstruction for Low-dose CT

Authors: Wenjun Xia, Yongyi Shi, Chuang Niu, Wenxiang Cong, Ge Wang

Abstract: Computed tomography (CT) involves a patient's exposure to ionizing radiation. To reduce the radiation dose, we can either lower the X-ray photon count or down-sample projection views. However, either of the ways often compromises image quality. To address this challenge, here we introduce an iterative reconstruction algorithm regularized by a diffusion prior. Drawing on the exceptional imaging pro… ▽ More Computed tomography (CT) involves a patient's exposure to ionizing radiation. To reduce the radiation dose, we can either lower the X-ray photon count or down-sample projection views. However, either of the ways often compromises image quality. To address this challenge, here we introduce an iterative reconstruction algorithm regularized by a diffusion prior. Drawing on the exceptional imaging prowess of the denoising diffusion probabilistic model (DDPM), we merge it with a reconstruction procedure that prioritizes data fidelity. This fusion capitalizes on the merits of both techniques, delivering exceptional reconstruction results in an unsupervised framework. To further enhance the efficiency of the reconstruction process, we incorporate the Nesterov momentum acceleration technique. This enhancement facilitates superior diffusion sampling in fewer steps. As demonstrated in our experiments, our method offers a potential pathway to high-definition CT image reconstruction with minimized radiation. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2308.11793 [pdf, other]

Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts

Authors: Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen, Mukund Varma, Yi Wang, Zhangyang Wang

Abstract: Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feed-forward… ▽ More Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feed-forward inference pipeline. While those feedforward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness respectively, which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings. Our codes are available at https://github.com/VITA-Group/GNT-MOVE. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.08793 [pdf, other]

Task Relation Distillation and Prototypical Pseudo Label for Incremental Named Entity Recognition

Authors: Duzhen Zhang, Hongliu Li, Wei Cong, Rongtao Xu, Jiahua Dong, Xiuyi Chen

Abstract: Incremental Named Entity Recognition (INER) involves the sequential learning of new entity types without accessing the training data of previously learned types. However, INER faces the challenge of catastrophic forgetting specific for incremental learning, further aggravated by background shift (i.e., old and future entity types are labeled as the non-entity type in the current task). To address… ▽ More Incremental Named Entity Recognition (INER) involves the sequential learning of new entity types without accessing the training data of previously learned types. However, INER faces the challenge of catastrophic forgetting specific for incremental learning, further aggravated by background shift (i.e., old and future entity types are labeled as the non-entity type in the current task). To address these challenges, we propose a method called task Relation Distillation and Prototypical pseudo label (RDP) for INER. Specifically, to tackle catastrophic forgetting, we introduce a task relation distillation scheme that serves two purposes: 1) ensuring inter-task semantic consistency across different incremental learning tasks by minimizing inter-task relation distillation loss, and 2) enhancing the model's prediction confidence by minimizing intra-task self-entropy loss. Simultaneously, to mitigate background shift, we develop a prototypical pseudo label strategy that distinguishes old entity types from the current non-entity type using the old model. This strategy generates high-quality pseudo labels by measuring the distances between token embeddings and type-wise prototypes. We conducted extensive experiments on ten INER settings of three benchmark datasets (i.e., CoNLL2003, I2B2, and OntoNotes5). The results demonstrate that our method achieves significant improvements over the previous state-of-the-art methods, with an average increase of 6.08% in Micro F1 score and 7.71% in Macro F1 score. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Accepted by CIKM2023 as a long paper with an oral presentation

arXiv:2308.00376 [pdf, other]

Deep Image Harmonization with Learnable Augmentation

Authors: Li Niu, Junyan Cao, Wenyan Cong, Liqing Zhang

Abstract: The goal of image harmonization is adjusting the foreground appearance in a composite image to make the whole image harmonious. To construct paired training images, existing datasets adopt different ways to adjust the illumination statistics of foregrounds of real images to produce synthetic composite images. However, different datasets have considerable domain gap and the performances on small-sc… ▽ More The goal of image harmonization is adjusting the foreground appearance in a composite image to make the whole image harmonious. To construct paired training images, existing datasets adopt different ways to adjust the illumination statistics of foregrounds of real images to produce synthetic composite images. However, different datasets have considerable domain gap and the performances on small-scale datasets are limited by insufficient training data. In this work, we explore learnable augmentation to enrich the illumination diversity of small-scale datasets for better harmonization performance. In particular, our designed SYthetic COmposite Network (SycoNet) takes in a real image with foreground mask and a random vector to learn suitable color transformation, which is applied to the foreground of this real image to produce a synthetic composite image. Comprehensive experiments demonstrate the effectiveness of our proposed learnable augmentation for image harmonization. The code of SycoNet is released at https://github.com/bcmi/SycoNet-Adaptive-Image-Harmonization. △ Less

Submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023

arXiv:2307.10845 [pdf, other]

Self-paced Weight Consolidation for Continual Learning

Authors: Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong

Abstract: Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the numb… ▽ More Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning via evaluating the discriminative contributions of previous tasks. To be specific, we develop a self-paced regularization to reflect the priorities of past tasks via measuring difficulty based on key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. Then the parameters of the new continual learner will be learned via selectively maintaining the knowledge amongst more difficult past tasks, which could well overcome catastrophic forgetting with less computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play, which is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.10822 [pdf, other]

Gradient-Semantic Compensation for Incremental Semantic Segmentation

Authors: Wei Cong, Yang Cong, Jiahua Dong, Gan Sun, Henghui Ding

Abstract: Incremental semantic segmentation aims to continually learn the segmentation of new coming classes without accessing the training data of previously learned classes. However, most current methods fail to address catastrophic forgetting and background shift since they 1) treat all previous classes equally without considering different forgetting paces caused by imbalanced gradient back-propagation;… ▽ More Incremental semantic segmentation aims to continually learn the segmentation of new coming classes without accessing the training data of previously learned classes. However, most current methods fail to address catastrophic forgetting and background shift since they 1) treat all previous classes equally without considering different forgetting paces caused by imbalanced gradient back-propagation; 2) lack strong semantic guidance between classes. To tackle the above challenges, in this paper, we propose a Gradient-Semantic Compensation (GSC) model, which surmounts incremental semantic segmentation from both gradient and semantic perspectives. Specifically, to address catastrophic forgetting from the gradient aspect, we develop a step-aware gradient compensation that can balance forgetting paces of previously seen classes via re-weighting gradient backpropagation. Meanwhile, we propose a soft-sharp semantic relation distillation to distill consistent inter-class semantic relations via soft labels for alleviating catastrophic forgetting from the semantic aspect. In addition, we develop a prototypical pseudo re-labeling that provides strong semantic guidance to mitigate background shift. It produces high-quality pseudo labels for old classes in the background by measuring distances between pixels and class-wise prototypes. Extensive experiments on three public datasets, i.e., Pascal VOC 2012, ADE20K, and Cityscapes, demonstrate the effectiveness of our proposed GSC model. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.10584 [pdf, other]

Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap

Authors: Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang

Abstract: Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's ``Water Lilies, Evening Effect''? We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based… ▽ More Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's ``Water Lilies, Evening Effect''? We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based inpainting, they are not designed for large domain discrepancies between the target and the reference, such as inpainting an artistic image using a photorealistic reference. This paper proposes a novel diffusion framework, dubbed RefPaint, to ``inpaint more wildly'' by taking such references with large domain gaps. Built with an image-conditioned diffusion model, we introduce a ladder-side branch and a masked fusion mechanism to work with the inpainting mask. By decomposing the CLIP image embeddings at inference time, one can manipulate the strength of semantic and style information with ease. Experiments demonstrate that our proposed RefPaint framework produces significantly better results than existing methods. Our method enables creative painterly image inpainting with reference objects that would otherwise be difficult to achieve. Project page: https://vita-group.github.io/RefPaint/ △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2306.04107 [pdf, other]

BeMap: Balanced Message Passing for Fair Graph Neural Network

Authors: Xiao Lin, Jian Kang, Weilin Cong, Hanghang Tong

Abstract: Fairness in graph neural networks has been actively studied recently. However, existing works often do not explicitly consider the role of message passing in introducing or amplifying the bias. In this paper, we first investigate the problem of bias amplification in message passing. We empirically and theoretically demonstrate that message passing could amplify the bias when the 1-hop neighbors fr… ▽ More Fairness in graph neural networks has been actively studied recently. However, existing works often do not explicitly consider the role of message passing in introducing or amplifying the bias. In this paper, we first investigate the problem of bias amplification in message passing. We empirically and theoretically demonstrate that message passing could amplify the bias when the 1-hop neighbors from different demographic groups are unbalanced. Guided by such analyses, we propose BeMap, a fair message passing method, that leverages a balance-aware sampling strategy to balance the number of the 1-hop neighbors of each node among different demographic groups. Extensive experiments on node classification demonstrate the efficacy of BeMap in mitigating bias while maintaining classification accuracy. The code is available at https://github.com/xiaolin-cs/BeMap. △ Less

Submitted 8 March, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: Accepted at the Second Learning on Graphs Conference (LoG 2023)

arXiv:2304.04620 [pdf, other]

Federated Incremental Semantic Segmentation

Authors: Jiahua Dong, Duzhen Zhang, Yang Cong, Wei Cong, Henghui Ding, Dengxin Dai

Abstract: Federated learning-based semantic segmentation (FSS) has drawn widespread attention via decentralized training on local clients. However, most FSS models assume categories are fixed in advance, thus heavily undergoing forgetting on old categories in practical applications where local clients receive new categories incrementally while have no memory storage to access old classes. Moreover, new clie… ▽ More Federated learning-based semantic segmentation (FSS) has drawn widespread attention via decentralized training on local clients. However, most FSS models assume categories are fixed in advance, thus heavily undergoing forgetting on old categories in practical applications where local clients receive new categories incrementally while have no memory storage to access old classes. Moreover, new clients collecting novel classes may join in the global training of FSS, which further exacerbates catastrophic forgetting. To surmount the above challenges, we propose a Forgetting-Balanced Learning (FBL) model to address heterogeneous forgetting on old classes from both intra-client and inter-client aspects. Specifically, under the guidance of pseudo labels generated via adaptive class-balanced pseudo labeling, we develop a forgetting-balanced semantic compensation loss and a forgetting-balanced relation consistency loss to rectify intra-client heterogeneous forgetting of old categories with background shift. It performs balanced gradient propagation and relation consistency distillation within local clients. Moreover, to tackle heterogeneous forgetting from inter-client aspect, we propose a task transition monitor. It can identify new classes under privacy protection and store the latest old global model for relation distillation. Qualitative experiments reveal large improvement of our model against comparison methods. The code is available at https://github.com/JiahuaDong/FISS. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted to CVPR2023

arXiv:2303.12861 [pdf, other]

Parallel Diffusion Model-based Sparse-view Cone-beam Breast CT

Authors: Wenjun Xia, Hsin Wu Tseng, Chuang Niu, Wenxiang Cong, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Srinivasan Vedantham, Ge Wang

Abstract: Breast cancer is the most prevalent cancer among women worldwide, and early detection is crucial for reducing its mortality rate and improving quality of life. Dedicated breast computed tomography (CT) scanners offer better image quality than mammography and tomosynthesis in general but at higher radiation dose. To enable breast CT for cancer screening, the challenge is to minimize the radiation d… ▽ More Breast cancer is the most prevalent cancer among women worldwide, and early detection is crucial for reducing its mortality rate and improving quality of life. Dedicated breast computed tomography (CT) scanners offer better image quality than mammography and tomosynthesis in general but at higher radiation dose. To enable breast CT for cancer screening, the challenge is to minimize the radiation dose without compromising image quality, according to the ALARA principle (as low as reasonably achievable). Over the past years, deep learning has shown remarkable successes in various tasks, including low-dose CT especially few-view CT. Currently, the diffusion model presents the state of the art for CT reconstruction. To develop the first diffusion model-based breast CT reconstruction method, here we report innovations to address the large memory requirement for breast cone-beam CT reconstruction and high computational cost of the diffusion model. Specifically, in this study we transform the cutting-edge Denoising Diffusion Probabilistic Model (DDPM) into a parallel framework for sub-volume-based sparse-view breast CT image reconstruction in projection and image domains. This novel approach involves the concurrent training of two distinct DDPM models dedicated to processing projection and image data synergistically in the dual domains. Our experimental findings reveal that this method delivers competitive reconstruction performance at half to one-third of the standard radiation doses. This advancement demonstrates an exciting potential of diffusion-type models for volumetric breast reconstruction at high-resolution with much-reduced radiation dose and as such hopefully redefines breast cancer screening and diagnosis. △ Less

Submitted 28 January, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2302.11636 [pdf, other]

Do We Really Need Complicated Model Architectures For Temporal Networks?

Authors: Weilin Cong, Si Zhang, Jian Kang, Baichuan Yuan, Hao Wu, Xin Zhou, Hanghang Tong, Mehrdad Mahdavi

Abstract: Recurrent neural network (RNN) and self-attention mechanism (SAM) are the de facto methods to extract spatial-temporal information for temporal graph learning. Interestingly, we found that although both RNN and SAM could lead to a good performance, in practice neither of them is always necessary. In this paper, we propose GraphMixer, a conceptually and technically simple architecture that consists… ▽ More Recurrent neural network (RNN) and self-attention mechanism (SAM) are the de facto methods to extract spatial-temporal information for temporal graph learning. Interestingly, we found that although both RNN and SAM could lead to a good performance, in practice neither of them is always necessary. In this paper, we propose GraphMixer, a conceptually and technically simple architecture that consists of three components: (1) a link-encoder that is only based on multi-layer perceptrons (MLP) to summarize the information from temporal links, (2) a node-encoder that is only based on neighbor mean-pooling to summarize node information, and (3) an MLP-based link classifier that performs link prediction based on the outputs of the encoders. Despite its simplicity, GraphMixer attains an outstanding performance on temporal link prediction benchmarks with faster convergence and better generalization performance. These results motivate us to rethink the importance of simpler model architecture. △ Less

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.08990 [pdf, other]

Efficiently Forgetting What You Have Learned in Graph Representation Learning via Projection

Authors: Weilin Cong, Mehrdad Mahdavi

Abstract: As privacy protection receives much attention, unlearning the effect of a specific node from a pre-trained graph learning model has become equally important. However, due to the node dependency in the graph-structured data, representation unlearning in Graph Neural Networks (GNNs) is challenging and less well explored. In this paper, we fill in this gap by first studying the unlearning problem in… ▽ More As privacy protection receives much attention, unlearning the effect of a specific node from a pre-trained graph learning model has become equally important. However, due to the node dependency in the graph-structured data, representation unlearning in Graph Neural Networks (GNNs) is challenging and less well explored. In this paper, we fill in this gap by first studying the unlearning problem in linear-GNNs, and then introducing its extension to non-linear structures. Given a set of nodes to unlearn, we propose PROJECTOR that unlearns by projecting the weight parameters of the pre-trained model onto a subspace that is irrelevant to features of the nodes to be forgotten. PROJECTOR could overcome the challenges caused by node dependency and enjoys a perfect data removal, i.e., the unlearned model parameters do not contain any information about the unlearned node features which is guaranteed by algorithmic construction. Empirical results on real-world datasets illustrate the effectiveness and efficiency of PROJECTOR. △ Less

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2211.10388 [pdf, other]

Patch-Based Denoising Diffusion Probabilistic Model for Sparse-View CT Reconstruction

Authors: Wenjun Xia, Wenxiang Cong, Ge Wang

Abstract: Sparse-view computed tomography (CT) can be used to reduce radiation dose greatly but is suffers from severe image artifacts. Recently, the deep learning based method for sparse-view CT reconstruction has attracted a major attention. However, neural networks often have a limited ability to remove the artifacts when they only work in the image domain. Deep learning-based sinogram processing can ach… ▽ More Sparse-view computed tomography (CT) can be used to reduce radiation dose greatly but is suffers from severe image artifacts. Recently, the deep learning based method for sparse-view CT reconstruction has attracted a major attention. However, neural networks often have a limited ability to remove the artifacts when they only work in the image domain. Deep learning-based sinogram processing can achieve a better anti-artifact performance, but it inevitably requires feature maps of the whole image in a video memory, which makes handling large-scale or three-dimensional (3D) images rather challenging. In this paper, we propose a patch-based denoising diffusion probabilistic model (DDPM) for sparse-view CT reconstruction. A DDPM network based on patches extracted from fully sampled projection data is trained and then used to inpaint down-sampled projection data. The network does not require paired full-sampled and down-sampled data, enabling unsupervised learning. Since the data processing is patch-based, the deep learning workflow can be distributed in parallel, overcoming the memory problem of large-scale data. Our experiments show that the proposed method can effectively suppress few-view artifacts while faithfully preserving textural details. △ Less

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2210.13852 [pdf, other]

TabMixer: Excavating Label Distribution Learning with Small-scale Features

Authors: Weiyi Cong, Zhuoran Zheng, Xiuyi Jia

Abstract: Label distribution learning (LDL) differs from multi-label learning which aims at representing the polysemy of instances by transforming single-label values into descriptive degrees. Unfortunately, the feature space of the label distribution dataset is affected by human factors and the inductive bias of the feature extractor causing uncertainty in the feature space. Especially, for datasets with s… ▽ More Label distribution learning (LDL) differs from multi-label learning which aims at representing the polysemy of instances by transforming single-label values into descriptive degrees. Unfortunately, the feature space of the label distribution dataset is affected by human factors and the inductive bias of the feature extractor causing uncertainty in the feature space. Especially, for datasets with small-scale feature spaces (the feature space dimension $\approx$ the label space), the existing LDL algorithms do not perform well. To address this issue, we seek to model the uncertainty augmentation of the feature space to alleviate the problem in LDL tasks. Specifically, we start with augmenting each feature value in the feature vector of a sample into a vector (sampling on a Gaussian distribution function). Which, the variance parameter of the Gaussian distribution function is learned by using a sub-network, and the mean parameter is filled by this feature value. Then, each feature vector is augmented to a matrix which is fed into a mixer with local attention (\textit{TabMixer}) to extract the latent feature. Finally, the latent feature is squeezed to yield an accurate label distribution via a squeezed network. Extensive experiments verify that our proposed algorithm can be competitive compared to other LDL algorithms on several benchmarks. △ Less

Submitted 25 October, 2022; originally announced October 2022.

arXiv:2206.08401 [pdf, other]

Is decentralized finance actually decentralized? A social network analysis of the Aave protocol on the Ethereum blockchain

Authors: Ziqiao Ao, Lin William Cong, Gergely Horvath, Luyao Zhang

Abstract: Decentralized finance (DeFi) has the potential to disrupt centralized finance by validating peer-to-peer transactions through tamper-proof smart contracts, thus significantly lowering the transaction cost charged by financial intermediaries. However, the actual realization of peer-to-peer transactions and the levels and effects of decentralization are largely unknown. Our research pioneers a block… ▽ More Decentralized finance (DeFi) has the potential to disrupt centralized finance by validating peer-to-peer transactions through tamper-proof smart contracts, thus significantly lowering the transaction cost charged by financial intermediaries. However, the actual realization of peer-to-peer transactions and the levels and effects of decentralization are largely unknown. Our research pioneers a blockchain network study that applies social network analysis to measure the level, dynamics, and impacts of decentralization in DeFi token transactions on the Ethereum blockchain. First, we find a significant core-periphery structure in the AAVE token transaction network where the cores include the two largest centralized crypto exchanges. Second, we provide evidence that multiple network features consistently characterize decentralization dynamics. Finally, we document that a more decentralized network significantly predicts a higher return and lower volatility of the decentralized market of AAVE tokens on the Ethereum blockchain. We point out that our approach is seminal for inspiring future extensions related to the facets of application scenarios, research questions, and methodologies on the mechanics of blockchain decentralization. △ Less

Submitted 30 November, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted at 29th Annual Global Finance Conference featuring Professor Robert Engle, The 2003 Nobel Laureate in Economic Sciences

ACM Class: E.0; G.1; G.3; I.6; J.4; J.6

arXiv:2205.00687 [pdf, other]

Deep Video Harmonization with Color Mapping Consistency

Authors: Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, Liqing Zhang

Abstract: Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. So far, video harmonization has only received limited attention and there is no public dataset for video harmonization. In this work, we construct a new video harmonization dataset HYouTube by adjusting the foreground of real videos to create synthetic composite videos. Moreover, we co… ▽ More Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. So far, video harmonization has only received limited attention and there is no public dataset for video harmonization. In this work, we construct a new video harmonization dataset HYouTube by adjusting the foreground of real videos to create synthetic composite videos. Moreover, we consider the temporal consistency in video harmonization task. Unlike previous works which establish the spatial correspondence, we design a novel framework based on the assumption of color mapping consistency, which leverages the color mapping of neighboring frames to refine the current frame. Extensive experiments on our HYouTube dataset prove the effectiveness of our proposed framework. Our dataset and code are available at https://github.com/bcmi/Video-Harmonization-Dataset-HYouTube. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2111.10447 [pdf, other]

DyFormer: A Scalable Dynamic Graph Transformer with Provable Benefits on Generalization Ability

Authors: Weilin Cong, Yanhong Wu, Yuandong Tian, Mengting Gu, Yinglong Xia, Chun-cheng Jason Chen, Mehrdad Mahdavi

Abstract: Transformers have achieved great success in several domains, including Natural Language Processing and Computer Vision. However, its application to real-world graphs is less explored, mainly due to its high computation cost and its poor generalizability caused by the lack of enough training data in the graph domain. To fill in this gap, we propose a scalable Transformer-like dynamic graph learning… ▽ More Transformers have achieved great success in several domains, including Natural Language Processing and Computer Vision. However, its application to real-world graphs is less explored, mainly due to its high computation cost and its poor generalizability caused by the lack of enough training data in the graph domain. To fill in this gap, we propose a scalable Transformer-like dynamic graph learning method named Dynamic Graph Transformer (DyFormer) with spatial-temporal encoding to effectively learn graph topology and capture implicit links. To achieve efficient and scalable training, we propose temporal-union graph structure and its associated subgraph-based node sampling strategy. To improve the generalization ability, we introduce two complementary self-supervised pre-training tasks and show that jointly optimizing the two pre-training tasks results in a smaller Bayesian error rate via an information-theoretic analysis. Extensive experiments on the real-world datasets illustrate that DyFormer achieves a consistent 1%-3% AUC gain (averaged over all time steps) compared with baselines on all benchmarks. △ Less

Submitted 29 January, 2023; v1 submitted 19 November, 2021; originally announced November 2021.

arXiv:2111.08227 [pdf, other]

doi 10.1088/1361-6560/ac5b21

Phase function estimation from a diffuse optical image via deep learning

Authors: Yuxuan Liang, Chuang Niu, Chen Wei, Shenghan Ren, Wenxiang Cong, Ge Wang

Abstract: The phase function is a key element of a light propagation model for Monte Carlo (MC) simulation, which is usually fitted with an analytic function with associated parameters. In recent years, machine learning methods were reported to estimate the parameters of the phase function of a particular form such as the Henyey-Greenstein phase function but, to our knowledge, no studies have been performed… ▽ More The phase function is a key element of a light propagation model for Monte Carlo (MC) simulation, which is usually fitted with an analytic function with associated parameters. In recent years, machine learning methods were reported to estimate the parameters of the phase function of a particular form such as the Henyey-Greenstein phase function but, to our knowledge, no studies have been performed to determine the form of the phase function. Here we design a convolutional neural network to estimate the phase function from a diffuse optical image without any explicit assumption on the form of the phase function. Specifically, we use a Gaussian mixture model as an example to represent the phase function generally and learn the model parameters accurately. The Gaussian mixture model is selected because it provides the analytic expression of phase function to facilitate deflection angle sampling in MC simulation, and does not significantly increase the number of free parameters. Our proposed method is validated on MC-simulated reflectance images of typical biological tissues using the Henyey-Greenstein phase function with different anisotropy factors. The effects of field of view (FOV) and spatial resolution on the errors are analyzed to optimize the estimation method. The mean squared error of the phase function is 0.01 and the relative error of the anisotropy factor is 3.28%. △ Less

Submitted 15 November, 2021; originally announced November 2021.

Comments: 16 pages, 8 figures

arXiv:2111.08202 [pdf, other]

Learn Locally, Correct Globally: A Distributed Algorithm for Training Graph Neural Networks

Authors: Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Mahmut T. Kandemir, Anand Sivasubramaniam

Abstract: Despite the recent success of Graph Neural Networks (GNNs), training GNNs on large graphs remains challenging. The limited resource capacities of the existing servers, the dependency between nodes in a graph, and the privacy concern due to the centralized storage and model learning have spurred the need to design an effective distributed algorithm for GNN training. However, existing distributed GN… ▽ More Despite the recent success of Graph Neural Networks (GNNs), training GNNs on large graphs remains challenging. The limited resource capacities of the existing servers, the dependency between nodes in a graph, and the privacy concern due to the centralized storage and model learning have spurred the need to design an effective distributed algorithm for GNN training. However, existing distributed GNN training methods impose either excessive communication costs or large memory overheads that hinders their scalability. To overcome these issues, we propose a communication-efficient distributed GNN training technique named $\text{Learn Locally, Correct Globally}$ (LLCG). To reduce the communication and memory overhead, each local machine in LLCG first trains a GNN on its local data by ignoring the dependency between nodes among different machines, then sends the locally trained model to the server for periodic model averaging. However, ignoring node dependency could result in significant performance degradation. To solve the performance degradation, we propose to apply $\text{Global Server Corrections}$ on the server to refine the locally learned models. We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging but ignoring the dependency between nodes will suffer from an irreducible residual error. However, this residual error can be eliminated by utilizing the proposed global corrections to entail fast convergence rate. Extensive experiments on real-world datasets show that LLCG can significantly improve the efficiency without hurting the performance. △ Less

Submitted 13 March, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: The Tenth International Conference on Learning Representations (ICLR 2022)

arXiv:2110.15174 [pdf, other]

On Provable Benefits of Depth in Training Graph Convolutional Networks

Authors: Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi

Abstract: Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing. Despite the apparent consensus, we observe that there exists a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs. Specifically, we argue that over-smoothing does not necessaril… ▽ More Graph Convolutional Networks (GCNs) are known to suffer from performance degradation as the number of layers increases, which is usually attributed to over-smoothing. Despite the apparent consensus, we observe that there exists a discrepancy between the theoretical understanding of over-smoothing and the practical capabilities of GCNs. Specifically, we argue that over-smoothing does not necessarily happen in practice, a deeper model is provably expressive, can converge to global optimum with linear convergence rate, and achieve very high training accuracy as long as properly trained. Despite being capable of achieving high training accuracy, empirical results show that the deeper models generalize poorly on the testing stage and existing theoretical understanding of such behavior remains elusive. To achieve better understanding, we carefully analyze the generalization capability of GCNs, and show that the training strategies to achieve high training accuracy significantly deteriorate the generalization capability of GCNs. Motivated by these findings, we propose a decoupled structure for GCNs that detaches weight matrices from feature propagation to preserve the expressive power and ensure good generalization performance. We conduct empirical evaluations on various synthetic and real-world datasets to validate the correctness of our theory. △ Less

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2109.10009 [pdf]

An AI-assisted Economic Model of Endogenous Mobility and Infectious Diseases: The Case of COVID-19 in the United States

Authors: Lin William Cong, Ke Tang, Bing Wang, Jingyuan Wang

Abstract: We build a deep-learning-based SEIR-AIM model integrating the classical Susceptible-Exposed-Infectious-Removed epidemiology model with forecast modules of infection, community mobility, and unemployment. Through linking Google's multi-dimensional mobility index to economic activities, public health status, and mitigation policies, our AI-assisted model captures the populace's endogenous response t… ▽ More We build a deep-learning-based SEIR-AIM model integrating the classical Susceptible-Exposed-Infectious-Removed epidemiology model with forecast modules of infection, community mobility, and unemployment. Through linking Google's multi-dimensional mobility index to economic activities, public health status, and mitigation policies, our AI-assisted model captures the populace's endogenous response to economic incentives and health risks. In addition to being an effective predictive tool, our analyses reveal that the long-term effective reproduction number of COVID-19 equilibrates around one before mass vaccination using data from the United States. We identify a "policy frontier" and identify reopening schools and workplaces to be the most effective. We also quantify protestors' employment-value-equivalence of the Black Lives Matter movement and find that its public health impact to be negligible. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: Preprint, not peer reviewed

arXiv:2109.08809 [pdf, other]

HYouTube: Video Harmonization Dataset

Authors: Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, Liqing Zhang

Abstract: Video composition aims to generate a composite video by combining the foreground of one video with the background of another video, but the inserted foreground may be incompatible with the background in terms of color and illumination. Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. So far, video harmonization has only received limi… ▽ More Video composition aims to generate a composite video by combining the foreground of one video with the background of another video, but the inserted foreground may be incompatible with the background in terms of color and illumination. Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. So far, video harmonization has only received limited attention and there is no public dataset for video harmonization. In this work, we construct a new video harmonization dataset HYouTube by adjusting the foreground of real videos to create synthetic composite videos. Considering the domain gap between real composite videos and synthetic composite videos, we additionally create 100 real composite videos via copy-and-paste. Datasets are available at https://github.com/bcmi/Video-Harmonization-Dataset-HYouTube. △ Less

Submitted 17 September, 2021; originally announced September 2021.

arXiv:2109.06671 [pdf, other]

High-Resolution Image Harmonization via Collaborative Dual Transformations

Authors: Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang

Abstract: Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand, but still remains unexplored. Conventional image harmonization methods learn global RGB-to-RGB transformation which could effortlessly scale to high resolution, but ignore diverse local context. Recent deep learning methods lea… ▽ More Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand, but still remains unexplored. Conventional image harmonization methods learn global RGB-to-RGB transformation which could effortlessly scale to high resolution, but ignore diverse local context. Recent deep learning methods learn the dense pixel-to-pixel transformation which could generate harmonious outputs, but are highly constrained in low resolution. In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. Our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both. Extensive experiments on high-resolution benchmark dataset and our created high-resolution real composite images demonstrate that our CDTNet strikes a good balance between efficiency and effectiveness. Our used datasets can be found in https://github.com/bcmi/CDTNet-High-Resolution-Image-Harmonization. △ Less

Submitted 23 March, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: Accepted by CVPR2022

arXiv:2108.08999 [pdf]

doi 10.3905/jfds.2020.1.053

Deep Sequence Modeling: Development and Applications in Asset Pricing

Authors: Lin William Cong, Ke Tang, Jingyuan Wang, Yang Zhang

Abstract: We predict asset returns and measure risk premia using a prominent technique from artificial intelligence -- deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time series models, sequence modeling offers a promising path with its data-driven approach and superior performance. In this paper, we first overview the d… ▽ More We predict asset returns and measure risk premia using a prominent technique from artificial intelligence -- deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time series models, sequence modeling offers a promising path with its data-driven approach and superior performance. In this paper, we first overview the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. We then perform a comparative analysis of these methods using data on U.S. equities. We demonstrate how sequence modeling benefits investors in general through incorporating complex historical path dependence, and that Long- and Short-term Memory (LSTM) based models tend to have the best out-of-sample performance. △ Less

Submitted 20 August, 2021; originally announced August 2021.

arXiv:2106.14490 [pdf, other]

Making Images Real Again: A Comprehensive Survey on Deep Image Composition

Authors: Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang

Abstract: As a common image editing operation, image composition aims to combine the foreground from one image and another background image, resulting in a composite image. However, there are many issues that could make the composite images unrealistic. These issues can be summarized as the inconsistency between foreground and background, which includes appearance inconsistency (e.g., incompatible illuminat… ▽ More As a common image editing operation, image composition aims to combine the foreground from one image and another background image, resulting in a composite image. However, there are many issues that could make the composite images unrealistic. These issues can be summarized as the inconsistency between foreground and background, which includes appearance inconsistency (e.g., incompatible illumination), geometry inconsistency (e.g., unreasonable size), and semantic inconsistency (e.g., mismatched semantic context). Image composition task could be decomposed into multiple sub-tasks, in which each sub-task targets at one or more issues. Specifically, object placement aims to find reasonable scale, location, and shape for the foreground. Image blending aims to address the unnatural boundary between foreground and background. Image harmonization aims to adjust the illumination statistics of foreground. Shadow generation aims to generate plausible shadow for the foreground. These sub-tasks can be executed sequentially or parallelly to acquire realistic composite images. To the best of our knowledge, there is no previous survey on image composition. In this paper, we conduct comprehensive survey over the sub-tasks and combinatorial task of image composition. For each one, we summarize the existing methods, available datasets, and common evaluation metrics. Datasets and codes for image composition are summarized at https://github.com/bcmi/Awesome-Image-Composition. We have also contributed the first image composition toolbox: libcom https://github.com/bcmi/libcom, which assembles 10+ image composition related functions (e.g., image blending, image harmonization, object placement, shadow generation, generative composition). The ultimate goal of this toolbox is solving all the problems related to image composition with simple `import libcom'. △ Less

Submitted 22 April, 2024; v1 submitted 28 June, 2021; originally announced June 2021.

arXiv:2103.17104 [pdf, other]

Deep Image Harmonization by Bridging the Reality Gap

Authors: Junyan Cao, Wenyan Cong, Li Niu, Jianfu Zhang, Liqing Zhang

Abstract: Image harmonization has been significantly advanced with large-scale harmonization dataset. However, the current way to build dataset is still labor-intensive, which adversely affects the extendability of dataset. To address this problem, we propose to construct rendered harmonization dataset with fewer human efforts to augment the existing real-world dataset. To leverage both real-world images an… ▽ More Image harmonization has been significantly advanced with large-scale harmonization dataset. However, the current way to build dataset is still labor-intensive, which adversely affects the extendability of dataset. To address this problem, we propose to construct rendered harmonization dataset with fewer human efforts to augment the existing real-world dataset. To leverage both real-world images and rendered images, we propose a cross-domain harmonization network to bridge the domain gap between two domains. Moreover, we also employ well-designed style classifiers and losses to facilitate cross-domain knowledge transfer. Extensive experiments demonstrate the potential of using rendered images for image harmonization and the effectiveness of our proposed network. △ Less

Submitted 11 October, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

Comments: Accepted by BMVC2022

arXiv:2103.02696 [pdf, other]

On the Importance of Sampling in Training GCNs: Tighter Analysis and Variance Reduction

Authors: Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi

Abstract: Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of semi-supervised node classification tasks. Despite their great success, training GCNs on large graphs suffers from computational and memory issues. A potential path to circumvent these obstacles is sampling-based methods, where at each layer a subset of nodes is sampled. Although recent stud… ▽ More Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of semi-supervised node classification tasks. Despite their great success, training GCNs on large graphs suffers from computational and memory issues. A potential path to circumvent these obstacles is sampling-based methods, where at each layer a subset of nodes is sampled. Although recent studies have empirically demonstrated the effectiveness of sampling-based methods, these works lack theoretical convergence guarantees under realistic settings and cannot fully leverage the information of evolving parameters during optimization. In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget. The motivating impetus for the proposed schema is a careful analysis of the variance of sampling methods where it is shown that the induced variance can be decomposed into node embedding approximation variance (zeroth-order variance) during forward propagation and layerwise-gradient variance (first-order variance) during backward propagation. We theoretically analyze the convergence of the proposed schema and show that it enjoys an $\mathcal{O}(1/T)$ convergence rate. We complement our theoretical results by integrating the proposed schema in different sampling methods and applying them to different large real-world graphs. △ Less

Submitted 1 November, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

arXiv:2011.14873 [pdf, other]

Deep Interactive Denoiser (DID) for X-Ray Computed Tomography

Authors: Ti Bai, Biling Wang, Dan Nguyen, Bao Wang, Bin Dong, Wenxiang Cong, Mannudeep K. Kalra, Steve Jiang

Abstract: Low dose computed tomography (LDCT) is desirable for both diagnostic imaging and image guided interventions. Denoisers are openly used to improve the quality of LDCT. Deep learning (DL)-based denoisers have shown state-of-the-art performance and are becoming one of the mainstream methods. However, there exists two challenges regarding the DL-based denoisers: 1) a trained model typically does not g… ▽ More Low dose computed tomography (LDCT) is desirable for both diagnostic imaging and image guided interventions. Denoisers are openly used to improve the quality of LDCT. Deep learning (DL)-based denoisers have shown state-of-the-art performance and are becoming one of the mainstream methods. However, there exists two challenges regarding the DL-based denoisers: 1) a trained model typically does not generate different image candidates with different noise-resolution tradeoffs which sometimes are needed for different clinical tasks; 2) the model generalizability might be an issue when the noise level in the testing images is different from that in the training dataset. To address these two challenges, in this work, we introduce a lightweight optimization process at the testing phase on top of any existing DL-based denoisers to generate multiple image candidates with different noise-resolution tradeoffs suitable for different clinical tasks in real-time. Consequently, our method allows the users to interact with the denoiser to efficiently review various image candidates and quickly pick up the desired one, and thereby was termed as deep interactive denoiser (DID). Experimental results demonstrated that DID can deliver multiple image candidates with different noise-resolution tradeoffs, and shows great generalizability regarding various network architectures, as well as training and testing datasets with various noise levels. △ Less

Submitted 6 December, 2020; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: under review

arXiv:2009.09169 [pdf, other]

BargainNet: Background-Guided Domain Translation for Image Harmonization

Authors: Wenyan Cong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang

Abstract: Image composition is a fundamental operation in image editing field. However, unharmonious foreground and background downgrade the quality of composite image. Image harmonization, which adjusts the foreground to improve the consistency, is an essential yet challenging task. Previous deep learning based methods mainly focus on directly learning the mapping from composite image to real image, while… ▽ More Image composition is a fundamental operation in image editing field. However, unharmonious foreground and background downgrade the quality of composite image. Image harmonization, which adjusts the foreground to improve the consistency, is an essential yet challenging task. Previous deep learning based methods mainly focus on directly learning the mapping from composite image to real image, while ignoring the crucial guidance role that background plays. In this work, with the assumption that the foreground needs to be translated to the same domain as background, we formulate image harmonization task as background-guided domain translation. Therefore, we propose an image harmonization network with a novel domain code extractor and well-tailored triplet losses, which could capture the background domain information to guide the foreground harmonization. Extensive experiments on the existing image harmonization benchmark demonstrate the effectiveness of our proposed method. Code is available at https://github.com/bcmi/BargainNet. △ Less

Submitted 3 April, 2021; v1 submitted 19 September, 2020; originally announced September 2020.

Comments: Accepted by ICME2021 as Oral

arXiv:2008.01846 [pdf]

Stabilizing Deep Tomographic Reconstruction

Authors: Weiwen Wu, Dianlin Hu, Wenxiang Cong, Hongming Shan, Shaoyu Wang, Chuang Niu, Pingkun Yan, Hengyong Yu, Varut Vardhanabhuti, Ge Wang

Abstract: Tomographic image reconstruction with deep learning is an emerging field, but a recent landmark study reveals that several deep reconstruction networks are unstable for computed tomography (CT) and magnetic resonance imaging (MRI). Specifically, three kinds of instabilities were reported: (1) strong image artefacts from tiny perturbations, (2) small features missing in a deeply reconstructed image… ▽ More Tomographic image reconstruction with deep learning is an emerging field, but a recent landmark study reveals that several deep reconstruction networks are unstable for computed tomography (CT) and magnetic resonance imaging (MRI). Specifically, three kinds of instabilities were reported: (1) strong image artefacts from tiny perturbations, (2) small features missing in a deeply reconstructed image, and (3) decreased imaging performance with increased input data. On the other hand, compressed sensing (CS) inspired reconstruction methods do not suffer from these instabilities because of their built-in kernel awareness. For deep reconstruction to realize its full potential and become a mainstream approach for tomographic imaging, it is thus critically important to meet this challenge by stabilizing deep reconstruction networks. Here we propose an Analytic Compressed Iterative Deep (ACID) framework to address this challenge. ACID synergizes a deep reconstruction network trained on big data, kernel awareness from CS-inspired processing, and iterative refinement to minimize the data residual relative to real measurement. Our study demonstrates that the deep reconstruction using ACID is accurate and stable, and sheds light on the converging mechanism of the ACID iteration under a Bounded Relative Error Norm (BREN) condition. In particular, the study shows that ACID-based reconstruction is resilient against adversarial attacks, superior to classic sparsity-regularized reconstruction alone, and eliminates the three kinds of instabilities. We anticipate that this integrative data-driven approach will help promote development and translation of deep tomographic image reconstruction networks into clinical applications. △ Less

Submitted 13 September, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: 78 pages, 30 figures, 149 references

arXiv:2007.03882 [pdf, other]

Low-dimensional Manifold Constrained Disentanglement Network for Metal Artifact Reduction

Authors: Chuang Niu, Wenxiang Cong, Fenglei Fan, Hongming Shan, Mengzhou Li, Jimin Liang, Ge Wang

Abstract: Deep neural network based methods have achieved promising results for CT metal artifact reduction (MAR), most of which use many synthesized paired images for training. As synthesized metal artifacts in CT images may not accurately reflect the clinical counterparts, an artifact disentanglement network (ADN) was proposed with unpaired clinical images directly, producing promising results on clinical… ▽ More Deep neural network based methods have achieved promising results for CT metal artifact reduction (MAR), most of which use many synthesized paired images for training. As synthesized metal artifacts in CT images may not accurately reflect the clinical counterparts, an artifact disentanglement network (ADN) was proposed with unpaired clinical images directly, producing promising results on clinical datasets. However, without sufficient supervision, it is difficult for ADN to recover structural details of artifact-affected CT images based on adversarial losses only. To overcome these problems, here we propose a low-dimensional manifold (LDM) constrained disentanglement network (DN), leveraging the image characteristics that the patch manifold is generally low-dimensional. Specifically, we design an LDM-DN learning algorithm to empower the disentanglement network through optimizing the synergistic network loss functions while constraining the recovered images to be on a low-dimensional patch manifold. Moreover, learning from both paired and unpaired data, an efficient hybrid optimization scheme is proposed to further improve the MAR performance on clinical datasets. Extensive experiments demonstrate that the proposed LDM-DN approach can consistently improve the MAR performance in paired and/or unpaired learning settings, outperforming competing methods on synthesized and clinical datasets. △ Less

Submitted 7 July, 2020; originally announced July 2020.

arXiv:2006.13866 [pdf, other]

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Authors: Weilin Cong, Rana Forsati, Mahmut Kandemir, Mehrdad Mahdavi

Abstract: Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pron… ▽ More Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods. △ Less

Submitted 5 September, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

arXiv:2005.07627 [pdf, other]

Blockchain Architecture forAuditing Automation and TrustBuilding in Public Markets

Authors: Sean Cao, Lin William Cong, Meng Han, Qixuan Hou, Baozhong Yang

Abstract: Business transactions by public firms are required to be reported, verified, and audited periodically, which is traditionally a labor-intensive and time-consuming process. To streamline this procedure, we design FutureAB (Future Auditing Blockchain) which aims to automate the reporting and auditing process, thereby allowing auditors to focus on discretionary accounts to better detect and prevent f… ▽ More Business transactions by public firms are required to be reported, verified, and audited periodically, which is traditionally a labor-intensive and time-consuming process. To streamline this procedure, we design FutureAB (Future Auditing Blockchain) which aims to automate the reporting and auditing process, thereby allowing auditors to focus on discretionary accounts to better detect and prevent fraud. We demonstrate how distributed-ledger technologies build investor trust and disrupt the auditing industry. Our multi-functional design indicates that auditing firms can automate transaction verification without the need for a trusted third party by collaborating and sharing their information while preserving data privacy (commitment scheme) and security (immutability). We also explore how smart contracts and wallets facilitate the computerization and implementation of our system on Ethereum. Finally, performance evaluation reveals the efficacy and scalability of FutureAB in terms of both encryption (0.012 seconds per transaction) and verification (0.001 seconds per transaction). △ Less

Submitted 15 May, 2020; originally announced May 2020.

arXiv:1912.04278 [pdf, other]

doi 10.1109/ACCESS.2020.3033795

Deep Efficient End-to-end Reconstruction (DEER) Network for Few-view Breast CT Image Reconstruction

Authors: Huidong Xie, Hongming Shan, Wenxiang Cong, Chi Liu, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Ge Wang

Abstract: Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of small calcification (down to a few hundred microns in size) and subtle density differences. Since breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose, few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-… ▽ More Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of small calcification (down to a few hundred microns in size) and subtle density differences. Since breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose, few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-end Reconstruction (DEER) network for few-view breast CT image reconstruction. The major merits of our network include high dose efficiency, excellent image quality, and low model complexity. By the design, the proposed network can learn the reconstruction process with as few as O(N) parameters, where N is the side length of an image to be reconstructed, which represents orders of magnitude improvements relative to the state-of-the-art deep-learning-based reconstruction methods that map raw data to tomographic images directly. Also, validated on a cone-beam breast CT dataset prepared by Koning Corporation on a commercial scanner, our method demonstrates a competitive performance over the state-of-the-art reconstruction networks in terms of image quality. The source code of this paper is available at: https://github.com/HuidongXie/DEER. △ Less

Submitted 3 November, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

arXiv:1911.13239 [pdf, other]

DoveNet: Deep Image Harmonization via Domain Verification

Authors: Wenyan Cong, Jianfu Zhang, Li Niu, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang

Abstract: Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of composite image. Image harmonization, aiming to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of high-quality publicly available dataset for image harmonization greatly hinders the… ▽ More Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of composite image. Image harmonization, aiming to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of high-quality publicly available dataset for image harmonization greatly hinders the development of image harmonization techniques. In this work, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on COCO (resp., Adobe5k, Flickr, day2night) dataset, leading to our HCOCO (resp., HAdobe5k, HFlickr, Hday2night) sub-dataset. Moreover, we propose a new deep image harmonization method DoveNet using a novel domain verification discriminator, with the insight that the foreground needs to be translated to the same domain as background. Extensive experiments on our constructed dataset demonstrate the effectiveness of our proposed method. Our dataset and code are available at https://github.com/bcmi/Image_Harmonization_Datasets. △ Less

Submitted 30 October, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: Accepted by CVPR2020. arXiv admin note: text overlap with arXiv:1908.10526

arXiv:1909.11721 [pdf]

doi 10.1117/12.2530234

Deep-learning-based Breast CT for Radiation Dose Reduction

Authors: Wenxiang Cong, Hongming Shan, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Ge Wang

Abstract: Cone-beam breast computed tomography (CT) provides true 3D breast images with isotropic resolution and high-contrast information, detecting calcifications as small as a few hundred microns and revealing subtle tissue differences. However, breast is highly sensitive to x-ray radiation. It is critically important for healthcare to reduce radiation dose. Few-view cone-beam CT only uses a fraction of… ▽ More Cone-beam breast computed tomography (CT) provides true 3D breast images with isotropic resolution and high-contrast information, detecting calcifications as small as a few hundred microns and revealing subtle tissue differences. However, breast is highly sensitive to x-ray radiation. It is critically important for healthcare to reduce radiation dose. Few-view cone-beam CT only uses a fraction of x-ray projection data acquired by standard cone-beam breast CT, enabling significant reduction of the radiation dose. However, insufficient sampling data would cause severe streak artifacts in CT images reconstructed using conventional methods. In this study, we propose a deep-learning-based method to establish a residual neural network model for the image reconstruction, which is applied for few-view breast CT to produce high quality breast CT images. We respectively evaluate the deep-learning-based image reconstruction using one third and one quarter of x-ray projection views of the standard cone-beam breast CT. Based on clinical breast imaging dataset, we perform a supervised learning to train the neural network from few-view CT images to corresponding full-view CT images. Experimental results show that the deep learning-based image reconstruction method allows few-view breast CT to achieve a radiation dose <6 mGy per cone-beam CT scan, which is a threshold set by FDA for mammographic screening. △ Less

Submitted 25 September, 2019; originally announced September 2019.

Comments: 7 pages, 4 figures

arXiv:1908.10526 [pdf, other]

Image Harmonization Dataset iHarmony4: HCOCO, HAdobe5k, HFlickr, and Hday2night

Authors: Wenyan Cong, Jianfu Zhang, Li Niu, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang

Abstract: Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of composite image. Image harmonization, which aims to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of high-quality public dataset for image harmonization, which significantly hinder… ▽ More Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of composite image. Image harmonization, which aims to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of high-quality public dataset for image harmonization, which significantly hinders the development of image harmonization techniques. Therefore, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on existing COCO (resp., Adobe5k, day2night) dataset, leading to our HCOCO (resp., HAdobe5k, Hday2night) sub-dataset. To enrich the diversity of our dataset, we also generate synthesized composite images based on our collected Flick images, leading to our HFlickr sub-dataset. The image harmonization dataset iHarmony4 is released at https://github.com/bcmi/Image_Harmonization_Datasets. △ Less

Submitted 20 March, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: Our full paper arXiv:1911.13239 "DoveNet: Deep Image Harmonization via Domain Verification" is accepted by CVPR2020

arXiv:1907.01262 [pdf, ps, other]

doi 10.1117/12.2531198

Dual Network Architecture for Few-view CT -- Trained on ImageNet Data and Transferred for Medical Imaging

Authors: Huidong Xie, Hongming Shan, Wenxiang Cong, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Ge Wang

Abstract: X-ray computed tomography (CT) reconstructs cross-sectional images from projection data. However, ionizing X-ray radiation associated with CT scanning might induce cancer and genetic damage. Therefore, the reduction of radiation dose has attracted major attention. Few-view CT image reconstruction is an important topic to reduce the radiation dose. Recently, data-driven algorithms have shown great… ▽ More X-ray computed tomography (CT) reconstructs cross-sectional images from projection data. However, ionizing X-ray radiation associated with CT scanning might induce cancer and genetic damage. Therefore, the reduction of radiation dose has attracted major attention. Few-view CT image reconstruction is an important topic to reduce the radiation dose. Recently, data-driven algorithms have shown great potential to solve the few-view CT problem. In this paper, we develop a dual network architecture (DNA) for reconstructing images directly from sinograms. In the proposed DNA method, a point-based fully-connected layer learns the backprojection process requesting significantly less memory than the prior arts do. Proposed method uses O(C*N*N_c) parameters where N and N_c denote the dimension of reconstructed images and number of projections respectively. C is an adjustable parameter that can be set as low as 1. Our experimental results demonstrate that DNA produces a competitive performance over the other state-of-the-art methods. Interestingly, natural images can be used to pre-train DNA to avoid overfitting when the amount of real patient images is limited. △ Less

Submitted 12 September, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: 11 pages, 5 figures, 2019 SPIE Optical Engineering + Applications

arXiv:1811.08075

Scene Graph Generation via Conditional Random Fields

Authors: Weilin Cong, William Wang, Wang-Chien Lee

Abstract: Despite the great success object detection and segmentation models have achieved in recognizing individual objects in images, performance on cognitive tasks such as image caption, semantic image retrieval, and visual QA is far from satisfactory. To achieve better performance on these cognitive tasks, merely recognizing individual object instances is insufficient. Instead, the interactions between… ▽ More Despite the great success object detection and segmentation models have achieved in recognizing individual objects in images, performance on cognitive tasks such as image caption, semantic image retrieval, and visual QA is far from satisfactory. To achieve better performance on these cognitive tasks, merely recognizing individual object instances is insufficient. Instead, the interactions between object instances need to be captured in order to facilitate reasoning and understanding of the visual scenes in an image. Scene graph, a graph representation of images that captures object instances and their relationships, offers a comprehensive understanding of an image. However, existing techniques on scene graph generation fail to distinguish subjects and objects in the visual scenes of images and thus do not perform well with real-world datasets where exist ambiguous object instances. In this work, we propose a novel scene graph generation model for predicting object instances and its corresponding relationships in an image. Our model, SG-CRF, learns the sequential order of subject and object in a relationship triplet, and the semantic compatibility of object instance nodes and relationship nodes in a scene graph efficiently. Experiments empirically show that SG-CRF outperforms the state-of-the-art methods, on three different datasets, i.e., CLEVR, VRD, and Visual Genome, raising the Recall@100 from 24.99% to 49.95%, from 41.92% to 50.47%, and from 54.69% to 54.77%, respectively. △ Less

Submitted 23 January, 2024; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: Need to withdraw this draft as requested by collaborators

arXiv:1808.04256 [pdf, other]

doi 10.1109/TMI.2019.2922960

CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble(GAN-CIRCLE)

Authors: Chenyu You, Guang Li, Yi Zhang, Xiaoliu Zhang, Hongming Shan, Shenghong Ju, Zhen Zhao, Zhuiyang Zhang, Wenxiang Cong, Michael W. Vannier, Punam K. Saha, Ge Wang

Abstract: Computed tomography (CT) is widely used in screening, diagnosis, and image-guided therapy for both clinical and research purposes. Since CT involves ionizing radiation, an overarching thrust of related technical research is development of novel methods enabling ultrahigh quality imaging with fine structural details while reducing the X-ray radiation. In this paper, we present a semi-supervised dee… ▽ More Computed tomography (CT) is widely used in screening, diagnosis, and image-guided therapy for both clinical and research purposes. Since CT involves ionizing radiation, an overarching thrust of related technical research is development of novel methods enabling ultrahigh quality imaging with fine structural details while reducing the X-ray radiation. In this paper, we present a semi-supervised deep learning approach to accurately recover high-resolution (HR) CT images from low-resolution (LR) counterparts. Specifically, with the generative adversarial network (GAN) as the building block, we enforce the cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised and deblurred HR outputs. We also include the joint constraints in the loss function to facilitate structural preservation. In this deep imaging process, we incorporate deep convolutional neural network (CNN), residual learning, and network in network techniques for feature extraction and restoration. In contrast to the current trend of increasing network depth and complexity to boost the CT imaging performance, which limit its real-world applications by imposing considerable computational and memory overheads, we apply a parallel $1\times1$ CNN to compress the output of the hidden layer and optimize the number of layers and the number of filters for each convolutional layer. Quantitative and qualitative evaluations demonstrate that our proposed model is accurate, efficient and robust for super-resolution (SR) image restoration from noisy LR input images. In particular, we validate our composite SR networks on three large-scale CT datasets, and obtain promising results as compared to the other state-of-the-art methods. △ Less

Submitted 6 September, 2018; v1 submitted 10 August, 2018; originally announced August 2018.

Report number: TMI-2019-0250

Journal ref: IEEE Transactions on Medical Imaging 2019

arXiv:1805.00587 [pdf, other]

doi 10.1109/ACCESS.2018.2858196

Structure-sensitive Multi-scale Deep Neural Network for Low-Dose CT Denoising

Authors: Chenyu You, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang Zhang, Zhen Zhao, Yi Zhang, Wenxiang Cong, Ge Wang

Abstract: Computed tomography (CT) is a popular medical imaging modality in clinical applications. At the same time, the x-ray radiation dose associated with CT scans raises public concerns due to its potential risks to the patients. Over the past years, major efforts have been dedicated to the development of Low-Dose CT (LDCT) methods. However, the radiation dose reduction compromises the signal-to-noise r… ▽ More Computed tomography (CT) is a popular medical imaging modality in clinical applications. At the same time, the x-ray radiation dose associated with CT scans raises public concerns due to its potential risks to the patients. Over the past years, major efforts have been dedicated to the development of Low-Dose CT (LDCT) methods. However, the radiation dose reduction compromises the signal-to-noise ratio (SNR), leading to strong noise and artifacts that down-grade CT image quality. In this paper, we propose a novel 3D noise reduction method, called Structure-sensitive Multi-scale Generative Adversarial Net (SMGAN), to improve the LDCT image quality. Specifically, we incorporate three-dimensional (3D) volumetric information to improve the image quality. Also, different loss functions for training denoising models are investigated. Experiments show that the proposed method can effectively preserve structural and texture information from normal-dose CT (NDCT) images, and significantly suppress noise and artifacts. Qualitative visual assessments by three experienced radiologists demonstrate that the proposed method retrieves more detailed information, and outperforms competing methods. △ Less

Submitted 10 August, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

Comments: IEEE Access 2018

arXiv:1802.05656 [pdf, other]

doi 10.1109/TMI.2018.2832217

3D Convolutional Encoder-Decoder Network for Low-Dose CT via Transfer Learning from a 2D Trained Network

Authors: Hongming Shan, Yi Zhang, Qingsong Yang, Uwe Kruger, Mannudeep K. Kalra, Ling Sun, Wenxiang Cong, Ge Wang

Abstract: Low-dose computed tomography (CT) has attracted a major attention in the medical imaging field, since CT-associated x-ray radiation carries health risks for patients. The reduction of CT radiation dose, however, compromises the signal-to-noise ratio, and may compromise the image quality and the diagnostic performance. Recently, deep-learning-based algorithms have achieved promising results in low-… ▽ More Low-dose computed tomography (CT) has attracted a major attention in the medical imaging field, since CT-associated x-ray radiation carries health risks for patients. The reduction of CT radiation dose, however, compromises the signal-to-noise ratio, and may compromise the image quality and the diagnostic performance. Recently, deep-learning-based algorithms have achieved promising results in low-dose CT denoising, especially convolutional neural network (CNN) and generative adversarial network (GAN). This article introduces a Contracting Path-based Convolutional Encoder-decoder (CPCE) network in 2D and 3D configurations within the GAN framework for low-dose CT denoising. A novel feature of our approach is that an initial 3D CPCE denoising model can be directly obtained by extending a trained 2D CNN and then fine-tuned to incorporate 3D spatial information from adjacent slices. Based on the transfer learning from 2D to 3D, the 3D network converges faster and achieves a better denoising performance than that trained from scratch. By comparing the CPCE with recently published methods based on the simulated Mayo dataset and the real MGH dataset, we demonstrate that the 3D CPCE denoising model has a better performance, suppressing image noise and preserving subtle structures. △ Less

Submitted 29 April, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

Comments: To be published in the IEEE TMI

Journal ref: IEEE Transactions on Medical Imaging 37(6) (2018) 1522-1534

Showing 1–50 of 55 results for author: Cong, W