-
EGOR: Efficient Generated Objects Replay for incremental object detection
Authors:
Zijia An,
Boyu Diao,
Libo Huang,
Ruiqi Liu,
Zhulin An,
Yongjun Xu
Abstract:
Incremental object detection aims to simultaneously maintain old-class accuracy and detect emerging new-class objects in incremental data. Most existing distillation-based methods underperform when unlabeled old-class objects are absent in the incremental dataset. While the absence can be mitigated by generating old-class samples, it also incurs high computational costs. In this paper, we argue that the extra computational cost stems from the inconsistency between the detector and the generative model, along with redundant generation. To overcome this problem, we propose Efficient Generated Object Replay (EGOR). Specifically, we generate old-class samples by inverting the original detectors, thus eliminating the need to train and store additional generative models. We also propose augmented replay to reuse the objects in generated samples, thereby reducing redundant generation. In addition, we propose high-response knowledge distillation, which focuses on the knowledge related to the old classes and transfers the knowledge in generated objects to the incremental detector. With the addition of the generated objects and losses, we observe a bias towards old classes in the detector. We balance the losses for old and new classes to alleviate the bias, thereby increasing the overall detection accuracy. Extensive experiments conducted on MS COCO 2017 demonstrate that our method can efficiently improve detection performance in the absence of old-class objects.
Submitted 7 June, 2024;
originally announced June 2024.
-
E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network
Authors:
RuiQi Liu,
Boyu Diao,
Libo Huang,
Zhulin An,
Yongjun Xu
Abstract:
Continual Learning methods are designed to learn new tasks without erasing previous knowledge. However, Continual Learning often requires massive computational power and storage capacity for satisfactory performance. In this paper, we propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net). Leveraging core subnet distillation and precise replay sample selection, E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints, all while minimizing processing time. In E2Net, we propose Representative Network Distillation to identify the representative core subnet by assessing parameter quantity and output similarity with the working network, distilling analogous subnets within the working network to mitigate reliance on rehearsal buffers and facilitate knowledge transfer across previous tasks. To enhance storage resource utilization, we then propose Subnet Constraint Experience Replay to optimize rehearsal efficiency through a sample storage strategy based on the structures of representative networks. Extensive experiments, conducted predominantly in cloud environments with diverse datasets and also in an edge environment, demonstrate that E2Net consistently outperforms state-of-the-art methods. In addition, our method outperforms competitors in terms of both storage and computational requirements.
Submitted 27 September, 2023;
originally announced September 2023.
-
CLIP-KD: An Empirical Study of CLIP Model Distillation
Authors:
Chuanguang Yang,
Zhulin An,
Libo Huang,
Junyu Bi,
Xinqiang Yu,
Han Yang,
Boyu Diao,
Yongjun Xu
Abstract:
Contrastive Language-Image Pre-training (CLIP) has become a promising language-supervised visual pre-training framework. This paper aims to distill small CLIP models supervised by a large teacher CLIP model. We propose several distillation strategies, including relation, feature, gradient and contrastive paradigms, to examine the effectiveness of CLIP-Knowledge Distillation (KD). We show that a simple feature mimicry with Mean Squared Error loss works surprisingly well. Moreover, interactive contrastive learning across teacher and student encoders is also effective in performance improvement. We explain that the success of CLIP-KD can be attributed to maximizing the feature similarity between teacher and student. The unified method is applied to distill several student models trained on CC3M+12M. CLIP-KD improves student CLIP models consistently over zero-shot ImageNet classification and cross-modal retrieval benchmarks. When using ViT-L/14 pretrained on Laion-400M as the teacher, CLIP-KD achieves 57.5% and 55.4% zero-shot top-1 ImageNet accuracy over ViT-B/16 and ResNet-50, surpassing the original CLIP without KD by 20.5% and 20.1% margins, respectively. Our code is released on https://github.com/winycg/CLIP-KD.
Submitted 7 May, 2024; v1 submitted 24 July, 2023;
originally announced July 2023.
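The CLIP-KD abstract above reports that simple feature mimicry with a Mean Squared Error loss works well. As a minimal sketch of that idea only, assuming the student and teacher embeddings already share the same dimensionality (the function name `feature_mimicry_loss` and the list-of-lists batch layout are illustrative choices, not taken from the released code), the loss can be written in plain Python as:

```python
def feature_mimicry_loss(student_feats, teacher_feats):
    """Mean squared error between student and teacher embeddings.

    Both arguments are batches (lists) of equal-length feature vectors.
    Assumption: dimensions already match; if they differed, some
    projection would be needed, which this sketch omits.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("batch sizes differ")
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        if len(s_vec) != len(t_vec):
            raise ValueError("embedding dimensions differ")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


# Usage: the loss is zero when the student reproduces the teacher exactly.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.0, 0.0], [0.0, 1.0]]
loss = feature_mimicry_loss(student, teacher)
```

Minimizing this quantity pushes the student features toward the teacher features, which matches the abstract's explanation that the gains come from maximizing teacher-student feature similarity.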
-
eTag: Class-Incremental Learning with Embedding Distillation and Task-Oriented Generation
Authors:
Libo Huang,
Yan Zeng,
Chuanguang Yang,
Zhulin An,
Boyu Diao,
Yongjun Xu
Abstract:
Class-Incremental Learning (CIL) aims to solve the catastrophic forgetting problem of neural networks: once a network is updated on a new task, its performance on previously learned tasks drops dramatically. Most successful CIL methods incrementally train a feature extractor with the aid of stored exemplars, or estimate the feature distribution with stored prototypes. However, stored exemplars raise data privacy concerns, while stored prototypes may not be consistent with the true feature distribution, hindering the exploration of real-world CIL applications. In this paper, we propose a method of embedding distillation and task-oriented generation (eTag) for CIL, which requires neither exemplars nor prototypes. Instead, eTag trains the neural networks incrementally in a data-free manner. To prevent the feature extractor from forgetting, eTag distills the embeddings of the network's intermediate blocks. Additionally, eTag enables a generative network to produce suitable features, fitting the needs of the top incremental classifier. Experimental results confirm that eTag considerably outperforms state-of-the-art methods on CIFAR-100 and ImageNet-sub. Our code is available in the Supplementary Materials.
Submitted 20 April, 2023;
originally announced April 2023.
-
A Distributed SGD Algorithm with Global Sketching for Deep Learning Training Acceleration
Authors:
LingFei Dai,
Boyu Diao,
Chao Li,
Yongjun Xu
Abstract:
Distributed training is an effective way to accelerate the training process of large-scale deep learning models. However, the parameter exchange and synchronization of distributed stochastic gradient descent introduce a large amount of communication overhead. Gradient compression is an effective method to reduce communication overhead. Among synchronous SGD compression methods, many Top-$k$ sparsification based gradient compression methods have been proposed to reduce communication. However, the centralized method based on parameter servers has a single point of failure and limited scalability, while the decentralized method with global parameter exchange may reduce the convergence rate of training. In contrast with Top-$k$ based methods, we propose a gradient compression method with global gradient vector sketching, named global-sketching SGD (gs-SGD), which uses the Count-Sketch structure to store the gradients and thereby reduce the loss of accuracy in the training process. gs-SGD has better convergence efficiency on deep learning models and a communication complexity of $O(\log d \cdot \log P)$, where $d$ is the number of model parameters and $P$ is the number of workers. We conducted experiments on GPU clusters to verify that our method has better convergence efficiency than global Top-$k$ and Sketching-based methods. In addition, gs-SGD achieves 1.3-3.1x higher throughput compared with gTop-$k$, and 1.1-1.2x higher throughput compared with the original Sketched-SGD.
Submitted 12 August, 2021;
originally announced August 2021.
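The gs-SGD abstract above stores gradients in a Count-Sketch structure. As a rough illustration of that data structure only (not the authors' gs-SGD algorithm or its communication scheme), the Python sketch below compresses a gradient vector into a small table, merges the tables of two workers by addition, and recovers a coordinate estimate by taking a median across rows. The class name `CountSketch`, the table sizes, and the seeded per-coordinate lookup tables are assumptions made for this example; a practical implementation would use hash functions instead of storing one bucket and sign per coordinate.

```python
import random
from statistics import median


class CountSketch:
    """Toy Count-Sketch over a vector of fixed dimension (illustrative only)."""

    def __init__(self, dim, rows=5, cols=64, seed=0):
        rng = random.Random(seed)  # same seed => same hashes on every worker
        self.rows, self.cols = rows, cols
        # Per-row bucket index and random sign for every coordinate.
        self.bucket = [[rng.randrange(cols) for _ in range(dim)] for _ in range(rows)]
        self.sign = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def add(self, vec):
        """Accumulate a dense vector into the sketch table."""
        for r in range(self.rows):
            for i, value in enumerate(vec):
                self.table[r][self.bucket[r][i]] += self.sign[r][i] * value

    def merge(self, other):
        """Sketches are linear, so two workers' tables can simply be summed."""
        for r in range(self.rows):
            for c in range(self.cols):
                self.table[r][c] += other.table[r][c]

    def estimate(self, i):
        """Median-of-rows estimate of coordinate i of the sketched vector."""
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(self.rows)
        )


# Usage: two workers sketch their local gradients, then the sketches are merged.
worker_a = CountSketch(dim=8, seed=1)
worker_b = CountSketch(dim=8, seed=1)
grad_a = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
grad_b = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
worker_a.add(grad_a)
worker_b.add(grad_b)
worker_a.merge(worker_b)
summed_estimate = worker_a.estimate(3)
```

The property the example relies on is linearity: the merged table equals the sketch of the summed gradient, so workers can exchange fixed-size tables instead of full gradient vectors.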
-
A Channel-Aware Routing Protocol With Nearest Neighbor Regression For Underwater Sensor Networks
Authors:
Boyu Diao,
Chao Li,
Qi Wang,
Zhulin An,
Yongjun Xu
Abstract:
The underwater acoustic channel is one of the most challenging communication channels. Due to periodic tidal and daily climatic variation, underwater noise fluctuates periodically, which results in periodic changes of acoustic channel quality in the long term. Moreover, time-variant channel quality leads to routing failure. Routing protocols with acoustic channel estimation, namely underwater channel-aware routing protocols, have recently been proposed to maintain routing performance. However, channel estimation algorithms for these routing protocols are mostly linear and rarely consider the periodicity of acoustic channels. In this paper, we introduce acoustic channel estimation based on nearest neighbor regression for underwater acoustic networks. We extend nearest neighbor regression to SNR (Signal-to-Noise Ratio) time series prediction, providing outstanding prediction accuracy for intricately periodic and fluctuating received SNR time series. Moreover, we propose a quick search algorithm and use statistical storage compression to optimize the time and space complexity of the algorithm. In contrast with linear methods, this algorithm significantly improves channel prediction accuracy (by over three times at most) on both simulation and sea trial data sets. With this channel estimation method, we then propose a Depth-Based Channel-Aware Routing protocol (DBCAR). Taking advantage of depth-greedy forwarding and channel-aware reliable communication, DBCAR achieves outstanding network performance in packet delivery ratio, average energy consumption and average transmission delay, which is validated through extensive simulations.
Submitted 14 August, 2021; v1 submitted 11 August, 2021;
originally announced August 2021.
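The abstract above predicts the received SNR time series with nearest neighbor regression. A minimal sketch of that general technique, under stated assumptions, is shown below: the most recent `window` samples form a query, the `k` most similar historical windows are found by Euclidean distance, and the values that followed those windows are averaged. The function name `knn_forecast` and its parameters are hypothetical, and the paper's quick search algorithm and statistical storage compression are not reproduced here.

```python
import math


def knn_forecast(series, window=3, k=3):
    """Predict the next value of `series` by nearest neighbor regression.

    Brute-force search over all historical windows (illustrative only).
    """
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    query = series[-window:]
    candidates = []
    # Only windows that have a known successor are eligible neighbors.
    for start in range(len(series) - window):
        past = series[start:start + window]
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(past, query)))
        candidates.append((dist, series[start + window]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Usage: a strictly periodic toy "SNR" series with period 4.
snr = [float(i % 4) for i in range(20)]
prediction = knn_forecast(snr, window=3, k=3)
```

On a periodic series like this toy input, the nearest windows are exact repeats of the query, so the forecast reproduces the next value in the cycle; a linear extrapolation of the last samples would not capture the wrap-around.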
-
PFGDF: Pruning Filter via Gaussian Distribution Feature for Deep Neural Networks Acceleration
Authors:
Jianrong Xu,
Boyu Diao,
Bifeng Cui,
Kang Yang,
Chao Li,
Yongjun Xu
Abstract:
Deep learning has achieved impressive results in many areas, but the deployment of edge intelligent devices is still very slow. To solve this problem, we propose a novel compression and acceleration method based on data distribution characteristics for deep neural networks, namely Pruning Filter via Gaussian Distribution Feature (PFGDF). Compared with previous advanced pruning methods, PFGDF compresses the model by removing filters that are insignificant in distribution, regardless of the contribution and sensitivity information of the convolution filter. PFGDF is significantly different from weight sparsification pruning because it does not require a special acceleration library to process the sparse weight matrix and introduces no extra parameters. The pruning process of PFGDF is automated. Furthermore, the model compressed by PFGDF can restore the same performance as the uncompressed model. We evaluate PFGDF through extensive experiments. On CIFAR-10, PFGDF compresses the convolution filters of VGG-16 by 66.62% with more than 90% of the parameters removed, while the inference time is accelerated by 83.73% on Huawei MATE 10.
Submitted 26 May, 2022; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Multi-Objective Pruning for CNNs Using Genetic Algorithm
Authors:
Chuanguang Yang,
Zhulin An,
Chao Li,
Boyu Diao,
Yongjun Xu
Abstract:
In this work, we propose a heuristic genetic algorithm (GA) for pruning convolutional neural networks (CNNs) according to the multi-objective trade-off among error, computation and sparsity. In our experiments, we apply our approach to prune a pre-trained LeNet on the MNIST dataset, which reduces the parameter size by 95.42% and achieves a 16$\times$ speedup of convolutional layer computation with tiny accuracy loss when emphasizing sparsity and computation, respectively. Our empirical study suggests that GA is an alternative pruning approach for obtaining competitive compression performance. Additionally, compared with state-of-the-art approaches, GA is capable of automatically pruning CNNs based on the multi-objective importance defined by a pre-defined fitness function.
Submitted 4 July, 2019; v1 submitted 2 June, 2019;
originally announced June 2019.
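The abstract above describes pruning driven by a genetic algorithm and a pre-defined multi-objective fitness function. The toy Python sketch below illustrates that general pattern on binary keep/prune masks; it is not the authors' algorithm. Everything in it is an assumption made for illustration: the `importance` scores stand in for a real error measurement, the fitness trades off only an error proxy against sparsity (computation is left out), and the weights `alpha` and `beta`, the crossover, and the mutation rate are arbitrary.

```python
import random


def fitness(mask, importance, alpha=1.0, beta=0.5):
    """Hypothetical fitness: penalize removed importance, reward sparsity."""
    kept = sum(w for keep, w in zip(mask, importance) if keep)
    error_proxy = 1.0 - kept / sum(importance)
    sparsity = 1.0 - sum(mask) / len(mask)
    return -alpha * error_proxy + beta * sparsity


def evolve(importance, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve binary masks (1 = keep, 0 = prune) with elitist selection."""
    rng = random.Random(seed)
    n = len(importance)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, importance), reverse=True)
        history.append(fitness(pop[0], importance))
        elite = pop[: pop_size // 2]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda m: fitness(m, importance))
    return best, history


# Usage: six hypothetical filters, three of them clearly more important.
scores = [5.0, 0.1, 4.0, 0.2, 3.0, 0.1]
best_mask, best_history = evolve(scores)
```

Because the elite half of each generation survives unchanged, the best fitness never decreases from one generation to the next; changing `alpha` and `beta` shifts the emphasis between accuracy and sparsity, in the spirit of the trade-off the abstract describes.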
-
Mean-Field Games for Marriage
Authors:
Dario Bauso,
Ben Mansour Dia,
Boualem Djehiche,
Hamidou Tembine,
Raul Tempone
Abstract:
This article examines mean-field games for marriage. The results support the argument that optimizing long-term well-being through effort and the social feeling state distribution (mean-field) will help to stabilize marriage. However, if the cost of effort is very high, the couple fluctuates in a bad feeling state or the marriage breaks down. We then examine the influence of society on a couple using mean-field sentimental games. We show that, in mean-field equilibrium, the optimal effort is always higher than the one-shot optimal effort. We illustrate numerically the influence of the couple's network on their feeling states and their well-being.
Submitted 13 April, 2014;
originally announced April 2014.