Search | arXiv e-print repository

arXiv:2407.20187 [pdf, ps, other]

Eliminating Majority Illusion is Easy

Authors: Jack Dippel, Max Dupré la Tour, April Niu, Sanjukta Roy, Adrian Vetta

Abstract: Majority Illusion is a phenomenon in social networks wherein the decision by the majority of the network is not the same as one's personal social circle's majority, leading to an incorrect perception of the majority in a large network. In this paper, we present polynomial-time algorithms which can eliminate majority illusion in a network by altering as few connections as possible. Additionally, we… ▽ More Majority Illusion is a phenomenon in social networks wherein the decision by the majority of the network is not the same as one's personal social circle's majority, leading to an incorrect perception of the majority in a large network. In this paper, we present polynomial-time algorithms which can eliminate majority illusion in a network by altering as few connections as possible. Additionally, we prove that the more general problem of ensuring all neighbourhoods in the network are at least a $p$-fraction of the majority is NP-hard for most values of $p$. △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2403.19150 [pdf, other]

Towards Understanding Dual BN In Hybrid Adversarial Training

Authors: Chenshuang Zhang, Chaoning Zhang, Kang Zhang, Axi Niu, Junmo Kim, In So Kweon

Abstract: There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). With the assumption that adversarial and clean samples are from two different domains, a common practice in prior works is to adopt Dual BN, where BN and BN are used for adversarial and clean branches… ▽ More There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). With the assumption that adversarial and clean samples are from two different domains, a common practice in prior works is to adopt Dual BN, where BN and BN are used for adversarial and clean branches, respectively. A popular belief for motivating Dual BN is that estimating normalization statistics of this mixture distribution is challenging and thus disentangling it for normalization achieves stronger robustness. In contrast to this belief, we reveal that disentangling statistics plays a less role than disentangling affine parameters in model training. This finding aligns with prior work (Rebuffi et al., 2023), and we build upon their research for further investigations. We demonstrate that the domain gap between adversarial and clean samples is not very large, which is counter-intuitive considering the significant influence of adversarial perturbation on the model accuracy. We further propose a two-task hypothesis which serves as the empirical foundation and a unified framework for Hybrid-AT improvement. We also investigate Dual BN in test-time and reveal that affine parameters characterize the robustness during inference. Overall, our work sheds new light on understanding the mechanism of Dual BN in Hybrid-AT and its underlying justification. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted at TMLR

arXiv:2312.14721 [pdf, other]

Gerrymandering Planar Graphs

Authors: Jack Dippel, Max Dupré la Tour, April Niu, Sanjukta Roy, Adrian Vetta

Abstract: We study the computational complexity of the map redistricting problem (gerrymandering). Mathematically, the electoral district designer (gerrymanderer) attempts to partition a weighted graph into $k$ connected components (districts) such that its candidate (party) wins as many districts as possible. Prior work has principally concerned the special cases where the graph is a path or a tree. Our fo… ▽ More We study the computational complexity of the map redistricting problem (gerrymandering). Mathematically, the electoral district designer (gerrymanderer) attempts to partition a weighted graph into $k$ connected components (districts) such that its candidate (party) wins as many districts as possible. Prior work has principally concerned the special cases where the graph is a path or a tree. Our focus concerns the realistic case where the graph is planar. We prove that the gerrymandering problem is solvable in polynomial time in $λ$-outerplanar graphs, when the number of candidates and $λ$ are constants and the vertex weights (voting weights) are polynomially bounded. In contrast, the problem is NP-complete in general planar graphs even with just two candidates. This motivates the study of approximation algorithms for gerrymandering planar graphs. However, when the number of candidates is large, we prove it is hard to distinguish between instances where the gerrymanderer cannot win a single district and instances where the gerrymanderer can win at least one district. This immediately implies that the redistricting problem is inapproximable in polynomial time in planar graphs, unless P=NP. This conclusion appears terminal for the design of good approximation algorithms -- but it is not. The inapproximability bound can be circumvented as it only applies when the maximum number of districts the gerrymanderer can win is extremely small, say one. Indeed, for a fixed number of candidates, our main result is that there is a constant factor approximation algorithm for redistricting unweighted planar graphs, provided the optimal value is a large enough constant. △ Less

Submitted 7 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2311.18508 [pdf, other]

DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

Authors: Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

Abstract: It is well known the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To… ▽ More It is well known the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To address this problem, we propose a simple but non-travel diffusion-style data augmentation scheme for current GAN-based SR methods, known as DifAugGAN. It involves adapting the diffusion process in generative diffusion models for improving the calibration of the discriminator during training motivated by the successes of data augmentation schemes in the field to achieve good calibration. Our DifAugGAN can be a Plug-and-Play strategy for current GAN-based SISR methods to improve the calibration of the discriminator and thus improve SR performance. Extensive experimental evaluations demonstrate the superiority of DifAugGAN over state-of-the-art GAN-based SISR methods across both synthetic and real-world datasets, showcasing notable advancements in both qualitative and quantitative results. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.02572 [pdf, other]

Multiple Object Tracking based on Occlusion-Aware Embedding Consistency Learning

Authors: Yaoqi Hu, Axi Niu, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang

Abstract: The Joint Detection and Embedding (JDE) framework has achieved remarkable progress for multiple object tracking. Existing methods often employ extracted embeddings to re-establish associations between new detections and previously disrupted tracks. However, the reliability of embeddings diminishes when the region of the occluded object frequently contains adjacent objects or clutters, especially i… ▽ More The Joint Detection and Embedding (JDE) framework has achieved remarkable progress for multiple object tracking. Existing methods often employ extracted embeddings to re-establish associations between new detections and previously disrupted tracks. However, the reliability of embeddings diminishes when the region of the occluded object frequently contains adjacent objects or clutters, especially in scenarios with severe occlusion. To alleviate this problem, we propose a novel multiple object tracking method based on visual embedding consistency, mainly including: 1) Occlusion Prediction Module (OPM) and 2) Occlusion-Aware Association Module (OAAM). The OPM predicts occlusion information for each true detection, facilitating the selection of valid samples for consistency learning of the track's visual embedding. The OAAM leverages occlusion cues and visual embeddings to generate two separate embeddings for each track, guaranteeing consistency in both unoccluded and occluded detections. By integrating these two modules, our method is capable of addressing track interruptions caused by occlusion in online tracking scenarios. Extensive experimental results demonstrate that our approach achieves promising performance levels in both unoccluded and occluded tracking scenarios. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2309.11711 [pdf, other]

MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation

Authors: Fei Pan, Xu Yin, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon

Abstract: Unsupervised domain adaptation (UDA) has been a potent technique to handle the lack of annotations in the target domain, particularly in semantic segmentation task. This study introduces a different UDA scenarios where the target domain contains unlabeled video frames. Drawing upon recent advancements of self-supervised learning of the object motion from unlabeled videos with geometric constraint,… ▽ More Unsupervised domain adaptation (UDA) has been a potent technique to handle the lack of annotations in the target domain, particularly in semantic segmentation task. This study introduces a different UDA scenarios where the target domain contains unlabeled video frames. Drawing upon recent advancements of self-supervised learning of the object motion from unlabeled videos with geometric constraint, we design a \textbf{Mo}tion-guided \textbf{D}omain \textbf{A}daptive semantic segmentation framework (MoDA). MoDA harnesses the self-supervised object motion cues to facilitate cross-domain alignment for segmentation task. First, we present an object discovery module to localize and segment target moving objects using object motion information. Then, we propose a semantic mining module that takes the object masks to refine the pseudo labels in the target domain. Subsequently, these high-quality pseudo labels are used in the self-training loop to bridge the cross-domain gap. On domain adaptive video and image segmentation experiments, MoDA shows the effectiveness utilizing object motion as guidance for domain alignment compared with optical flow information. Moreover, MoDA exhibits versatility as it can complement existing state-of-the-art UDA approaches. Code at https://github.com/feipanir/MoDA. △ Less

Submitted 15 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: CVPR 2024 Workshop on Learning with Limited Labelled Data for Image and Video Understanding. Best Paper Award

arXiv:2307.00781 [pdf, other]

ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

Authors: Axi Niu, Pham Xuan Trung, Kang Zhang, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

Abstract: Diffusion models have gained significant popularity in the field of image-to-image translation. Previous efforts applying diffusion models to image super-resolution (SR) have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterat… ▽ More Diffusion models have gained significant popularity in the field of image-to-image translation. Previous efforts applying diffusion models to image super-resolution (SR) have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterative refinement process comes with the drawback of low inference speed, which strongly limits its applications. To speed up inference and further enhance the performance, our research revisits diffusion models in image super-resolution and proposes a straightforward yet significant diffusion model-based super-resolution method called ACDMSR (accelerated conditional diffusion model for image super-resolution). Specifically, our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process. Our study also highlights the effectiveness of using a pre-trained SR model to provide the conditional image of the given low-resolution (LR) image to achieve superior high-resolution results. We demonstrate that our method surpasses previous attempts in qualitative and quantitative results through extensive experiments conducted on benchmark datasets such as Set5, Set14, Urban100, BSD100, and Manga109. Moreover, our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.12831

arXiv:2305.18547 [pdf, other]

Learning from Multi-Perception Features for Real-Word Image Super-resolution

Authors: Axi Niu, Kang Zhang, Trung X. Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang

Abstract: Currently, there are two popular approaches for addressing real-world image super-resolution problems: degradation-estimation-based and blind-based methods. However, degradation-estimation-based methods may be inaccurate in estimating the degradation, making them less applicable to real-world LR images. On the other hand, blind-based methods are often limited by their fixed single perception infor… ▽ More Currently, there are two popular approaches for addressing real-world image super-resolution problems: degradation-estimation-based and blind-based methods. However, degradation-estimation-based methods may be inaccurate in estimating the degradation, making them less applicable to real-world LR images. On the other hand, blind-based methods are often limited by their fixed single perception information, which hinders their ability to handle diverse perceptual characteristics. To overcome this limitation, we propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images. Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information and a series of newly-designed Cross-Perception Blocks (CPB) to combine this information for effective super-resolution reconstruction. Additionally, we introduce a contrastive regularization term (CR) that improves the model's learning capability by using newly generated HR and LR images as positive and negative samples for ground truth HR. Experimental results on challenging real-world SR datasets demonstrate that our approach significantly outperforms existing state-of-the-art methods in both qualitative and quantitative measures. △ Less

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2302.14557 [pdf, other]

GRAN: Ghost Residual Attention Network for Single Image Super Resolution

Authors: Axi Niu, Pei Wang, Yu Zhu, Jinqiu Sun, Qingsen Yan, Yanning Zhang

Abstract: Recently, many works have designed wider and deeper networks to achieve higher image super-resolution performance. Despite their outstanding performance, they still suffer from high computational resources, preventing them from directly applying to embedded devices. To reduce the computation resources and maintain performance, we propose a novel Ghost Residual Attention Network (GRAN) for efficien… ▽ More Recently, many works have designed wider and deeper networks to achieve higher image super-resolution performance. Despite their outstanding performance, they still suffer from high computational resources, preventing them from directly applying to embedded devices. To reduce the computation resources and maintain performance, we propose a novel Ghost Residual Attention Network (GRAN) for efficient super-resolution. This paper introduces Ghost Residual Attention Block (GRAB) groups to overcome the drawbacks of the standard convolutional operation, i.e., redundancy of the intermediate feature. GRAB consists of the Ghost Module and Channel and Spatial Attention Module (CSAM) to alleviate the generation of redundant features. Specifically, Ghost Module can reveal information underlying intrinsic features by employing linear operations to replace the standard convolutions. Reducing redundant features by the Ghost Module, our model decreases memory and computing resource requirements in the network. The CSAM pays more comprehensive attention to where and what the feature extraction is, which is critical to recovering the image details. Experiments conducted on the benchmark datasets demonstrate the superior performance of our method in both qualitative and quantitative. Compared to the baseline models, we achieve higher performance with lower computational resources, whose parameters and FLOPs have decreased by more than ten times. △ Less

Submitted 1 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.12831 [pdf, other]

CDPMSR: Conditional Diffusion Probabilistic Models for Single Image Super-Resolution

Authors: Axi Niu, Kang Zhang, Trung X. Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

Abstract: Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation to generate high-quality images. Prior attempts at applying the DPM to image super-resolution (SR) have shown that iteratively refining a pure Gaussian noise with a conditional image using a U-Net trained on denoising at various-level noises can help obtain a satisfied high-resolution image for the low-reso… ▽ More Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation to generate high-quality images. Prior attempts at applying the DPM to image super-resolution (SR) have shown that iteratively refining a pure Gaussian noise with a conditional image using a U-Net trained on denoising at various-level noises can help obtain a satisfied high-resolution image for the low-resolution one. To further improve the performance and simplify current DPM-based super-resolution methods, we propose a simple but non-trivial DPM-based super-resolution post-process framework,i.e., cDPMSR. After applying a pre-trained SR model on the to-be-test LR image to provide the conditional input, we adapt the standard DPM to conduct conditional image generation and perform super-resolution through a deterministic iterative denoising process. Our method surpasses prior attempts on both qualitative and quantitative results and can generate more photo-realistic counterparts for the low-resolution images with various benchmark datasets including Set5, Set14, Urban100, BSD100, and Manga109. Code will be published after accepted. △ Less

Submitted 14 February, 2023; originally announced February 2023.

Comments: 4 pages, 4 figures

arXiv:2211.09861 [pdf, other]

Self-Supervised Visual Representation Learning via Residual Momentum

Authors: Trung X. Pham, Axi Niu, Zhang Kang, Sultan Rizky Madjid, Ji Woo Hong, Daehyeok Kim, Joshua Tian Jin Tee, Chang D. Yoo

Abstract: Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in representation between the online encoder (student) and the momentum encoder (teacher), which hinder… ▽ More Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in representation between the online encoder (student) and the momentum encoder (teacher), which hinders performance on downstream tasks. This paper is the first to investigate and identify this invisible gap as a bottleneck that has been overlooked in the existing SSL frameworks, potentially preventing the models from learning good representation. To solve this problem, we propose "residual momentum" to directly reduce this gap to encourage the student to learn the representation as close to that of the teacher as possible, narrow the performance gap with the teacher, and significantly improve the existing SSL. Our method is straightforward, easy to implement, and can be easily plugged into other SSL frameworks. Extensive experimental results on numerous benchmark datasets and diverse network architectures have demonstrated the effectiveness of our method over the state-of-the-art contrastive learning baselines. △ Less

Submitted 21 November, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 18 pages, 16 figures

arXiv:2208.05744 [pdf, other]

On the Pros and Cons of Momentum Encoder in Self-Supervised Visual Representation Learning

Authors: Trung Pham, Chaoning Zhang, Axi Niu, Kang Zhang, Chang D. Yoo

Abstract: Exponential Moving Average (EMA or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, for enhancing performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component in modern SSL frameworks, the benefit caused by momentum is not well… ▽ More Exponential Moving Average (EMA or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, for enhancing performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component in modern SSL frameworks, the benefit caused by momentum is not well understood. We find that its success can be at least partly attributed to the stability effect. In the first attempt, we analyze how EMA affects each part of the encoder and reveal that the portion near the encoder's input plays an insignificant role while the latter parts have much more influence. By monitoring the gradient of the overall loss with respect to the output of each block in the encoder, we observe that the final layers tend to fluctuate much more than other layers during backpropagation, i.e. less stability. Interestingly, we show that using EMA to the final part of the SSL encoder, i.e. projector, instead of the whole deep network encoder can give comparable or preferable performance. Our proposed projector-only momentum helps maintain the benefit of EMA but avoids the double forward computation. △ Less

Submitted 11 August, 2022; originally announced August 2022.

Comments: 35 pages

arXiv:2207.10899 [pdf, other]

Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness

Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Axi Niu, Jiu Feng, Chang D. Yoo, In So Kweon

Abstract: Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields. Integrating AT into SSL, multiple prior works have accomplished a highly significant yet challenging task: learning robust representation without labels. A widely used framework is adversarial contrastive learning which couples AT… ▽ More Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields. Integrating AT into SSL, multiple prior works have accomplished a highly significant yet challenging task: learning robust representation without labels. A widely used framework is adversarial contrastive learning which couples AT and SSL, and thus constitute a very complex optimization problem. Inspired by the divide-and-conquer philosophy, we conjecture that it might be simplified as well as improved by solving two sub-problems: non-robust SSL and pseudo-supervised AT. This motivation shifts the focus of the task from seeking an optimal integrating strategy for a coupled problem to finding sub-solutions for sub-problems. With this said, this work discards prior practices of directly introducing AT to SSL frameworks and proposed a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL). Extensive experimental results demonstrate that our DeACL achieves SOTA self-supervised adversarial robustness while significantly reducing the training time, which validates its effectiveness and efficiency. Moreover, our DeACL constitutes a more explainable solution, and its success also bridges the gap with semi-supervised AT for exploiting unlabeled samples for robust representation learning. The code is publicly accessible at https://github.com/pantheon5100/DeACL. △ Less

Submitted 22 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022 oral presentation

arXiv:2204.06438 [pdf, other]

Fair Algorithm Design: Fair and Efficacious Machine Scheduling

Authors: April Niu, Agnes Totschnig, Adrian Vetta

Abstract: Motivated by a plethora of practical examples where bias is induced by automated-decision making algorithms, there has been strong recent interest in the design of fair algorithms. However, there is often a dichotomy between fairness and efficacy: fair algorithms may proffer low social welfare solutions whereas welfare optimizing algorithms may be very unfair. This issue is exemplified in the mach… ▽ More Motivated by a plethora of practical examples where bias is induced by automated-decision making algorithms, there has been strong recent interest in the design of fair algorithms. However, there is often a dichotomy between fairness and efficacy: fair algorithms may proffer low social welfare solutions whereas welfare optimizing algorithms may be very unfair. This issue is exemplified in the machine scheduling problem where, for $n$ jobs, the social welfare of any fair solution may be a factor $Ω(n)$ worse than the optimal welfare. In this paper, we prove that this dichotomy between fairness and efficacy can be overcome if we allow for a negligible amount of bias: there exist algorithms that are both "almost perfectly fair" and have a constant factor efficacy ratio, that is, are guaranteed to output solutions that have social welfare within a constant factor of optimal welfare. Specifically, for any $ε>0$, there exist mechanisms with efficacy ratio $Θ(\frac{1}ε)$ and where no agent is more than an $ε$ fraction worse off than they are in the fairest possible solution (given by an algorithm that does not use personal or type data). Moreover, these bicriteria guarantees are tight and apply to both the single machine case and the multiple machine case. The key to our results are the use of Pareto scheduling mechanisms. These mechanisms, by the judicious use of personal or type data, are able to exploit Pareto improvements that benefit every individual; such Pareto improvements would typically be forbidden by fair scheduling algorithms designed to satisfy standard statistical measures of group fairness. We anticipate this paradigm, the judicious use of personal data by a fair algorithm to greatly improve performance at the cost of negligible bias, has wider application. △ Less

Submitted 9 July, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

arXiv:2203.17248 [pdf, other]

Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

Authors: Chaoning Zhang, Kang Zhang, Trung X. Pham, Axi Niu, Zhinan Qiao, Chang D. Yoo, In So Kweon

Abstract: Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most pop… ▽ More Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most popular one is MoCo family. In essence, MoCo adopts a momentum-based queue dictionary, for which we perform a fine-grained analysis of its size and consistency. We point out that InfoNCE loss used in MoCo implicitly attract anchors to their corresponding positive sample with various strength of penalties and identify such inter-anchor hardness-awareness property as a major reason for the necessity of a large dictionary. Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum. Based on an InfoNCE with the proposed dual temperature, our simplified frameworks, SimMoCo and SimCo, outperform MoCo v2 by a visible margin. Moreover, our work bridges the gap between CL and non-CL frameworks, contributing to a more unified understanding of these two mainstream frameworks in SSL. Code is available at: https://bit.ly/3LkQbaT. △ Less

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR2022

arXiv:2202.05488 [pdf, other]

Fast Adversarial Training with Noise Augmentation: A Unified Perspective on RandStart and GradAlign

Authors: Axi Niu, Kang Zhang, Chaoning Zhang, Chenshuang Zhang, In So Kweon, Chang D. Yoo, Yanning Zhang

Abstract: PGD-based and FGSM-based are two popular adversarial training (AT) approaches for obtaining adversarially robust models. Compared with PGD-based AT, FGSM-based one is significantly faster but fails with catastrophic overfitting (CO). For mitigating CO in such Fast AT, there are two popular existing strategies: random start (RandStart) and Gradient Alignment (GradAlign). The former works only for a… ▽ More PGD-based and FGSM-based are two popular adversarial training (AT) approaches for obtaining adversarially robust models. Compared with PGD-based AT, FGSM-based one is significantly faster but fails with catastrophic overfitting (CO). For mitigating CO in such Fast AT, there are two popular existing strategies: random start (RandStart) and Gradient Alignment (GradAlign). The former works only for a relatively small perturbation 8/255 with the l_\infty constraint, and GradAlign improves it by extending the perturbation size to 16/255 (with the l_\infty constraint) but at the cost of being 3 to 4 times slower. How to avoid CO in Fast AT for a large perturbation size but without increasing the computation overhead remains as an unsolved issue, for which our work provides a frustratingly simple (yet effective) solution. Specifically, our solution lies in just noise augmentation (NoiseAug) which is a non-trivial byproduct of simplifying GradAlign. By simplifying GradAlign we have two findings: (i) aligning logit instead of gradient in GradAlign requires half the training time but achieves higher performance than GradAlign; (ii) the alignment operation can also be removed by only keeping noise augmentation (NoiseAug). Simplified from GradAlign, our NoiseAug has a surprising resemblance with RandStart except that we inject noise on the image instead of perturbation. To understand why injecting noise to input prevents CO, we verify that this is caused not by data augmentation effect (inject noise on image) but by improved local linearity. We provide an intuitive explanation for why NoiseAug improves local linearity without explicit regularization. Extensive results demonstrate that our NoiseAug achieves SOTA results in FGSM AT. The code will be released after accepted. △ Less

Submitted 5 October, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

arXiv:2101.06021 [pdf, other]

doi 10.1016/j.patcog.2021.108082

Non-uniform Motion Deblurring with Blurry Component Divided Guidance

Authors: Pei Wang, Wei Sun, Qingsen Yan, Axi Niu, Rui Li, Yu Zhu, Jinqiu Sun, Yanning Zhang

Abstract: Blind image deblurring is a fundamental and challenging computer vision problem, which aims to recover both the blur kernel and the latent sharp image from only a blurry observation. Despite the superiority of deep learning methods in image deblurring have displayed, there still exists major challenge with various non-uniform motion blur. Previous methods simply take all the image features as the… ▽ More Blind image deblurring is a fundamental and challenging computer vision problem, which aims to recover both the blur kernel and the latent sharp image from only a blurry observation. Despite the superiority of deep learning methods in image deblurring have displayed, there still exists major challenge with various non-uniform motion blur. Previous methods simply take all the image features as the input to the decoder, which handles different degrees (e.g. large blur, small blur) simultaneously, leading to challenges for sharp image generation. To tackle the above problems, we present a deep two-branch network to deal with blurry images via a component divided module, which divides an image into two components based on the representation of blurry degree. Specifically, two component attentive blocks are employed to learn attention maps to exploit useful deblurring feature representations on both large and small blurry regions. Then, the blur-aware features are fed into two-branch reconstruction decoders respectively. In addition, a new feature fusion mechanism, orientation-based feature fusion, is proposed to merge sharp features of the two branches. Both qualitative and quantitative experimental results show that our method performs favorably against the state-of-the-art approaches. △ Less

Submitted 15 January, 2021; originally announced January 2021.

Journal ref: Pattern Recognition,Volume 120, December 2021, 108082

Showing 1–17 of 17 results for author: Niu, A