Adversarial training has become a primary method for enhancing the robustness of deep learning models. In recent years, fast adversarial training methods have gained widespread attention because of their lower computational cost. However, because fast adversarial training uses single-step adversarial attacks instead of multi-step attacks, the generated adversarial examples lack diversity, making models prone to catastrophic overfitting and a loss of robustness. Existing methods for preventing catastrophic overfitting have shortcomings, such as poor robustness caused by insufficiently strong adversarial examples and low accuracy caused by excessive total perturbation. To address these issues, this paper proposes a fast adversarial training method with an adaptive similarity step size (ATSS). In this method, random noise is first added to the clean input samples, and the model then computes the gradient for each input sample. The perturbation step size for each sample is determined by the similarity between the input noise and the gradient direction. Finally, adversarial examples are generated from the step size and gradient and used for adversarial training. We conduct various adversarial attack tests on ResNet18 and VGG19 models using the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. The experimental results demonstrate that our method effectively avoids catastrophic overfitting. Compared with other fast adversarial training methods, ATSS achieves higher robust accuracy and clean accuracy with almost no additional training cost.
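To make the described procedure concrete, the following is a minimal PyTorch sketch of a single ATSS-style attack step: uniform random noise is added to the clean inputs, the loss gradient is computed, and a per-sample step size is derived from the cosine similarity between the noise and the gradient direction before the FGSM-style update. The function name, the bounds `eps` and `alpha_max`, and in particular the specific mapping from similarity to step size are illustrative assumptions, since the abstract does not specify the exact rule.

```python
import torch
import torch.nn.functional as F

def atss_adversarial_example(model, x, y, eps=8/255, alpha_max=10/255):
    """Sketch of a single-step attack with a per-sample adaptive step size.

    Assumptions (not taken from the paper): the step size shrinks as the
    initial noise becomes more aligned with the gradient direction, and
    inputs are 4-D image batches with values in [0, 1].
    """
    # 1) Add uniform random noise to the clean samples.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta.requires_grad_(True)

    # 2) Compute the loss gradient with respect to the perturbed inputs.
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]

    # 3) Per-sample similarity between the noise and the gradient direction.
    sim = F.cosine_similarity(delta.flatten(1), grad.flatten(1), dim=1)  # in [-1, 1]

    # 4) Map similarity to a step size in [0, alpha_max] (assumed mapping).
    alpha = (alpha_max * (1 - sim) / 2).view(-1, 1, 1, 1)

    # 5) FGSM-style update, clipped to the eps-ball and to the valid pixel range.
    perturbation = torch.clamp(delta + alpha * grad.sign(), -eps, eps)
    return torch.clamp(x + perturbation, 0, 1).detach()
```

In a training loop, the returned adversarial batch would simply replace the clean batch for the parameter update, which is what keeps the per-iteration cost close to that of standard single-step fast adversarial training.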