Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change

Chyou, Chien Cheng; Su, Hung-Ting; Hsu, Winston H.

arXiv:2308.03243 (cs)

[Submitted on 7 Aug 2023]

Abstract: Adversarial robustness poses a critical challenge in the deployment of deep learning models for real-world applications. Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data, which is often impractical. Existing unsupervised adversarial detection methods identify whether the target model works properly, but they suffer from poor accuracy because they use the common cross-entropy training loss, which relies on unnecessary features and strengthens adversarial attacks. We propose new training losses that reduce useless features, together with a corresponding detection method that requires no prior knowledge of adversarial attacks. The detection rate (true positive rate) against all given white-box attacks is above 93.9%, except for attacks without limits (DF($\infty$)), while the false positive rate is only 2.5%. The proposed method works well on all tested attack types, and its false positive rate is even better than that of methods that are good at certain attack types.
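
The abstract reports two metrics: the detection rate (true positive rate) and the false positive rate. As a minimal sketch of how these two numbers are conventionally computed for any binary detector, the Python snippet below may help. The function name `detection_rates` and the toy labels are illustrative assumptions; they are not taken from the paper or its code release, and the snippet does not implement the proposed losses or detection method.

```python
from typing import Sequence, Tuple


def detection_rates(is_adversarial: Sequence[bool],
                    flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for a binary detector.

    is_adversarial[i] is the ground truth for input i;
    flagged[i] is True when the detector marked input i as adversarial.
    """
    if len(is_adversarial) != len(flagged):
        raise ValueError("inputs must have the same length")

    n_adv = sum(1 for a in is_adversarial if a)
    n_clean = len(is_adversarial) - n_adv
    if n_adv == 0 or n_clean == 0:
        raise ValueError("need at least one adversarial and one clean input")

    # Adversarial inputs that were flagged (true positives).
    tp = sum(1 for a, f in zip(is_adversarial, flagged) if a and f)
    # Clean inputs that were flagged anyway (false positives).
    fp = sum(1 for a, f in zip(is_adversarial, flagged) if not a and f)

    return tp / n_adv, fp / n_clean


# Toy data, made up for illustration only (not results from the paper).
is_adversarial = [True, True, True, True, False, False, False, False]
flagged = [True, True, True, False, False, False, True, False]

tpr, fpr = detection_rates(is_adversarial, flagged)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Read this way, a detection rate above 93.9% means that more than 93.9% of adversarial inputs are flagged, and a false positive rate of 2.5% means that 2.5% of clean inputs are flagged by mistake.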

Comments:	AdvML in ICML 2023; code: this https URL
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2308.03243 [cs.LG]
	(or arXiv:2308.03243v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2308.03243

Submission history

From: Chien-Cheng Chyou
[v1] Mon, 7 Aug 2023 01:41:21 UTC (325 KB)
