SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning

Zheng, Mengxin; Xue, Jiaqi; Chen, Xun; Jiang, Lei; Lou, Qian


arXiv:2303.09079v1 (cs)

[Submitted on 16 Mar 2023 (this version), latest version 16 Jul 2024 (v3)]




Abstract: Self-supervised learning (SSL) is a commonly used approach to learning and encoding data representations. By using a pre-trained SSL image encoder and training a downstream classifier on top of it, impressive performance can be achieved on various tasks with very little labeled data. The increasing usage of SSL has led to an uptick in security research related to SSL encoders and the development of various Trojan attacks. The danger posed by Trojan attacks inserted in SSL encoders lies in their ability to operate covertly and spread widely among various users and devices. The presence of backdoor behavior in Trojaned encoders can inadvertently be inherited by downstream classifiers, making it even more difficult to detect and mitigate the threat. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. This is because downstream tasks are not always known, dataset labels are not available, and even the original training dataset is not accessible during the SSL encoder Trojan detection. This paper presents an innovative technique called SSL-Cleanse that is designed to detect and mitigate backdoor attacks in SSL encoders. We evaluated SSL-Cleanse on various datasets using 300 models, achieving an average detection success rate of 83.7% on ImageNet-100. After mitigating backdoors, on average, backdoored encoders achieve 0.24% attack success rate without great accuracy loss, proving the effectiveness of SSL-Cleanse.

Comments:	10 pages, 6 figures
Subjects:	Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2303.09079 [cs.CR]
	(or arXiv:2303.09079v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2303.09079

Submission history

From: Jiaqi Xue
[v1] Thu, 16 Mar 2023 04:45:06 UTC (1,707 KB)
[v2] Sun, 8 Oct 2023 01:17:14 UTC (9,382 KB)
[v3] Tue, 16 Jul 2024 23:07:24 UTC (727 KB)
