Combing for Credentials: Active Pattern Extraction from Smart Reply

Jayaraman, Bargav; Ghosh, Esha; Chase, Melissa; Roy, Sambuddha; Dai, Wei; Evans, David

arXiv:2207.10802 (cs)

[Submitted on 14 Jul 2022 (v1), last revised 2 Sep 2023 (this version, v3)]

Abstract: Pre-trained large language models, such as GPT-2 and BERT, are often fine-tuned to achieve state-of-the-art performance on a downstream task. One natural example is the "Smart Reply" application where a pre-trained model is tuned to provide suggested responses for a given query message. Since the tuning data is often sensitive data such as emails or chat transcripts, it is important to understand and mitigate the risk that the model leaks its tuning data. We investigate potential information leakage vulnerabilities in a typical Smart Reply pipeline. We consider a realistic setting where the adversary can only interact with the underlying model through a front-end interface that constrains what types of queries can be sent to the model. Previous attacks do not work in these settings, but require the ability to send unconstrained queries directly to the model. Even when there are no constraints on the queries, previous attacks typically require thousands, or even millions, of queries to extract useful information, while our attacks can extract sensitive data in just a handful of queries. We introduce a new type of active extraction attack that exploits canonical patterns in text containing sensitive data. We show experimentally that it is possible for an adversary to extract sensitive user information present in the training data, even in realistic settings where all interactions with the model must go through a front-end that limits the types of queries. We explore potential mitigation strategies and demonstrate empirically how differential privacy appears to be a reasonably effective defense mechanism to such pattern extraction attacks.

Subjects:	Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2207.10802 [cs.CR]
	(or arXiv:2207.10802v3 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2207.10802

Submission history

From: Bargav Jayaraman
[v1] Thu, 14 Jul 2022 05:03:56 UTC (359 KB)
[v2] Wed, 7 Sep 2022 22:10:50 UTC (554 KB)
[v3] Sat, 2 Sep 2023 22:33:09 UTC (951 KB)
