Search | arXiv e-print repository

Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents

Authors: Avital Shafran, Roei Schuster, Vitaly Shmatikov

Abstract: Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with potentially untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single ``… ▽ More Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database, then generating an answer by applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with potentially untrusted content are vulnerable to a new class of denial-of-service attacks we call jamming. An adversary can add a single ``blocker'' document to the database that will be retrieved in response to a specific query and, furthermore, result in the RAG system not answering the query - ostensibly because it lacks the information or because the answer is unsafe. We describe and analyze several methods for generating blocker documents, including a new method based on black-box optimization that does not require the adversary to know the embedding or LLM used by the target RAG system, nor access to an auxiliary LLM to generate blocker documents. We measure the efficacy of the considered methods against several LLMs and embeddings, and demonstrate that the existing safety metrics for LLMs do not capture their vulnerability to jamming. We then discuss defenses against blocker documents. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2310.01959 [pdf, other]

Beyond Labeling Oracles: What does it mean to steal ML models?

Authors: Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot

Abstract: Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. Machine Learning (ML) models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extracti… ▽ More Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. Machine Learning (ML) models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extraction commonly claims or presumes that the attacker is able to save on both data acquisition and labeling costs. We thoroughly evaluate this assumption and find that the attacker often does not. This is because current attacks implicitly rely on the adversary being able to sample from the victim model's data distribution. We thoroughly research factors influencing the success of model extraction. We discover that prior knowledge of the attacker, i.e., access to in-distribution data, dominates other factors like the attack policy the adversary follows to choose which queries to make to the victim model API. Our findings urge the community to redefine the adversarial goals of ME attacks as current evaluation methods misinterpret the ME performance. △ Less

Submitted 13 June, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2102.07762 [pdf, other]

Membership Inference Attacks are Easier on Difficult Problems

Authors: Avital Shafran, Shmuel Peleg, Yedid Hoshen

Abstract: Membership inference attacks (MIA) try to detect if data samples were used to train a neural network model, e.g. to detect copyright abuses. We show that models with higher dimensional input and output are more vulnerable to MIA, and address in more detail models for image translation and semantic segmentation, including medical image segmentation. We show that reconstruction-errors can lead to ve… ▽ More Membership inference attacks (MIA) try to detect if data samples were used to train a neural network model, e.g. to detect copyright abuses. We show that models with higher dimensional input and output are more vulnerable to MIA, and address in more detail models for image translation and semantic segmentation, including medical image segmentation. We show that reconstruction-errors can lead to very effective MIA attacks as they are indicative of memorization. Unfortunately, reconstruction error alone is less effective at discriminating between non-predictable images used in training and easy to predict images that were never seen before. To overcome this, we propose using a novel predictability error that can be computed for each sample, and its computation does not require a training set. Our membership error, obtained by subtracting the predictability error from the reconstruction error, is shown to achieve high MIA accuracy on an extensive number of benchmarks. △ Less

Submitted 18 August, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

arXiv:1911.12322 [pdf, other]

Crypto-Oriented Neural Architecture Design

Authors: Avital Shafran, Gil Segev, Shmuel Peleg, Yedid Hoshen

Abstract: As neural networks revolutionize many applications, significant privacy conflicts between model users and providers emerge. The cryptography community developed a variety of techniques for secure computation to address such privacy issues. As generic techniques for secure computation are typically prohibitively ineffective, many efforts focus on optimizing their underlying cryptographic tools. Dif… ▽ More As neural networks revolutionize many applications, significant privacy conflicts between model users and providers emerge. The cryptography community developed a variety of techniques for secure computation to address such privacy issues. As generic techniques for secure computation are typically prohibitively ineffective, many efforts focus on optimizing their underlying cryptographic tools. Differently, we propose to optimize the initial design of crypto-oriented neural architectures and provide a novel Partial Activation layer. The proposed layer is much faster for secure computation. Evaluating our method on three state-of-the-art architectures (SqueezeNet, ShuffleNetV2, and MobileNetV2) demonstrates significant improvement to the efficiency of secure inference on common evaluation metrics. △ Less

Submitted 16 February, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: Full version (shorter version published in ICASSP'21)

Showing 1–4 of 4 results for author: Shafran, A