Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning
Abstract
Model Inversion (MI) attacks based on Generative Adversarial Networks (GANs) aim to recover private training data from complex deep learning models by searching for codes in the latent space. However, existing schemes merely search a deterministic latent space, so the latent code they find is usually suboptimal. In addition, existing distributional MI schemes assume that an attacker can access the structure and parameters of the target model, which is not always viable in practice. To overcome these shortcomings, this paper proposes a novel Distributional Black-Box Model Inversion (DBB-MI) attack that constructs a probabilistic latent space for searching the target privacy data. Specifically, DBB-MI needs neither the target model parameters nor specialized GAN training. Instead, it finds the latent probability distribution by combining the output of the target model with multi-agent reinforcement learning techniques. Then, it randomly samples latent codes from the latent probability distribution to recover the private data. As the latent probability distribution closely aligns with the target privacy data in latent space, the recovered data significantly leaks the privacy of the target model’s training samples. Extensive experiments conducted on diverse datasets and networks show that DBB-MI outperforms state-of-the-art attacks in attack accuracy, K-nearest neighbor feature distance, and Peak Signal-to-Noise Ratio.
Index Terms:
Distributional model inversion attack, deep learning, multi-agent reinforcement learning, black-box attack.
I Introduction
Artificial Intelligence (AI) technology is rapidly advancing and widely applied in diverse domains, including facial recognition [1], autonomous driving [2], smart homes [3], drone applications [4], etc. Although AI has undoubtedly brought substantial convenience to both work and life, it is vulnerable to various attacks, such as adversarial attacks [5, 6], data poisoning [7, 8], and Model Inversion (MI) attacks [9, 10]. Fredrikson et al. [9] demonstrated that MI attacks pose a significant risk of privacy leakage for machine learning (ML), as attackers can expose sensitive training data by only accessing the ML model itself.
Recently, GAN-based MI attacks have emerged as an attractive way to attack complex ML models. Zhang et al. [11] introduced the first GAN-based MI attack, shifting the focus from the algorithm-centered numerical reconstruction of sensitive data to an optimization problem through searching latent code from GAN’s latent space. They utilized GAN to extract prior knowledge from publicly available datasets and searched the latent space of the GAN to recreate privacy. GAN-based MI attacks always involve the following steps. Step 1: GAN training. It trains a GAN using the publicly available dataset that shares a similar distribution to the private dataset used by the target network. For example, if the private dataset includes facial images, the public dataset should also contain facial images. Step 2: latent code searching. It identifies the suitable latent code to generate images that could reveal private information when passed through the trained GAN.
Chen et al. [12] argued that previous GAN-based MI attacks are limited to one-to-one privacy recovery via the exploration of latent code. To achieve many-to-one privacy recovery, they introduced the distributional attack to reconstruct multiple privacy data instances that correspond to a single label. They proposed the Knowledge-Enriched Distributional Model Inversion (KED-MI), which initially generates pseudo-labels for a publicly available dataset using the target model. Subsequently, a GAN is trained to discriminate generated images as part of the loss function for further optimization. Finally, the trained GAN and the latent distribution optimized under the white-box setting are employed to attack the target and recreate confidential information. Yuan et al. [13] developed the Pseudo Label-Guided MI (PLG-MI) to enhance the training method for GAN. In GAN training, they exclusively used images with higher confidence for specific classes, enriching the information contained within the GAN so that it generates images with particular labels. This improves the GAN’s ability to narrow down the search space. Although these distributional white-box MI attacks demonstrated satisfactory performance in step 1, they still have several limitations:
• Requiring large-scale dataset. These attacks rely heavily on the discriminative ability of the GAN and leverage the target model to label the dataset, thus enhancing the GAN’s ability to differentiate. PLG-MI, in particular, needs to examine a large amount of data to ensure that each category has enough samples to train the GAN. This over-reliance on prior knowledge may lead to the misuse of the target model and also increases the difficulty of dataset collection.
• Over-accessing the target model. KED-MI and PLG-MI assume that the attacker can freely access the parameters of the target model, enabling them to constrain the latent distribution and facilitate the identification of an appropriate latent distribution. Nevertheless, this strong assumption is difficult to satisfy in real attacks.
• Underexplored latent distribution. The latent distribution contains rich information that significantly enhances the efficacy of MI attacks. However, KED-MI and PLG-MI rely heavily on the GAN, rather than the latent distribution, for targeting. Thus, the under-searched latent distribution limits the attack performance of KED-MI and PLG-MI.
To overcome the above difficulties, we propose a novel Distributional Black-Box Model Inversion (DBB-MI) attack. A GAN is trained on a randomly chosen dataset without annotation, so the target model is not needed to label the data. Under the black-box setting, the latent distribution is optimized with Multi-Agent Reinforcement Learning (MARL) techniques, which addresses the issue of over-accessing the target model and makes the attack more relevant to real-life situations. The primary contributions of this paper are as follows:
• We propose the Distributional Black-Box Model Inversion (DBB-MI) Attack, which is the first exploration of a distributional MI attack in black-box settings.
• In black-box settings, DBB-MI leverages the Multi-Agent Reinforcement Learning (MARL) algorithm to thoroughly explore the appropriate latent distributions for specific categories, extracting latent privacy features in GAN.
• Extensive experiments demonstrate the superior attack performance of DBB-MI compared with state-of-the-art black-box MI attacks. For example, its attack success rate improves by up to 33.6% on CelebA. Additionally, it achieves a 100% attack success rate on MNIST.
The rest of this paper is arranged as follows. Section II introduces some related work. Section III gives the challenges of searching for latent distribution under the black-box setting and provides a detailed description of DBB-MI. Section IV presents and analyzes experimental results. Section V concludes this work.
II Related Work
This section introduces the background knowledge of Multi-Agent Reinforcement Learning (MARL) and Model Inversion (MI) Attacks.
II-A Multi-Agent Reinforcement Learning
Single-agent reinforcement learning is modeled as a Markov decision process, whereas multi-agent reinforcement learning (MARL) is modeled as a stochastic game. The joint actions formed by multiple agents have a significant impact on the transition and updating of the environmental state. Additionally, these actions play a crucial role in determining the reward feedback received by the agents, as depicted in Figure 1. The agents can be categorized into three groups based on their relationships: fully cooperative, fully competitive, and semi-cooperative semi-competitive.
In fully cooperative MARL [14, 15], all agents are dedicated to jointly achieving a shared objective by maximizing the overall reward through collaboration, without considering their individual rewards. For example, each agent possesses its own local value function in the Value-Decomposition Networks (VDN) [15] algorithm. After each agent makes a decision, the local values are calculated and then aggregated into a global value to achieve globally optimal choices.
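The aggregation step described for VDN can be illustrated with a minimal sketch. It assumes tabular local Q-values and an additive global value; the function names and the toy numbers are hypothetical and only serve as illustration.

```python
def vdn_joint_value(local_q_values, joint_action):
    """Global value as the sum of each agent's local Q-value for its own action."""
    return sum(q[a] for q, a in zip(local_q_values, joint_action))


def greedy_joint_action(local_q_values):
    """With an additive global value, each agent can pick its own best action."""
    return [max(range(len(q)), key=lambda i: q[i]) for q in local_q_values]


# Toy example (made-up numbers): two agents with 2 and 3 available actions.
local_q = [[0.1, 0.7], [0.4, 0.2, 0.9]]
best = greedy_joint_action(local_q)
total = vdn_joint_value(local_q, best)
```

Because the global value is a plain sum in this sketch, maximizing each local value independently also maximizes the global value.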
In fully competitive MARL [16, 17], all agents see each other as competitors, and each agent only focuses on maximizing its utility, disregarding the impact of other agents. One illustrative instance is the Independent Q-Learning (IQL) algorithm [17], wherein agents independently engage in Q-learning. Although it may produce satisfactory outcomes in certain contexts, it often exhibits instability and difficulties in achieving convergence due to the influence of other actors on the surrounding environment. Therefore, it is typically only suitable for relatively simple scenarios.
In semi-cooperative semi-competitive MARL [18, 19], agents can obtain greater benefits through collaboration, while simultaneously experiencing potential gains or losses due to a certain level of competition. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [19] algorithm enables each agent not only to learn its policy knowledge but also to observe the behaviors of other agents during the learning process to improve their strategy further. These algorithms are more suitable for handling complex multi-agent tasks because they allow agents to simultaneously assess cooperative and competitive connections.
The above analysis shows that different MARL approaches have different advantages and suit different environments. When utilizing MARL, it is crucial to choose the approach that matches the environment and the particular task in order to accomplish the goals more effectively.
II-B Model Inversion Attacks
According to attack strategies, existing MI attacks can be divided into direct reconstruction-based and GAN-based.
Early direct reconstruction-based MI attacks predominantly concentrated on the white-box setting, in which an attacker can access all information about the model, including its architecture and parameters. Fredrikson et al. [9] developed the first MI attack, which targets a regression model by inputting specific features. However, its effectiveness diminishes as the feature space grows. Later, Fredrikson et al. [20] achieved the inversion of a face dataset with a larger feature space by minimizing the confidence loss. Although white-box MI attacks can disclose the privacy of the target model, they assume that an attacker can access everything about the target model, which rarely holds in practice. Therefore, some researchers focused on black-box MI attacks, in which an adversary only possesses the outputs or labels of the model rather than all the information. Yang et al. [21] assumed that the adversary has access to a vast database that far exceeds the training data of the target network, and employed this data as auxiliary information for an attack. Additionally, Salem et al. [22] attacked the newly added training data by comparing different outputs on the same data before and after the target model was updated. Zhang et al. [23] enhanced face reconstruction accuracy by fully exploiting predicted vectors. Nevertheless, direct reconstruction-based MI attacks can only recover grayscale images with substantial information loss on simple networks.
To attack deep networks, Zhang et al. [11] proposed the GAN-based MI attack, which searches the latent space of a GAN for potential private data in order to expose the target network’s private information. Currently, GAN-based MI attacks can be further divided into two subcategories: optimizing latent code and optimizing latent distribution. MI attacks based on optimizing latent code are essentially black-box attacks. An et al. [24] utilized genetic algorithms to implement GAN-based MI attacks, reconstructing high-fidelity private face images within deep networks. Han et al. [25] achieved impressive MI attacks by utilizing reinforcement learning algorithms to search for latent codes within the latent space of GANs. Zhu et al. [26] utilized the error rate of the target model to explore decision boundaries, reconstructing representative samples. Kahla et al. [27] proposed the Boundary-Repelling Model Inversion Attack (BERP-MI), which only uses GAN to generate images with a target label and extracts the latent space of images to gather sufficient data for estimating the gradient direction. Because a single latent code carries limited private information, GAN-based MI attacks that rely on optimizing latent code have difficulty recovering private data accurately.
To obtain more private information, some researchers focus on MI attacks based on optimizing latent distribution, which are actually white-box attacks. Chen et al. [12] proposed the Knowledge-Enriched Distributional Model Inversion Attack (KED-MI), which leverages the target network to generate pseudo-labels for GAN training to boost the attack rates significantly. Yuan et al. [13] further refined the training methodology of GAN by selecting more representative data from public datasets, thereby enhancing the capability of GAN and narrowing the search space within the latent space. Moreover, meticulously chosen datasets contain a greater abundance of privacy features, consequently further improving the attack performance.
Since MI attacks based on optimizing latent distribution are all white-box attacks, their requirements on the dataset and their assumption that the target model can be used to label the dataset remain unrealistic. In this paper, we develop a distributional black-box MI attack that requires neither a super attacker nor elaborate GAN training.
III The Proposed Approach: DBB-MI
In this section, we introduce the challenges of searching for latent distribution under the black-box setting and provide a detailed description of DBB-MI.
III-A Problem Formulation
III-A1 Attack model
This work primarily examines black-box MI attacks, where the attacker can neither access the private data of the black-box target network nor obtain any knowledge about the model’s structure, hyper-parameters, etc. The black-box target model is trained on a private dataset to recognize different identities. Given an input image, the model produces a probability distribution over the identities as follows:
(1) |
where represents the probabilities of being classified into each of the identities.
In the black-box setting, the attacker can only obtain the output probabilities by feeding an image into the target model; no intermediate parameters of the model are available. The attacker aims to obtain sensitive data associated with a specific label from the target network. We choose a face recognition classifier as the attack target to make our attack more realistic. This model identifies individuals’ identities in images and assigns the corresponding labels. Thus, the private facial images of any specific identity are reconstructed by utilizing the soft and hard labels provided by the black-box model.
To expose the privacy of the target through reconstructed data, the following conditions must be satisfied: 1) the target label must receive the highest probability in the target model’s prediction for the reconstructed input; 2) the confidence assigned to the target label should be as large as possible.
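The two conditions can be checked from the black-box output alone. The following is a minimal sketch, assuming the target model returns a list of class probabilities; `evaluate_reconstruction` is a hypothetical helper name, not part of the original method.

```python
def evaluate_reconstruction(probs, target_label):
    """Return (is_top_prediction, target_confidence) for one reconstructed input.

    probs: class probabilities returned by the black-box target model.
    Condition 1 holds when the first element is True; condition 2 asks for
    the second element to be as large as possible.
    """
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return predicted == target_label, probs[target_label]


# Toy outputs (made-up numbers) for a 3-class model and target label 1.
good = evaluate_reconstruction([0.1, 0.7, 0.2], target_label=1)
bad = evaluate_reconstruction([0.5, 0.3, 0.2], target_label=1)
```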
III-A2 Stochastic Game for Latent Distribution Search
In GANs, the variation of latent code in the latent space is continuous. Hence, searching for a latent code can be regarded as a Markov decision process (MDP). However, searching for a latent distribution cannot be considered an MDP: a latent distribution has a more complex state space than a latent code, and the parameters constituting the distribution, such as the mean μ and variance σ, entail more uncertainty and interaction. Therefore, searching for a latent distribution should be viewed as a stochastic game.
When dealing with stochastic game problems, multi-agent reinforcement learning is often a good choice as it can effectively address interactions and competitions. Therefore, we select two agents, agentμ and agentσ, to optimize the mean and variance constituting the latent distribution, respectively. In the context of distributional MI attacks, these two agents aim to find a latent distribution from which suitable latent codes can be sampled to reconstruct more privacy-revealing images. Even so, we cannot classify this task as entirely cooperative. A certain degree of competition exists between agentμ and agentσ. Maintaining this competitive dynamic enables them to enhance their performance while continuously striving for global optimality. This competitive relationship fosters flexibility in the optimization process, ultimately leading to improved MI attack performance. Therefore, we choose the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [19] as the MARL algorithm for searching for an appropriate latent distribution.
III-A3 Overview
DBB-MI consists of three steps. First, we train a GAN on the public dataset. It is important to note that the public dataset does not overlap with the private dataset used to train the black-box target model. We neither need to select the images carefully nor use the target model to label the public images. Next, we optimize the initial random latent distribution to approximate the real latent distribution. Finally, we sample the latent code from the optimized high-dimensional latent distribution and input it into the GAN to reconstruct private data. The overall structure of DBB-MI is shown in Fig. 2.
III-B MADDPG for Searching Latent Space Distribution
III-B1 MADDPG
The MADDPG [19] agent has two fundamental components: the actor and critic networks, as depicted in Fig.3. The actor network determines the actions to be executed by the agent based on the information observed by the agent as well as the current state of the system. The critic network is responsible for judging the value of actions and providing feedback (i.e., a reward signal) to the actor. This process enables the actor network to update its policies. Through iterations of the above operations, the agent gradually learns to optimize parameters and to minimize the discrepancy between the optimized and real latent distribution.
III-B2 Action
The actor network aggregates all the data the agent has observed, including the environment’s current state, other agents’ observations, and other pertinent information. Based on this information, the actor network decides the agent’s actions to optimize the latent distribution parameters μ and σ. We independently model the latent distribution of each dimension of the latent code and then sample each dimension independently from these latent distributions. Fig. 2 shows that μ and σ are sampled from the standard normal distribution to form the initial random distribution. In addition, the actions are selected by agentμ and agentσ according to the initial parameters.
(2) |
The following technique is employed to update actions:
(3) |
Previous research [25, 28] has demonstrated that the way the latent space is explored affects the diversity and accuracy of reconstructed images. We therefore introduce a parameter to balance accuracy and diversity. A small value is employed in the early stages of searching to encourage the agents to optimize the distribution and broaden the search scope. As training advances, the latent distribution optimized by the agents gradually approaches the real latent distribution, and the parameter is gradually increased to reduce the diversity of the generated images. This procedure enables the agents to refine the latent distribution further so that it is closely related to natural images.
III-B3 Reward
The critic network evaluates the value and utility of the actions by measuring their consistency with the target task. When optimizing the latent distribution, the critic network provides feedback or rewards that motivate the agents to take actions moving the distribution toward the real latent distribution. Actions that steer the latent distribution toward the real latent distribution are rewarded more; otherwise, only a lower reward or no reward is given.
As shown in Fig. 3, the critic network considers the effect of actions on the state of the environment. It is updated by incorporating environmental feedback, actual rewards, and estimated action rewards, which improves the critic network’s estimation of rewards. The agents move closer to the true latent distribution by optimizing the actor and critic networks. When the current distribution is closer to the real latent distribution, the images generated from that distribution obtain higher confidence from the target network. Therefore, the reward can be calculated as follows:
(4) |
(5) |
where denotes the score of images generated from the latent space of the new distribution after performing actions, and is the reward that agents receive after performing actions.
When optimizing the latent distribution, rewards must be computed individually for each action based on its effectiveness and impact on the target. This per-action reward calculation improves the precision of dynamic adjustments in the optimization process, as formulated in the following:
(6) |
(7) |
We also introduce the penalty factor to penalize instances where the generated image is irrelevant to the target category. This could help agents perform actions to improve the quality of the generated image while reducing interference from non-target categories. The penalty term can assist agents in optimizing the distribution so that the reconstructed images are closely related to the target category, yielding superior-quality images. The penalty factor is defined below.
(8) |
A threshold is introduced to prevent agents from obtaining additional rewards. When the negative log probability of the target label for the generated image exceeds this threshold, an additional penalty is imposed. This penalty factor ensures that the generated image is more relevant to the target category and that images are distinguishable enough to avoid confusion with non-target categories.
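As a rough illustration of the thresholded penalty described above, the sketch below applies a fixed negative term whenever the negative log probability of the target label exceeds the threshold. The constant penalty magnitude, the clipping constant `eps`, and the function name are assumptions made for this example; the exact form used by DBB-MI is given by Eq. (8).

```python
import math


def threshold_penalty(target_prob, threshold, penalty=1.0, eps=1e-12):
    """Return a negative penalty when -log(p_target) exceeds the threshold.

    target_prob: probability the black-box model assigns to the target label.
    penalty:     assumed constant magnitude (hypothetical choice).
    eps:         lower clip to avoid log(0).
    """
    nll = -math.log(max(target_prob, eps))
    return -penalty if nll > threshold else 0.0


# A confident image is not penalized; an irrelevant one is.
confident = threshold_penalty(0.9, threshold=1.0)
irrelevant = threshold_penalty(0.01, threshold=1.0)
```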
To sum up, we can calculate rewards for agentμ and agentσ as follows:
(9) |
(10) |
where wn represents the weight of rn.
III-B4 Distribution optimization
As shown in Fig. 4, the search is conducted on the high-dimensional latent distribution using MADDPG. Specifically, μ and σ are sampled from a multi-dimensional normal distribution. They are then paired to form the initial high-dimensional latent distribution as follows:
(11) |
The MADDPG algorithm, involving agentμ and agentσ, optimizes the initial high-dimensional latent distribution. The two agents select actions based on the current distribution, and these actions are formulated below.
(12) |
(13) |
The chosen actions move the latent distribution toward the real latent distribution, resulting in a highly optimized latent distribution. The latent code is ultimately sampled from this optimized distribution.
(14) |
where obeys the distribution .
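Sampling latent codes from a per-dimension Gaussian can be sketched as follows. This assumes the common reparameterized form, in which each coordinate is the mean plus the scale times standard normal noise; whether σ denotes a variance or a standard deviation in Eq. (14) is not recoverable here, so the sketch treats it as a scale. The function name and seed argument are hypothetical.

```python
import random


def sample_latent_codes(mu, sigma, n_samples, seed=None):
    """Draw latent codes dimension by dimension as mu_i + sigma_i * eps_i.

    mu, sigma: per-dimension parameters of the optimized latent distribution.
    eps_i:     independent standard normal noise (assumed form).
    """
    rng = random.Random(seed)
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(n_samples)
    ]


# Several codes for one target label: many-to-one recovery from one distribution.
codes = sample_latent_codes(mu=[0.0, 1.0, -1.0], sigma=[0.5, 0.5, 0.5], n_samples=4, seed=0)
```

Each sampled code would then be passed through the trained GAN to produce one candidate reconstruction.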
The above optimization method of high-dimensional latent distribution can avoid the limitation caused by directly searching the latent distribution space. Through MARL methods, an effective search of the latent distribution can be achieved with only limited model outputs, without the need for any additional information about the model. Consequently, in a black-box setting, it broadens the search range in the latent distribution space, thereby enhancing the ability to identify the real latent space of the target and ultimately improving the accuracy of the reconstructed sensitive data.
III-C MADDPG for Agent Training
The MADDPG trains two agents, enabling them to cooperate and compete in a predefined image generation environment. Agents select specific actions to optimize the initial latent distribution toward the real latent distribution. The key lies in training the agents, which specifically involves the following steps:
• Step 1: After observing the randomly constructed initial distribution, the agents select corresponding actions to execute;
• Step 2: The rewards for the agents’ decisions are computed and stored in the replay buffer along with the agents’ observations;
• Step 3: When the replay buffer contains enough experiences, a batch is sampled to update the agents, and the updated agents are returned.
Finally, the private data hidden in the target network is reconstructed. Details about agent training are provided in Algorithm 1.
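The three steps can be sketched as a generic replay-buffer training loop. This is not Algorithm 1: the agent interface (`act`, `update`), the `env_step` callback, and all argument names are hypothetical placeholders, and the MADDPG actor and critic updates themselves are left abstract.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (observation, actions, rewards, next_observation)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, observation, actions, rewards, next_observation):
        self.storage.append((observation, actions, rewards, next_observation))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def train(agents, env_step, initial_observation, episodes, batch_size, capacity, seed=0):
    """Run the act / store / update loop and return the updated agents."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity)
    observation = initial_observation
    for _ in range(episodes):
        # Step 1: every agent picks an action from the current observation.
        actions = [agent.act(observation) for agent in agents]
        # Step 2: compute rewards and store the experience.
        rewards, next_observation = env_step(observation, actions)
        buffer.add(observation, actions, rewards, next_observation)
        # Step 3: once enough experience is stored, update from a sampled batch.
        if len(buffer) >= batch_size:
            batch = buffer.sample(batch_size, rng)
            for agent in agents:
                agent.update(batch)
        observation = next_observation
    return agents
```

In this sketch, `env_step` stands in for generating images from the current distribution, querying the black-box model, and turning its outputs into per-agent rewards.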
IV Experiment
In this section, we primarily analyze the attack performance of DBB-MI on different datasets and target networks. Additionally, we analyze the distributional attack and investigate some factors that may affect the attack performance.
IV-A Experimental setting
IV-A1 Dataset
Four distinct face datasets that represent a variety of situations are used to evaluate the effectiveness and breadth of DBB-MI. Additionally, we conducted experiments on the MNIST dataset to assess the applicability of our approach to other types of datasets.
• CelebFaces Attributes Dataset (CelebA) [29]. It contains 202,599 photos of 10,177 different celebrities.
• FaceScrub [30]. It includes 106,863 images of 530 individuals with an even gender distribution.
• Pubfig83 [31]. It consists of 13,600 images of 83 individuals. These images were taken in regulated real-world environments with significant variations in lighting, expressions, and other attributes.
• Flickr-Faces-HQ (FFHQ) [32]. It comprises 70,000 high-quality face images with significant age, expression, and ethnicity variations.
• MNIST. It encompasses 70,000 handwritten digits (0 through 9), having different structures and features from facial datasets.
CelebA, FaceScrub, Pubfig83, and MNIST are divided into two parts: public and private datasets. The public dataset is utilized to train GAN, while the private dataset is employed to train the target classification model. It should be emphasized that there is no overlap of the same identities or images between public and private datasets. Thus, it can be assumed that the trained GAN does not directly contain any original private information. Additionally, since public and private datasets in the same dataset have similar statistical properties, the FFHQ is used as an independent extra dataset to evaluate the performance of MI attacks under various distribution conditions. It allows for a comprehensive assessment of MI attacks’ robustness and generalization capabilities.
IV-A2 Target Models
As in previous studies [12, 11, 27, 25], we utilize three popular face recognition networks for evaluation, namely FaceNet64 [33], ResNet-152 [34], and VGG16 [35]. These networks are employed to assess the impact of MI attacks on models with different architectures. Using varied face recognition models allows the generalization and robustness of DBB-MI to be evaluated more thoroughly.
IV-A3 Baselines
We select representative state-of-the-art white-box and black-box MI attacks as baselines for comparison. Specifically, we choose the Generative Model Inversion (GMI) attack [11] and the Knowledge-Enriched Distributional Model Inversion (KED-MI) attack [12] as white-box MI attacks. GMI is the first MI attack for deep networks, while KED-MI is a distributional MI attack. Meanwhile, we employ the Reinforcement Learning-based Black-box Model Inversion (RLB-MI) attack [25] and Model Inversion for deep learning Network (MIRROR) [24] as advanced black-box MI attacks. We also select the Boundary-Repelling Model Inversion (BERP-MI) attack [27], the only label-only MI attack among the baselines and the most advanced one. These black-box MI attacks represent the current state-of-the-art (SOTA) in GAN-based MI attacks that directly search for latent code.
All target models are trained on the same datasets, and the same evaluation models assess all experimental results to ensure fair comparisons. GMI, RLB-MI, MIRROR, and BERP-MI utilize the same GAN as DBB-MI. In addition, the GAN for KED-MI is trained according to its own requirements on the same dataset as that of DBB-MI. This allows a fair and objective comparison between KED-MI and DBB-MI.
IV-A4 Implementation details
As in previous studies [12, 11, 27, 25], the same hyperparameters are used to train the GAN and the target network. For MADDPG, the important parameters are set as follows:
• learning rate: 1e-3
• discount factor: 0.99
• target network update rate: 5×
• experience replay buffer size: 1×
• batch size: 256
• training episodes: 4×
IV-A5 Evaluation metrics
Like prior work [11], the effectiveness of MI attacks is evaluated using quantitative criteria, including attack accuracy (ACC) and K-nearest neighbor feature distance (KNN Dist). Furthermore, the Peak Signal-to-Noise Ratio (PSNR) is utilized to assess the resemblance between reconstructed and original images.
Attack Accuracy: This metric measures the probability of successfully reconstructing private data through an attack. The key difference from [12, 25] is that we consider an attack successful only when both the target model and the additional discriminative model agree that the generated image belongs to the target class. This approach makes the attack success rate more reliable: it ensures that the generated images both deceive the target model and exhibit high-quality facial features, reducing cases where noisy images are mistakenly identified as the target class.
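The stricter success criterion can be written down directly. The sketch below assumes that hard-label predictions from the target model and from the additional discriminative model are already available as lists; the function name is hypothetical.

```python
def attack_accuracy(target_preds, eval_preds, target_labels):
    """Fraction of reconstructions that BOTH models assign to the target class."""
    if not target_labels:
        return 0.0
    hits = sum(
        1
        for t, e, y in zip(target_preds, eval_preds, target_labels)
        if t == y and e == y
    )
    return hits / len(target_labels)


# Toy labels: only the first two reconstructions convince both models.
acc = attack_accuracy(
    target_preds=[1, 2, 3, 4],
    eval_preds=[1, 2, 0, 4],
    target_labels=[1, 2, 3, 5],
)
```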
KNN Dist: This metric measures the similarity of features between the generated reconstructed images and real private images. To calculate the KNN Dist, features are first extracted from the fully connected layer of the evaluation classifier for both the generated reconstructed images and the real private images. Then, their similarity in the feature space is assessed by calculating the L2 distance between these two sets of features.
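One common way to compute such a feature distance is sketched below: for every reconstructed feature vector, take the L2 distance to its nearest private feature vector and average the results. Using the single nearest neighbor and the unsquared L2 norm are assumptions of this sketch (some implementations report squared distances), and feature extraction by the evaluation classifier is assumed to have been done beforehand.

```python
import math


def knn_dist(reconstructed_feats, private_feats):
    """Mean L2 distance from each reconstructed feature to its nearest private feature."""

    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = [min(l2(r, p) for p in private_feats) for r in reconstructed_feats]
    return sum(nearest) / len(nearest)


# Toy 2-D features: distances to the nearest private sample are 0 and 5.
dist = knn_dist([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [6.0, 8.0]])
```

A lower value indicates that reconstructed images lie closer to the private images in feature space.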
PSNR: This metric measures the difference between two images. It evaluates the quality and similarity between the reconstructed and real private images by calculating their PSNR value. A higher PSNR value indicates less difference between the two images, implying a higher similarity between the reconstructed and real private images.
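For reference, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below assumes images given as flat lists of pixel values of equal length and a peak value of 255; the function name and argument layout are illustrative only.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized flat pixel lists."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


identical = psnr([10, 20, 30], [10, 20, 30])
opposite = psnr([0, 0, 0, 0], [255, 255, 255, 255])
```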
Model | Type | Method | ACC | PSNR | KNN Dist |
---|---|---|---|---|---|
VGG16 | White-box | GMI | 0.194 | 12.3 | 1521.05 |
VGG16 | White-box | KED-MI | 0.684 | 14.59 | 1258.65 |
VGG16 | Black-box | MIRROR | 0.452 | 14.21 | 1358.20 |
VGG16 | Black-box | BERP-MI | 0.562 | 13.36 | 1872.48 |
VGG16 | Black-box | RLB-MI | 0.642 | 15.80 | 1262.34 |
VGG16 | Black-box | DBB-MI | 0.858 | 20.66 | 1180.63 |
FaceNet64 | White-box | GMI | 0.298 | 15.78 | 1584.24 |
FaceNet64 | White-box | KED-MI | 0.766 | 16.35 | 1411.56 |
FaceNet64 | Black-box | MIRROR | 0.528 | 15.09 | 1308.40 |
FaceNet64 | Black-box | BERP-MI | 0.734 | 13.69 | 1685.29 |
FaceNet64 | Black-box | RLB-MI | 0.804 | 16.27 | 1354.86 |
FaceNet64 | Black-box | DBB-MI | 0.916 | 18.06 | 1091.15 |
ResNet-152 | White-box | GMI | 0.340 | 15.36 | 1752.15 |
ResNet-152 | White-box | KED-MI | 0.826 | 16.52 | 1130.05 |
ResNet-152 | Black-box | MIRROR | 0.640 | 15.91 | 1254.90 |
ResNet-152 | Black-box | BERP-MI | 0.754 | 13.17 | 1745.73 |
ResNet-152 | Black-box | RLB-MI | 0.812 | 15.23 | 1308.69 |
ResNet-152 | Black-box | DBB-MI | 0.898 | 17.37 | 1063.38 |
IV-B Comparison with state-of-the-art MI attacks
IV-B1 Performance evaluation on different target models
Table I shows the experimental results of our method and baselines under different target models. As seen in Table I, DBB-MI exhibits a notable superiority over state-of-the-art white-box and black-box MI attacks regarding ACC, KNN Dist, and PSNR. Using the target model VGG16 as an example, the ACC of DBB-MI is 25.4% higher than that of KED-MI and 33.6% higher than that of RLB-MI in relative terms. This is because DBB-MI fully explores the latent space by optimizing the latent distribution, thereby obtaining more private data about the target. The experimental results in Table I demonstrate that DBB-MI is effective against different target models and poses more severe privacy leakage risks. This indicates that our method outperforms white-box distributional attacks and achieves SOTA black-box attack performance.
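The quoted percentages are relative improvements in ACC, which can be verified from the VGG16 rows of Table I with a few lines of arithmetic; the helper name below is illustrative.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# ACC values for the VGG16 target model, taken from Table I.
acc = {"KED-MI": 0.684, "RLB-MI": 0.642, "DBB-MI": 0.858}
gains = {
    name: round(relative_improvement(acc["DBB-MI"], value), 1)
    for name, value in acc.items()
    if name != "DBB-MI"
}
print(gains)  # {'KED-MI': 25.4, 'RLB-MI': 33.6}
```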
In addition, we compare the images reconstructed by DBB-MI with those rebuilt by the baselines. As displayed in Fig. 5, the images recovered by DBB-MI are closer to the original ones than those recovered by the baselines; they share similar details and colors with the originals. This is attributed to the efficiency of DBB-MI in searching the latent space, which allows it to capture more private information. The above results show that DBB-MI outperforms the state-of-the-art white-box and black-box MI attacks for various target models, both in terms of multiple performance evaluation metrics and in visual quality.
Dataset | Type | Method | ACC | PSNR | KNN Dist |
---|---|---|---|---|---|
CelebA | White-box | GMI | 0.298 | 15.78 | 1584.24 |
CelebA | White-box | KED-MI | 0.766 | 16.35 | 1411.56 |
CelebA | Black-box | MIRROR | 0.528 | 15.09 | 1308.40 |
CelebA | Black-box | BERP-MI | 0.734 | 13.69 | 1685.29 |
CelebA | Black-box | RLB-MI | 0.804 | 16.27 | 1354.86 |
CelebA | Black-box | DBB-MI | 0.916 | 18.06 | 1091.15 |
FaceScrub | White-box | GMI | 0.080 | 17.05 | 2729.06 |
FaceScrub | White-box | KED-MI | 0.355 | 20.31 | 2682.69 |
FaceScrub | Black-box | MIRROR | 0.325 | 18.86 | 2710.11 |
FaceScrub | Black-box | BERP-MI | 0.305 | 20.81 | 2684.91 |
FaceScrub | Black-box | RLB-MI | 0.420 | 19.23 | 2693.85 |
FaceScrub | Black-box | DBB-MI | 0.375 | 23.03 | 2661.82 |
PubFig83 | White-box | GMI | 0.100 | 10.26 | 2580.71 |
PubFig83 | White-box | KED-MI | 0.380 | 15.25 | 2363.12 |
PubFig83 | Black-box | MIRROR | 0.300 | 13.25 | 2410.62 |
PubFig83 | Black-box | BERP-MI | 0.400 | 13.10 | 2492.99 |
PubFig83 | Black-box | RLB-MI | 0.400 | 16.37 | 2349.84 |
PubFig83 | Black-box | DBB-MI | 0.560 | 17.04 | 2342.47 |
IV-B2 Performance evaluation on different datasets
Table II presents the experimental results of DBB-MI and the baselines under different datasets. It can be observed from Table II that DBB-MI beats the baselines in nearly all performance evaluation metrics. Taking CelebA as a case study, DBB-MI’s ACC is 19.6% and 13.9% higher than that of KED-MI and RLB-MI, respectively, in relative terms.
In terms of ACC and KNN Dist, the results obtained by all MI attacks on CelebA are superior to those on FaceScrub and PubFig83. This is because CelebA contains more identity categories and data information, giving the trained model stronger classification capabilities and, therefore, more vulnerability to attacks. Additionally, DBB-MI outperforms all baselines on all datasets except FaceScrub. On FaceScrub, the ACC of DBB-MI is slightly lower than that of RLB-MI, but the PSNR and KNN Dist of DBB-MI are better than those of RLB-MI. Thus, DBB-MI still outperforms RLB-MI in most performance evaluation metrics.
Dataset | Type | Method | ACC | PSNR | KNN Dist |
---|---|---|---|---|---|
CelebA | White-box | GMI | 0.114 | 15.35 | 1431.45 |
CelebA | White-box | KED-MI | 0.408 | 17.52 | 1035.31 |
CelebA | Black-box | MIRROR | 0.286 | 16.66 | 1242.91 |
CelebA | Black-box | BERP-MI | 0.398 | 15.01 | 1331.25 |
CelebA | Black-box | RLB-MI | 0.402 | 16.76 | 925.61 |
CelebA | Black-box | DBB-MI | 0.532 | 16.04 | 1002.51 |
FaceScrub | White-box | GMI | 0.150 | 11.33 | 2892.42 |
FaceScrub | White-box | KED-MI | 0.315 | 15.28 | 2856.36 |
FaceScrub | Black-box | MIRROR | 0.285 | 15.52 | 2878.36 |
FaceScrub | Black-box | BERP-MI | 0.295 | 15.97 | 2859.94 |
FaceScrub | Black-box | RLB-MI | 0.355 | 16.56 | 2866.43 |
FaceScrub | Black-box | DBB-MI | 0.490 | 17.04 | 2817.85 |
PubFig83 | White-box | GMI | 0.080 | 11.16 | 2684.99 |
PubFig83 | White-box | KED-MI | 0.340 | 14.59 | 2415.76 |
PubFig83 | Black-box | MIRROR | 0.300 | 14.27 | 2459.85 |
PubFig83 | Black-box | BERP-MI | 0.380 | 15.36 | 2391.67 |
PubFig83 | Black-box | RLB-MI | 0.360 | 16.05 | 2402.50 |
PubFig83 | Black-box | DBB-MI | 0.700 | 16.55 | 2387.56 |
IV-B3 Performance evaluation on cross-dataset
In the previous experiments, we trained the GAN on a dataset whose statistical properties and feature distribution are similar to those of the dataset used to train the target model. However, obtaining a dataset with a distribution similar to the target dataset is challenging in real-world settings. Therefore, it is necessary to evaluate the attack with a GAN trained on an extra dataset.
We train the GAN on the extra dataset, FFHQ. Meanwhile, we employ the FaceNet64 models trained on CelebA, FaceScrub, and PubFig83 as the target models and use the FaceNet models trained on these datasets as the evaluation models. Table III presents the experimental results of DBB-MI and the baselines in this cross-dataset setting. DBB-MI performs better than the baselines on most datasets. This can be attributed to the comprehensive exploration of the latent space in DBB-MI, which yields more private data about the target. On CelebA, DBB-MI has the highest ACC, while its PSNR and KNN Dist are slightly worse than the best values. This implies that DBB-MI still has room for improvement.
In summary, DBB-MI exhibits superior performance across diverse target models, datasets, and cross-dataset scenarios. It outperforms state-of-the-art white-box and black-box MI attacks regarding ACC, KNN Dist, PSNR, and visualization. These findings confirm the effectiveness of the latent distribution exploration in DBB-MI, which is vital for improving model security and privacy protection.
IV-B4 Performance evaluation on MNIST
To demonstrate that our approach is practical beyond face datasets, we also evaluated it on the MNIST dataset. For MNIST, we attacked models of different depths, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and ResNet-SIM. ResNet-SIM consists of two convolutional layers, one max-pooling layer, two residual blocks, and one fully connected layer, while the remaining network structures conform to [34]. We achieved a 100% attack success rate across the different networks, and the features of the different digits could be reconstructed. Specific reconstruction results are shown in Fig. 6.
IV-C Analysis of distributional attacks
IV-C1 Analysis of the changes of the latent distribution
Fig. 7a depicts the latent distribution before the attack, in which each point represents one dimension of the latent distribution. During an attack, MADDPG optimizes the distribution of each dimension to explore the latent space effectively. Fig. 7b illustrates the latent distribution after the attack, in which each dimension displays a distinct pattern. This implies that distinct privacy features are captured in each dimension of the latent distribution. It also supports the rationale of DBB-MI, i.e., optimizing each dimension of the latent distribution independently so that the latent distribution approaches the true one.
IV-C2 Evaluating Distribution Accuracy
We drew 500 random samples from the optimized latent distribution to obtain reconstructed samples, and measured the fraction of reconstructed samples that matched the target label. We refer to this fraction as the distributional accuracy. 68% of all latent distributions have a distributional accuracy exceeding 0.5, meaning that for these distributions more than half of the randomly sampled codes can be deemed successful reconstructions. As displayed in Fig. 8, the optimized latent distributions capture the target label information effectively.
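The measurement described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the latent distribution is assumed to be an independent Gaussian per dimension, and `generator` and `classifier` are hypothetical callables standing in for the GAN generator and the target model.

```python
import random

def distributional_accuracy(means, stds, generator, classifier, target,
                            n_samples=500, seed=0):
    """Fraction of sampled latent codes whose reconstruction is labeled `target`.

    Assumes an independent Gaussian per latent dimension (an assumption of
    this sketch); `generator` and `classifier` are caller-supplied callables.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one latent code, one coordinate per dimension.
        z = [rng.gauss(m, s) for m, s in zip(means, stds)]
        if classifier(generator(z)) == target:
            hits += 1
    return hits / n_samples

# Toy stand-ins (hypothetical): the "image" is the latent code itself and
# the "classifier" returns label 1 when the first coordinate is positive.
toy_generator = lambda z: z
toy_classifier = lambda img: 1 if img[0] > 0 else 0

acc = distributional_accuracy([1.0, 0.0], [1.0, 1.0],
                              toy_generator, toy_classifier, target=1)
print(0.0 <= acc <= 1.0)  # True
```

A distribution concentrated on codes that the classifier maps to the target label yields a value close to 1, while a poorly placed distribution yields a value close to 0.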
IV-C3 Evaluating actual attack performance under different distributional accuracy levels
To measure the actual attack accuracy, we drew 10,000 random samples from the optimized latent distributions. These samples were then tested for top-1 accuracy, which indicates successful reconstruction, and for top-5 accuracy, which indicates that the target label ranks among the top 5 of all 1,000 classes. When the distributional accuracy exceeds 24%, the top-5 accuracy exceeds 50%. The observed top-1 accuracy closely aligns with the distributional accuracy, suggesting that the optimized distribution performs consistently across reconstruction tasks. More details are shown in Fig. 9.
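For reference, top-k accuracy over a set of reconstructions can be computed as in the sketch below. The helper names and the six-class toy score vectors are illustrative assumptions; the experiments above use 1,000 classes.

```python
def top_k_hit(scores, target, k):
    """True if `target` is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return target in ranked[:k]

def top_k_accuracy(all_scores, target, k):
    """Share of reconstructions whose score vector ranks `target` in the top k."""
    return sum(top_k_hit(s, target, k) for s in all_scores) / len(all_scores)

# Toy score vectors over 6 classes for three reconstructions of class 2.
outputs = [
    [0.05, 0.10, 0.60, 0.10, 0.10, 0.05],  # class 2 ranked first
    [0.40, 0.05, 0.30, 0.10, 0.10, 0.05],  # class 2 ranked second
    [0.30, 0.25, 0.02, 0.20, 0.13, 0.10],  # class 2 ranked last
]
print(top_k_accuracy(outputs, target=2, k=1))  # 0.3333333333333333
print(top_k_accuracy(outputs, target=2, k=5))  # 0.6666666666666666
```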
IV-C4 Evaluating sample reconstruction confidence under varying distributional accuracies
For testing, we selected samples from 10,000 random samplings of optimized latent distributions with varying distributional accuracies. We evaluated the reconstructed samples with the target model to determine the confidence assigned to the target label, as depicted in Fig. 10. At a distributional accuracy of 0.632, 50.59% of the rebuilt samples reached a confidence above 0.5 and 36.10% reached a confidence above 0.75. At a distributional accuracy of 0.962, 90.27% of the samples reached a confidence above 0.5, 78.58% reached a confidence above 0.75, and 58.40% reached a confidence above 0.9. Even at the limited distributional accuracy of 0.580, a significant proportion of the samples, specifically 47.51%, surpassed a confidence of 0.5. Although the confidence of the reconstructed samples improves as the distributional accuracy improves, certain samples still exhibit very low confidence. This may be attributed to inadequate training of the GAN model employed in our study.
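The confidence breakdown can be tabulated as in the following sketch, which assumes that the reported percentages count samples whose target-label confidence exceeds each threshold. The function name `share_above` and the toy confidence values are illustrative only.

```python
def share_above(confidences, thresholds=(0.5, 0.75, 0.9)):
    """Map each threshold to the fraction of confidences strictly above it."""
    n = len(confidences)
    return {t: sum(c > t for c in confidences) / n for t in thresholds}

# Toy target-label confidences for eight reconstructed samples.
conf = [0.97, 0.91, 0.80, 0.62, 0.55, 0.40, 0.12, 0.03]
print(share_above(conf))  # {0.5: 0.625, 0.75: 0.375, 0.9: 0.25}
```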
IV-D Analysis of factors affecting attack performance
IV-D1 Evaluating the impact of latent distribution dimensions
This work primarily concerns improving the performance of MI attacks by optimizing the latent distribution in the high-dimensional latent space. Therefore, it is necessary to investigate the effect of varying the number of latent distribution dimensions on the ACC of the MI attack. Fig. 11 displays the ACC obtained by DBB-MI for different numbers of latent distribution dimensions. As shown in this figure, the ACC of DBB-MI rises gradually as the number of dimensions increases. This suggests that searching high-dimensional latent distributions explores the latent space more comprehensively. Hence, MI attacks need to choose an appropriate latent distribution dimension.
IV-D2 Evaluating the impact of training episodes
To further investigate the factors influencing the performance of MI attacks, we measure the median number of iterations required for the generated images to reach a specific test classification accuracy, and relate it to the complexity of the target model and the size of the dataset. The first set of results was obtained with various target models trained on CelebA. As shown in Fig. 12a, the search is hardest for ResNet-152, the most complex target model: more iterations are required to reach the test accuracy obtained on FaceNet64 and VGG16. Fig. 12b shows the results obtained by using a GAN trained on FFHQ to attack FaceNet64 trained on different datasets. The search difficulty is lowest on PubFig83, where a specific test accuracy is reached quickly, while more iterations are needed on the other datasets. We therefore conclude that the search difficulty of an MI attack grows with the complexity of the model and the size of the dataset, which ultimately degrades attack performance.
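The median number of iterations needed to reach a given test accuracy can be computed from per-run accuracy curves, as in the sketch below. The curve values, the threshold, and the function names are assumptions made for illustration; the paper does not specify how runs that never reach the threshold are handled, and this sketch simply excludes them.

```python
from statistics import median

def iterations_to_reach(curve, threshold):
    """1-based index of the first iteration whose accuracy meets `threshold`, or None."""
    for i, acc in enumerate(curve, start=1):
        if acc >= threshold:
            return i
    return None

def median_iterations(curves, threshold):
    """Median first-hit iteration over the runs that reach `threshold`."""
    hits = [n for n in (iterations_to_reach(c, threshold) for c in curves)
            if n is not None]
    return median(hits) if hits else None

# Toy accuracy curves for three attack runs (illustrative values only).
runs = [
    [0.1, 0.4, 0.7, 0.9],
    [0.2, 0.6, 0.8, 0.9],
    [0.1, 0.2, 0.3, 0.8],
]
print(median_iterations(runs, threshold=0.7))  # 3
```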
Agent | ACC | PSNR | KNN Dist |
---|---|---|---|
IQL | 0.341 | 14.21 | 1728.39 |
VDN | 0.462 | 16.35 | 1443.61 |
MADDPG | 0.858 | 20.66 | 1180.63 |
IV-D3 Evaluating various reinforcement learning agents
DBB-MI relies heavily on MADDPG to optimize the latent distribution of the GAN and obtain more private information about the target. To verify the rationale for using MADDPG, we also search for the latent distribution with IQL [17], a form of fully competitive MARL, and VDN [15], a form of fully cooperative MARL. The VGG16 model trained on CelebA is employed as the target network. Table IV lists the experimental results of MI attacks using the various reinforcement learning agents. MADDPG surpasses both IQL and VDN: in relative terms, its ACC is about 152% higher than that of IQL and 85.7% higher than that of VDN. This is why DBB-MI uses MADDPG to optimize the distribution over the GAN’s high-dimensional latent space. Furthermore, it underscores that searching for suitable latent distributions in the latent space of GANs should be regarded as a semi-competitive, semi-cooperative form of MARL.
IV-D4 RL agent rewards
To further assess the performance difference between the various agents in GAN-based MI attacks, we compare how their rewards change during training, as shown in Fig. 13. As can be seen from the figure, both VDN [15] and IQL [17] obtain consistently modest rewards that fluctuate around a low baseline. In contrast, MADDPG [19] converges to a higher reward and yields better results. Therefore, MADDPG is more suitable for latent space search in GAN-based MI attacks.
IV-D5 Evaluating the diverse image reconstruction
A single label is expected to correspond to several images that depict the same subject in various states or conditions. DBB-MI can reconstruct several diverse images for a given label, as displayed in Fig. 14. We sample multiple latent codes from the final optimized latent distribution to generate multiple images with different privacy attributes. Here, variations in facial expression, hair, lighting conditions, and other factors highlight diverse privacy features within the same label. This diverse reconstruction capability is important for studying privacy protection: it exposes how an attacker can reconstruct different states of a target, which in turn informs how to defend against this attack effectively.
V Conclusion
In this paper, we present a novel and effective Distributional Black-Box Model Inversion (DBB-MI) attack that does not require elaborate training of the GAN. In a black-box setting, DBB-MI systematically explores the latent space of the GAN with limited knowledge to identify an appropriate latent distribution, using a multi-agent reinforcement learning-based approach. It can accurately reconstruct the private data of the target model. We assess the attack performance and generalization of DBB-MI through a comprehensive series of experiments. The experimental results demonstrate that DBB-MI outperforms the most advanced black-box attacks on most metrics. Additionally, these results further validate the efficacy of distributional attacks in comparison to state-of-the-art MI attacks based on optimizing a latent code.
References
- [1] X. An, J. Deng, J. Guo, Z. Feng, X. Zhu, J. Yang, and T. Liu, “Killing two birds with one stone: Efficient and robust training of face recognition cnns by partial fc,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 4042–4051.
- [2] Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang et al., “Planning-oriented autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 17 853–17 862.
- [3] J. Li, K. Sun, B. S. Huff, A. M. Bierley, Y. Kim, F. Schaub, and K. Fawaz, “‘It’s up to the consumer to be smart’: Understanding the security and privacy attitudes of smart home users on reddit,” in Proceedings of the IEEE Symposium on Security and Privacy (SP), 2023, pp. 380–396.
- [4] L. Zhou, G. Huang, Y. Mao, S. Wang, and M. Kaess, “Edplvo: Efficient direct point-line visual odometry,” in Proceedings of the International Conference on Robotics and Automation (ICRA), 2022, pp. 7559–7565.
- [5] D. Cao, K. Wei, Y. Wu, J. Zhang, B. Feng, and J. Chen, “Fepn: A robust feature purification network to defend against adversarial examples,” Computers & Security, vol. 134, p. 103427, 2023.
- [6] Z. Wei, J. Chen, M. Goldblum, Z. Wu, T. Goldstein, and Y.-G. Jiang, “Towards transferable adversarial attacks on vision transformers,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 36, no. 3, 2022, pp. 2668–2676.
- [7] Y. Chen, C. Shen, Y. Shen, C. Wang, and Y. Zhang, “Amplifying membership exposure via data poisoning,” Advances in Neural Information Processing Systems (NIPS), vol. 35, pp. 29 830–29 844, 2022.
- [8] A. Tejankar, M. Sanjabi, Q. Wang, S. Wang, H. Firooz, H. Pirsiavash, and L. Tan, “Defending against patch-based backdoor attacks on self-supervised learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 12 239–12 249.
- [9] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart, “Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing,” in Proceedings of the 23rd USENIX security symposium (USENIX Security 14), 2014, pp. 17–32.
- [10] Y. Yin, X. Zhang, H. Zhang, F. Li, Y. Yu, X. Cheng, and P. Hu, “Ginver: Generative model inversion attacks against collaborative inference,” in Proceedings of the ACM Web Conference (WWW), 2023, pp. 2122–2131.
- [11] Y. Zhang, R. Jia, H. Pei, W. Wang, B. Li, and D. Song, “The secret revealer: Generative model-inversion attacks against deep neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 253–261.
- [12] S. Chen, M. Kahla, R. Jia, and G.-J. Qi, “Knowledge-enriched distributional model inversion attacks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 16 178–16 187.
- [13] X. Yuan, K. Chen, J. Zhang, W. Zhang, N. Yu, and Y. Zhang, “Pseudo label-guided model inversion attack via conditional generative adversarial network,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 37, no. 3, 2023, pp. 3349–3357.
- [14] J. K. Gupta, M. Egorov, and M. Kochenderfer, “Cooperative multi-agent control using deep reinforcement learning,” in Proceedings of the Autonomous Agents and Multiagent Systems (AAMAS Workshops). Springer, 2017, pp. 66–83.
- [15] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on team reward,” in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2018, pp. 2085–2087.
- [16] Y. Zhu and D. Zhao, “Online minimax q network learning for two-player zero-sum markov games,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 3, pp. 1228–1241, 2020.
- [17] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PloS one, vol. 12, no. 4, p. e0172395, 2017.
- [18] J. Hu and M. P. Wellman, “Nash q-learning for general-sum stochastic games,” Journal of machine learning research, vol. 4, no. Nov, pp. 1039–1069, 2003.
- [19] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” Advances in Neural Information Processing Systems (NIPS), 2017.
- [20] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS), 2015, pp. 1322–1333.
- [21] Z. Yang, J. Zhang, E.-C. Chang, and Z. Liang, “Neural network inversion in adversarial setting via background knowledge alignment,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019, pp. 225–240.
- [22] A. Salem, A. Bhattacharya, M. Backes, M. Fritz, and Y. Zhang, “Updates-leak: Data set inference and reconstruction attacks in online learning,” in Proceedings of the 29th USENIX security symposium (USENIX Security 20), 2020, pp. 1291–1308.
- [23] Z. Zhang, X. Wang, J. Huang, and S. Zhang, “Analysis and utilization of hidden information in model inversion attacks,” IEEE Transactions on Information Forensics and Security, 2023.
- [24] S. An, G. Tao, Q. Xu, Y. Liu, G. Shen, Y. Yao, J. Xu, and X. Zhang, “Mirror: Model inversion for deep learning network with high fidelity,” in Proceedings of the 29th Network and Distributed System Security Symposium (NDSS), 2022.
- [25] G. Han, J. Choi, H. Lee, and J. Kim, “Reinforcement learning-based black-box model inversion attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 20 504–20 513.
- [26] T. Zhu, D. Ye, S. Zhou, B. Liu, and W. Zhou, “Label-only model inversion attacks: Attack with the least information,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 991–1005, 2022.
- [27] M. Kahla, S. Chen, H. A. Just, and R. Jia, “Label-only model inversion attacks via boundary repulsion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15 045–15 053.
- [28] K.-C. Wang, Y. Fu, K. Li, A. Khisti, R. Zemel, and A. Makhzani, “Variational model inversion attacks,” Advances in Neural Information Processing Systems (NIPS), vol. 34, pp. 9706–9719, 2021.
- [29] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3730–3738.
- [30] H.-W. Ng and S. Winkler, “A data-driven approach to cleaning large face datasets,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 343–347.
- [31] N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. IEEE, 2011, pp. 35–42.
- [32] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410.
- [33] Y. Cheng, J. Zhao, Z. Wang, Y. Xu, K. Jayashree, S. Shen, and J. Feng, “Know you at one glance: A compact vector representation for low-shot learning,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 1924–1932.
- [34] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
- [35] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.