Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning

Huan Bao, Kaimin Wei, Yongdong Wu, Jin Qian, Robert H. Deng

H. Bao, K. Wei, Y. Wu, and J. Qian are with the College of Cyber Security, Jinan University, Guangzhou 510632, China. R. H. Deng is with the School of Information Systems, Singapore Management University, Singapore 178902, Singapore.

Abstract

Generative Adversarial Network (GAN)-based Model Inversion (MI) attacks aim to recover private training data from complex deep learning models by searching for codes in the latent space. However, they search only a deterministic latent space, so the latent code they find is usually suboptimal. In addition, existing distributional MI schemes assume that an attacker can access the structure and parameters of the target model, which is not always viable in practice. To overcome these shortcomings, this paper proposes a novel Distributional Black-Box Model Inversion (DBB-MI) attack that constructs a probabilistic latent space for searching the target private data. Specifically, DBB-MI needs neither the target model's parameters nor specialized GAN training. Instead, it finds the latent probability distribution by combining the output of the target model with multi-agent reinforcement learning techniques. It then randomly samples latent codes from this distribution to recover the private data. Because the latent probability distribution closely aligns with the target private data in latent space, the recovered data significantly leaks the privacy of the target model's training samples. Extensive experiments on diverse datasets and networks show that DBB-MI outperforms state-of-the-art methods in attack accuracy, K-nearest neighbor feature distance, and Peak Signal-to-Noise Ratio.

Index Terms:
Distributional model inversion attack, deep learning, multi-agent reinforcement learning, black-box attack.

I Introduction

Artificial Intelligence (AI) technology is rapidly advancing and widely applied in diverse domains, including facial recognition [1], autonomous driving [2], smart homes [3], and drone applications [4]. Although AI has undoubtedly brought substantial convenience to both work and life, it is vulnerable to various attacks, such as adversarial attacks [5, 6], data poisoning [7, 8], and Model Inversion (MI) attacks [9, 10]. Fredrikson et al. [9] demonstrated that MI attacks pose a significant risk of privacy leakage for machine learning (ML), as attackers can expose sensitive training data by only accessing the ML model itself.

Recently, GAN-based MI attacks have emerged as an attractive way to attack complex ML models. Zhang et al. [11] introduced the first GAN-based MI attack, shifting the focus from the algorithm-centered numerical reconstruction of sensitive data to an optimization problem that searches for a latent code in the GAN's latent space. They used a GAN to extract prior knowledge from publicly available datasets and searched the latent space of the GAN to reconstruct private data. GAN-based MI attacks typically involve two steps. Step 1: GAN training. A GAN is trained on a publicly available dataset that shares a similar distribution with the private dataset used by the target network. For example, if the private dataset includes facial images, the public dataset should also contain facial images. Step 2: latent code searching. The attacker identifies a suitable latent code that, when passed through the trained GAN, generates images revealing private information.

Chen et al. [12] argued that previous GAN-based MI attacks are limited to one-to-one privacy recovery via the exploration of latent code. To achieve many-to-one privacy recovery, they introduced the distributional attack to reconstruct multiple private data instances that correspond to a single label. They proposed the Knowledge-Enriched Distributional Model Inversion (KED-MI), which initially generates pseudo-labels for a publicly available dataset using the target model. Subsequently, a GAN is trained to discriminate generated images as part of the loss function for further optimization. Finally, the trained GAN and the latent distribution optimized in the white-box setting are employed to attack the target and reconstruct confidential information. Yuan et al. [13] developed the Pseudo Label-Guided MI (PLG-MI) to enhance the training method for GAN. In GAN training, they exclusively used images with higher confidence for specific classes, enriching the information contained within the GAN so that it generates images with particular labels. This improves the GAN's ability to narrow down the search space. Although these distributional white-box MI attacks demonstrated satisfactory performance in step 1, they still have several limitations:

  • Requiring a large-scale dataset. These attacks rely heavily on the discriminative ability of the GAN and leverage the target model to label the dataset, thus enhancing the GAN's ability to differentiate. PLG-MI, in particular, needs to examine a large amount of data to ensure that each category has enough samples to train the GAN. This over-reliance on prior knowledge may lead to the misuse of the target model and also increases the difficulty of dataset collection.

  • Over-accessing the target model. KED-MI and PLG-MI assume that the attacker can freely access the parameters of the target model, enabling them to constrain the latent distribution and facilitate the identification of an appropriate latent distribution. Nevertheless, it is challenging to implement this strong assumption in real attacks.

  • Underexplored latent distribution. The latent distribution contains rich information that significantly enhances the efficacy of MI attacks. However, KED-MI and PLG-MI rely heavily on the GAN, rather than the latent distribution, for targeting. Thus, the under-searched latent distribution limits the attack performance of KED-MI and PLG-MI.

To overcome the above difficulties, we propose a novel Distributional Black-Box Model Inversion (DBB-MI) attack. Its GAN is trained on a randomly chosen public dataset that requires no annotation by the target model. In the black-box setting, the latent distribution is optimized with Multi-Agent Reinforcement Learning (MARL) techniques, which avoids over-accessing the target model and makes the attack more relevant to real-life situations. The primary contributions of this paper are as follows:

  • We propose the Distributional Black-Box Model Inversion (DBB-MI) Attack, which is the first exploration of a distributional MI attack in black-box settings.

  • In black-box settings, DBB-MI leverages the Multi-Agent Reinforcement Learning (MARL) algorithm to thoroughly explore the appropriate latent distributions for specific categories, extracting latent privacy features in GAN.

  • Extensive experiments demonstrate the superior attack performance of DBB-MI compared with state-of-the-art black-box MI attacks. For example, it improves the highest attack success rate on CelebA by 33.6% and achieves a 100% attack success rate on MNIST.

The rest of this paper is arranged as follows. Section II introduces some related work. Section III gives the challenges of searching for latent distribution under the black-box setting and provides a detailed description of DBB-MI. Section IV presents and analyzes experimental results. Section V concludes this work.

II Related Work

This section introduces the background knowledge of Multi-Agent Reinforcement Learning (MARL) and Model Inversion (MI) Attacks.

Figure 1: The overview of multi-agent reinforcement learning.

II-A Multi-Agent Reinforcement Learning

Single-agent reinforcement learning employs the Markov decision process model, whereas multi-agent reinforcement learning (MARL) incorporates stochastic games. The joint actions formed by multiple agents have a significant impact on the transition and updating of the environmental state. Additionally, these actions play a crucial role in determining the reward feedback received by the agents, as depicted in Figure 1. The agents can be categorized into three groups based on their relationships: totally cooperative, fully competitive, and semi-cooperative semi-competitive.

In the totally cooperative MARL [14, 15], all agents are dedicated to jointly achieving a shared objective by maximizing the overall reward through collaboration, without considering their individual rewards. For example, each agent possesses its own local value function in the Value-Decomposition Networks (VDN) [15] algorithm. After each agent makes a decision, the local value is calculated and then aggregated to obtain the global value to achieve globally optimal choices.
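As a minimal illustration of the value-decomposition idea described above, the sketch below sums per-agent local values into a global value and checks that, under this additive assumption, each agent's individually greedy action also maximizes the global value. The Q-tables and function names are hypothetical and chosen only for illustration; they are not taken from [15].

```python
from itertools import product
from typing import List, Sequence, Tuple

def global_value(q_tables: Sequence[Sequence[float]], joint_action: Sequence[int]) -> float:
    """Additive decomposition: the global value is the sum of the local Q-values."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))

def decentralized_greedy(q_tables: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Each agent picks the action that maximizes its own local Q-value."""
    return tuple(max(range(len(q)), key=lambda a: q[a]) for q in q_tables)

def centralized_greedy(q_tables: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Brute-force search over all joint actions for the best global value."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(joint_actions, key=lambda joint: global_value(q_tables, joint))

# Hypothetical local Q-values for two agents in one fixed state.
q_tables: List[List[float]] = [[1.0, 3.0], [2.0, 0.5, 4.0]]
joint = decentralized_greedy(q_tables)
print(joint, global_value(q_tables, joint))
```

Because the global value is a plain sum, the decentralized and centralized choices coincide here, which is what allows each agent to act on its local value alone.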

In fully competitive MARL [16, 17], all agents see each other as competitors, and each agent only focuses on maximizing its own utility, disregarding the impact of other agents. One illustrative instance is the Independent Q-Learning (IQL) algorithm [17], wherein agents independently engage in Q-learning. Although it may produce satisfactory outcomes in certain contexts, it often exhibits instability and difficulties in achieving convergence because the other agents keep changing the surrounding environment. Therefore, it is typically only suitable for relatively simple scenarios.
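The independent-learning scheme can be sketched with plain tabular Q-learning, where every agent updates its own table and treats the other agents as part of the environment. This is a simplified sketch under assumed settings: the table sizes, learning rate, discount factor, and rewards below are hypothetical.

```python
from typing import List, Sequence

QTable = List[List[float]]  # indexed as q[state][action]

def q_learning_update(q: QTable, state: int, action: int, reward: float,
                      next_state: int, lr: float = 0.1, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning update applied to one agent's own table."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += lr * (td_target - q[state][action])

def iql_step(q_tables: Sequence[QTable], state: int, actions: Sequence[int],
             rewards: Sequence[float], next_state: int) -> None:
    """Independent Q-learning: each agent updates separately, ignoring the others' actions."""
    for q, action, reward in zip(q_tables, actions, rewards):
        q_learning_update(q, state, action, reward, next_state)

# Two agents, two states, two actions each; all numbers are illustrative.
q_tables = [[[0.0, 0.0], [1.0, 2.0]] for _ in range(2)]
iql_step(q_tables, state=0, actions=[1, 0], rewards=[1.0, -1.0], next_state=1)
```

Since no agent conditions on the others' actions, the environment looks non-stationary from each agent's point of view, which is one common explanation for the instability noted above.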

In semi-cooperative semi-competitive MARL [18, 19], agents can obtain greater benefits through collaboration, while simultaneously experiencing potential gains or losses due to a certain level of competition. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [19] algorithm enables each agent not only to learn its policy knowledge but also to observe the behaviors of other agents during the learning process to improve their strategy further. These algorithms are more suitable for handling complex multi-agent tasks because they allow agents to simultaneously assess cooperative and competitive connections.

The above analysis shows that different MARL approaches have different advantages and suit different environments. When utilizing MARL, it is crucial to choose the approach that matches the environment and the particular task in order to accomplish the goals more effectively.

II-B Model Inversion Attacks

According to attack strategies, existing MI attacks can be divided into direct reconstruction-based and GAN-based.

Early direct reconstruction-based MI attacks predominantly concentrated on the white-box setting, in which an attacker can access all information about the model, including its architecture and parameters. Fredrikson et al. [9] developed the first MI attack, targeting the information regression model by inputting specific features. However, its effectiveness diminishes as the feature space grows. Later, Fredrikson et al. [20] achieved the inversion of a face dataset with a larger feature space by minimizing the confidence loss. Although white-box MI attacks can disclose the privacy of the target model, they assume that an attacker can access everything about the target model, which rarely holds in practice. Therefore, some researchers focused on black-box MI attacks, in which an adversary only possesses the outputs or labels of the model rather than all the information. Yang et al. [21] assumed that the adversary has access to a vast database that far exceeds the training data of the target network. They employed this data as auxiliary information for an attack. Additionally, Salem et al. [22] attacked the newly added training data by comparing different outputs on the same data before and after the target model was updated. Zhang et al. [23] enhanced face reconstruction accuracy by fully exploiting predicted vectors. Nevertheless, direct reconstruction-based MI attacks can only recover grayscale images with substantial information loss on simple networks.

To attack deep networks, Zhang et al. [11] proposed the GAN-based MI attack, seeking potential private data within the latent space of a GAN to expose the target network's private data. Currently, GAN-based MI attacks can be further divided into two subcategories: optimizing latent code and optimizing latent distribution. MI attacks based on optimizing latent code are essentially black-box attacks. An et al. [24] utilized genetic algorithms to implement GAN-based MI attacks, reconstructing high-fidelity private face images within deep networks. Han et al. [25] achieved impressive MI attacks by utilizing reinforcement learning algorithms to search for latent codes within the latent space of GANs. Zhu et al. [26] utilized the error rate of the target model to explore decision boundaries, reconstructing representative samples. Kahla et al. [27] proposed the Boundary-Repelling Model Inversion Attack (BREP-MI), which only uses a GAN to generate images with a target label and extracts the latent space of images to gather sufficient data for estimating the gradient direction. Because a single latent code carries limited private information, GAN-based MI attacks that rely on optimizing latent code struggle to recover private data accurately.

To obtain more private information, some researchers focus on MI attacks based on optimizing latent distribution, which are actually white-box attacks. Chen et al. [12] proposed the Knowledge-Enriched Distributional Model Inversion Attack (KED-MI), which leverages the target network to generate pseudo-labels for GAN training to boost the attack rates significantly. Yuan et al. [13] further refined the training methodology of GAN by selecting more representative data from public datasets, thereby enhancing the capability of GAN and narrowing the search space within the latent space. Moreover, meticulously chosen datasets contain a greater abundance of privacy features, consequently further improving the attack performance.

Since these MI attacks based on optimizing latent distribution are all white-box attacks, their requirements on the dataset and the assumption of using the target model to label the dataset are still too unrealistic. In this paper, we attempt to develop a distributional black-box MI attack that does not require a super attacker and the elaborate training of GANs.

III The Proposed Approach: DBB-MI

In this section, we introduce the challenges of searching for latent distribution under the black-box setting and provide a detailed description of DBB-MI.

III-A Problem Formulation

Figure 2: The overview of DBB-MI. To search for the target latent space distribution, two agents are employed to optimize the target distribution's $\mu$ and $\sigma$, respectively.

Figure 3: The overall collaboration and competition between two agents. The actor network makes decisions by observing the environmental state, while the critic network feeds feedback to the actor network according to its global observations.

III-A1 Attack model

This work primarily examines black-box MI attacks, where the attacker can neither access the private data $D_{priv}$ of the black-box target network $T$ nor obtain any knowledge about the model's structure, hyper-parameters, etc. The black-box target model $T$ is trained to recognize $k$ different identities using a private dataset $D_{priv}$. Given an input $x$, the model $T$ produces a probability distribution as follows:

$$T(x) \rightarrow [0,1]^{k} \tag{1}$$

where $[0,1]^{k}$ represents the probabilities of $x$ being classified into each of the $k$ identities.

In the black-box setting, the attacker can only obtain the output $T(x)$ by querying the target model with an image $x$; no intermediate parameters of any model are available. The attacker aims to obtain sensitive data $x$ associated with a specific label $y$ from the target network. We chose the face recognition classifier model $T$ as the attack target to make our attack more realistic. This model identifies individuals' identities in images and assigns the corresponding labels. Thus, the private facial images of any specific identity are reconstructed by utilizing the soft and hard labels provided by the black-box model.

To expose the privacy of the target label $y$ by reconstructing data $x$, two conditions must hold: 1) The target label $y$ must receive the highest probability in the prediction of model $T$ on input $x$, i.e., $\arg\max_{y'} T(x)_{y'} = y$. 2) The confidence of the label $y$ should be as large as possible, i.e., $T(x)_{y}$ is maximized.
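The two conditions above can be checked directly on the probability vector returned by the target model. The following is a minimal sketch; the function names and the example output vector are hypothetical and only illustrate the definitions.

```python
from typing import Sequence

def is_predicted_as_target(probs: Sequence[float], target: int) -> bool:
    """Condition 1: the target label has the highest predicted probability."""
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return predicted == target

def target_confidence(probs: Sequence[float], target: int) -> float:
    """Condition 2: the quantity the attacker tries to maximize, T(x)_y."""
    return probs[target]

# Hypothetical black-box output T(x) for k = 3 identities.
probs = [0.1, 0.7, 0.2]
print(is_predicted_as_target(probs, target=1), target_confidence(probs, target=1))
```

Only the output vector is used, so both checks are compatible with the black-box assumption.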

III-A2 Stochastic Game for Latent Distribution Search

In GANs, the variation of latent code in the latent space is continuous. Hence, searching for latent code can be regarded as a Markov decision process (MDP). However, searching for latent distribution cannot be considered an MDP: the latent distribution has a more complex state space than the latent code, and the parameters constituting the distribution, such as the mean $\mu$ and standard deviation $\sigma$, entail more uncertainty and interaction. Therefore, searching for latent distribution should be viewed as a stochastic game.

When dealing with stochastic game problems, multi-agent reinforcement learning is often a good choice as it can effectively address interactions and competitions. Therefore, we select two agents to optimize the mean $\mu$ and standard deviation $\sigma$ constituting the latent distribution, respectively. In the context of distributional MI attacks, these two agents aim to optimize the latent distribution so that latent codes sampled from it reconstruct images that reveal more private information. Even so, we cannot classify this task as entirely cooperative. A certain degree of competition exists between $\mu$ and $\sigma$. Maintaining this competitive dynamic enables them to enhance their performance while continuously striving for global optimality. This competitive relationship fosters flexibility in the optimization process, ultimately leading to improved MI attack performance. Therefore, we choose the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [19] as the MARL algorithm for searching the appropriate latent distribution.

III-A3 Overview

DBB-MI consists of three steps. First, we train a GAN $G$ on the public dataset $D_{pub}$. It is important to note that the public dataset $D_{pub}$ does not overlap with the private dataset $D_{priv}$ used to train the target black-box model $T$. We neither need to select the images carefully nor to use the target model $T$ for additional labelling of images in $D_{pub}$. Next, we optimize the initial random latent distribution to approximate the real latent distribution. Finally, we sample the latent code from the optimized high-dimensional latent distribution and input it into the GAN $G$ to reconstruct private data. The overall structure of DBB-MI is shown in Fig. 2.

III-B MADDPG for Searching Latent Space Distribution

III-B1 MADDPG

The MADDPG [19] agent has two fundamental components: the actor and critic networks, as depicted in Fig. 3. The actor network determines the actions to be executed by the agent based on the information observed by the agent as well as the current state of the system. The critic network is responsible for judging the value of actions and providing feedback (i.e., a reward signal) to the actor. This process enables the actor network to update its policies. Through iterations of the above operations, the agent gradually learns to optimize parameters $\mu$ and $\sigma$ to minimize the discrepancy between the optimized and real latent distribution.

III-B2 Action

The actor network aggregates all the data the agent has observed, including the environment's current state, other agents' observations, and other pertinent information. Based on this information, the actor network makes decisions regarding the agent's actions to optimize the latent distribution parameters $\mu$ and $\sigma$. We independently model the latent distribution of each dimension of the latent code and then sample each dimension independently from these latent distributions. Fig. 2 shows that $\mu$ and $\sigma$ are sampled from the standard normal distribution to form the initial random distribution $N(\mu, \sigma^{2})$. In addition, actions $action_{\mu}$ and $action_{\sigma}$ are selected by $Actor_{\mu}$ and $Actor_{\sigma}$ according to the initial parameters.

$$\mu_{a} = Actor(N(\mu, \sigma^{2})) \tag{2}$$

The distribution parameter is then updated as a weighted combination of its current value and the selected action:

$$\mu_{t+1} = \alpha \mu_{t} + (1-\alpha)\mu_{a} \tag{3}$$

Previous research [25, 28] has demonstrated that exploring latent space impacts the diversity and accuracy of reconstructed images. We therefore introduce a parameter $\alpha$ to balance accuracy and diversity. A small $\alpha$ is employed in the early stages of searching so that the actions dominate the update, encouraging the agents to optimize the distribution and broadening the search scope. As the training process advances, the latent distribution optimized by the agents approaches the real latent space distribution, and $\alpha$ is gradually increased to limit the diversity of the generated images. This optimization procedure enables the agents to refine the latent distribution further, yielding a latent distribution closely related to natural images.
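The update in Eq. (3) and the role of $\alpha$ can be sketched as follows. The linear ramp used for $\alpha$, its bounds, and the stand-in action vector are assumptions made for illustration; the text above only states that $\alpha$ starts small and is gradually increased.

```python
from typing import List

def blend(mu_t: List[float], mu_a: List[float], alpha: float) -> List[float]:
    """Eq. (3): mu_{t+1} = alpha * mu_t + (1 - alpha) * mu_a, per dimension."""
    return [alpha * m + (1.0 - alpha) * a for m, a in zip(mu_t, mu_a)]

def alpha_schedule(step: int, total_steps: int,
                   alpha_min: float = 0.1, alpha_max: float = 0.9) -> float:
    """Hypothetical linear ramp from alpha_min to alpha_max over the search."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return alpha_min + frac * (alpha_max - alpha_min)

mu = [0.0, 0.0]
proposed = [1.0, -1.0]  # stand-in for the actor's output mu_a
for step in range(5):
    # Early steps follow the action closely; later steps change mu only slightly.
    mu = blend(mu, proposed, alpha_schedule(step, 5))
```

With a small $\alpha$ the new parameter is close to the action, so the search moves quickly; with a large $\alpha$ the parameter stays near its previous value, which stabilizes the distribution late in the search.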

III-B3 Reward

The critic network evaluates the value and utility of the actions by measuring their consistency with the target task. When optimizing the latent distribution, the critic network provides feedback or rewards that motivate the agents to take actions moving the latent distribution toward the real latent distribution. Actions that steer the latent distribution toward the real one are rewarded more; otherwise, only a lower reward or no reward is given.

As shown in Fig. 3, the critic network considers the effect of actions on the state of the environment. It is updated by incorporating environmental feedback, actual rewards, and estimated action rewards. These help to improve the estimation of rewards by the critic networks. The agents move closer to the true latent distribution by optimizing actor and critic networks. When the current distribution is closer to the real latent distribution, the images generated from that distribution will have higher confidence in the target network $T$. Therefore, the reward can be calculated as follows:

$$r_{t+1} = \log\left[T_{l}\left(G\left(z_{t+1} \sim N(\mu_{t+1}, \sigma_{t+1}^{2})\right)\right)\right] \tag{4}$$
$$r_{a} = \log\left[T_{l}\left(G\left(z_{a} \sim N(\mu_{a}, \sigma_{a}^{2})\right)\right)\right] \tag{5}$$

where $r_{t+1}$ denotes the score of images generated from the latent space of the new distribution after performing actions, and $r_{a}$ is the reward that agents receive after performing actions.

In optimizing the latent distribution, it is necessary to compute rewards individually for each action based on its effectiveness and impact on the target. This per-action reward calculation improves the precision of dynamic adjustments in the optimization process and is formulated as follows:

$$r_{\mu} = \log\left[T_{l}\left(G\left(z_{\mu} \sim N(\mu_{t+1}, \sigma_{t}^{2})\right)\right)\right] \tag{6}$$
$$r_{\sigma} = \log\left[T_{l}\left(G\left(z_{\sigma} \sim N(\mu_{t}, \sigma_{t+1}^{2})\right)\right)\right] \tag{7}$$
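The four reward terms in Eqs. (4)-(7) differ only in which mean and deviation the latent code is sampled from. The sketch below makes that explicit using black-box queries only. The identity "generator", the softmax "target model", the averaging over a few samples, and all names are hypothetical stand-ins for illustration, not the networks or procedure used in the experiments.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]

def sample_latent(mu: Sequence[float], sigma: Sequence[float], rng: random.Random) -> Vector:
    """Draw z ~ N(mu, sigma^2) independently in every dimension."""
    return [rng.gauss(m, abs(s)) for m, s in zip(mu, sigma)]

def log_confidence(mu, sigma, target, generator, classifier, rng, n_samples: int = 4) -> float:
    """Average log T_l(G(z)) over a few samples (the averaging is an assumption)."""
    total = 0.0
    for _ in range(n_samples):
        probs = classifier(generator(sample_latent(mu, sigma, rng)))
        total += math.log(max(probs[target], 1e-12))  # clamp to avoid log(0)
    return total / n_samples

def step_rewards(mu_t, sigma_t, mu_next, sigma_next, mu_a, sigma_a,
                 target, generator, classifier, rng) -> Dict[str, float]:
    """Evaluate the reward terms of Eqs. (4)-(7) from model outputs alone."""
    def score(mu, sigma):
        return log_confidence(mu, sigma, target, generator, classifier, rng)
    return {
        "r_next": score(mu_next, sigma_next),  # Eq. (4): both parameters updated
        "r_a": score(mu_a, sigma_a),           # Eq. (5): distribution given by the actions
        "r_mu": score(mu_next, sigma_t),       # Eq. (6): only the mean updated
        "r_sigma": score(mu_t, sigma_next),    # Eq. (7): only the deviation updated
    }

def toy_generator(z: Vector) -> Vector:
    """Hypothetical generator: returns the latent code unchanged."""
    return list(z)

def toy_classifier(image: Vector) -> Vector:
    """Hypothetical target model: softmax over the image values."""
    exps = [math.exp(v) for v in image]
    total = sum(exps)
    return [e / total for e in exps]

rewards = step_rewards(
    mu_t=[0.0, 0.0], sigma_t=[0.0, 0.0],
    mu_next=[5.0, 0.0], sigma_next=[0.0, 0.0],
    mu_a=[6.0, 0.0], sigma_a=[0.0, 0.0],
    target=0, generator=toy_generator, classifier=toy_classifier,
    rng=random.Random(0),
)
```

In this toy setting the deviations are zero, so sampling is deterministic and the terms can be compared directly: moving the mean toward the target class raises the corresponding log-confidence.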

We also introduce the penalty factor $r_{c}$ to penalize instances where the generated image is irrelevant to the target category. This could help agents perform actions to improve the quality of the generated image while reducing interference from non-target categories. The penalty term can assist agents in optimizing the distribution so that the reconstructed images are closely related to the target category, yielding superior-quality images. The penalty factor $r_{c}$ is defined below.

$$r_{c} = \max(\varepsilon, -p_{l}) \tag{8}$$

Figure 4: The overall steps of one-dimensional latent distribution search. From random latent distribution to real latent distribution.

The threshold $\varepsilon$ is introduced to prevent agents from obtaining additional rewards. When the negative log probability ($-p_{l}$) of the target label of the generated image exceeds the specific threshold, an additional penalty is imposed. This penalty factor ensures that the generated image is more relevant to the target category and that images are distinguishable enough to avoid confusion with non-target categories.

To sum up, we can calculate rewards for $Agent_{\mu}$ and $Agent_{\sigma}$ as follows:

$$R_{\mu} = w_{1} r_{t+1} + w_{2} r_{a} + w_{3} r_{\mu} + w_{4} r_{c} \tag{9}$$
$$R_{\sigma} = w_{1} r_{t+1} + w_{2} r_{a} + w_{3} r_{\sigma} + w_{4} r_{c} \tag{10}$$

where $w_{1}, \ldots, w_{4}$ are the weights of the corresponding reward terms.
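As a small worked sketch of Eqs. (8)-(10), the code below computes the penalty term and the two weighted rewards. The numeric inputs, the threshold, and the weights (including the negative sign chosen for $w_{4}$) are assumptions made purely for illustration, since the text above does not state their values.

```python
import math
from typing import Sequence

def penalty_term(log_p_target: float, epsilon: float) -> float:
    """Eq. (8): r_c = max(epsilon, -p_l), with p_l the target label's log-probability."""
    return max(epsilon, -log_p_target)

def agent_reward(r_next: float, r_a: float, r_own: float, r_c: float,
                 weights: Sequence[float]) -> float:
    """Eqs. (9)/(10): pass r_mu as r_own for Agent_mu, or r_sigma for Agent_sigma."""
    w1, w2, w3, w4 = weights
    return w1 * r_next + w2 * r_a + w3 * r_own + w4 * r_c

# Illustrative values only; the weights and the sign of w4 are assumptions.
weights = (2.0, 1.0, 1.0, -1.0)
r_c = penalty_term(log_p_target=math.log(0.1), epsilon=0.5)
R_mu = agent_reward(r_next=-0.5, r_a=-1.0, r_own=-0.25, r_c=r_c, weights=weights)
R_sigma = agent_reward(r_next=-0.5, r_a=-1.0, r_own=-0.75, r_c=r_c, weights=weights)
```

The two agents share three of the four terms and differ only in their own term ($r_{\mu}$ or $r_{\sigma}$), which is where their partly competitive relationship enters the reward.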

III-B4 Distribution optimization

As shown in Fig. 4, the search is conducted on the high-dimensional latent distribution using MADDPG. Specifically, $\mu = \{\mu_{1}, \mu_{2}, \ldots, \mu_{n}\}$ and $\sigma = \{\sigma_{1}, \sigma_{2}, \ldots, \sigma_{n}\}$ are sampled from an $n$-dimensional normal distribution. After that, they are paired together to form the initial $n$-dimensional latent distribution as follows:

$$L = N(\mu, \sigma^{2}); \quad \mu \sim N(0, I), \quad \sigma \sim N(0, I) \tag{11}$$
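The initialization in Eq. (11) can be sketched in a few lines. The dimensionality of 100 and the fixed seed are arbitrary choices for illustration.

```python
import random
from typing import List, Tuple

def init_latent_distribution(n_dims: int, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Eq. (11): draw mu ~ N(0, I) and sigma ~ N(0, I), one value per latent dimension."""
    mu = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
    sigma = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
    return mu, sigma

def variances(sigma: List[float]) -> List[float]:
    """Only sigma^2 enters N(mu, sigma^2), so a negative draw for sigma is harmless."""
    return [s * s for s in sigma]

# 100 latent dimensions and seed 0 are arbitrary, illustrative choices.
mu, sigma = init_latent_distribution(n_dims=100, rng=random.Random(0))
```

Each dimension gets its own pair $(\mu_{i}, \sigma_{i})$, matching the per-dimension modeling described in Section III-B2.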

The MADDPG algorithm, involving $Agent_{\mu}$ and $Agent_{\sigma}$, optimizes the initial high-dimensional latent distribution. $Agent_{\mu}$ and $Agent_{\sigma}$ select actions based on the current latent distribution, and these actions are formulated below.

$\mu_{a} = \{\mu_{a1}, \mu_{a2}, \ldots, \mu_{an}\}$ (12)
$\sigma_{a}^{2} = \{\sigma_{a1}^{2}, \sigma_{a2}^{2}, \ldots, \sigma_{an}^{2}\}$ (13)

The chosen actions move the latent distribution $L$ towards the real latent distribution, resulting in an optimized latent distribution $L' = N(\mu', {\sigma'}^{2})$. The latent code $z'$ is ultimately sampled from $L'$.

$z' = \{z'_{1}, z'_{2}, \ldots, z'_{n}\}$ (14)

where $z'_{i}$ obeys the distribution $N(\mu'_{i}, {\sigma'_{i}}^{2})$.
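The initialization in Eq. (11) and the per-dimension sampling in Eq. (14) can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, and taking the absolute value of $\sigma$ is our own choice for handling negative draws, since only $\sigma^{2}$ enters the distribution.

```python
import random


def init_latent_distribution(n, rng):
    """Draw mu and sigma from N(0, I), as in Eq. (11)."""
    mu = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sigma = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return mu, sigma


def sample_latent_code(mu, sigma, rng):
    """Sample z_i ~ N(mu_i, sigma_i^2) independently for each dimension."""
    # Assumption: abs() is applied because a sigma drawn from N(0, I)
    # may be negative, while the distribution only depends on sigma^2.
    return [rng.gauss(m, abs(s)) for m, s in zip(mu, sigma)]


rng = random.Random(0)
mu, sigma = init_latent_distribution(8, rng)
z = sample_latent_code(mu, sigma, rng)
```

Calling `sample_latent_code` repeatedly on the same `mu` and `sigma` yields different latent codes, which is what allows several candidate reconstructions to be drawn from one optimized distribution.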

The above optimization method of high-dimensional latent distribution can avoid the limitation caused by directly searching the latent distribution space. Through MARL methods, an effective search of the latent distribution can be achieved with only limited model outputs, without the need for any additional information about the model. Consequently, in a black-box setting, it broadens the search range in the latent distribution space, thereby enhancing the ability to identify the real latent space of the target and ultimately improving the accuracy of the reconstructed sensitive data.

III-C MADDPG for Agent Training

The MADDPG trains two agents, enabling them to cooperate and compete in a predefined image generation environment. Agents select specific actions to optimize the initial latent distribution toward the real latent distribution. The key lies in training the agents, which specifically involves the following steps:

  • Step 1: After observing the randomly constructed initial distribution $N(\mu, \sigma^{2})$, the agents select corresponding actions to execute;

  • Step 2: The rewards for the decisions made by the agents are computed and stored in the replay buffer $B$ along with the agents' observation information;

  • Step 3: When there are enough experiences in the replay buffer $B$, a batch is sampled to update the agents, and the updated agents are returned.

With the trained agents, the private data hidden in the target network can be reconstructed. Details about agent training are provided in Algorithm 1.

Algorithm 1 MADDPG for Agent Training
Input: Target model $T$, target label $l$, GAN $G$
Output: Trained agents $agent_{\mu}$, $agent_{\sigma}$
1:  Initialize new $agent_{\mu}$, $agent_{\sigma}$, replay buffer $B$
2:  for $round = 1$ to $max\_rounds$ do
3:     Initialize $n$-dimensional vectors $\mu_{t}$, $\sigma_{t}$
4:     for all $\omega$ in $\{\mu, \sigma\}$ do
5:        $\omega_{a} \leftarrow Actor_{\omega}(N(\mu_{t}, \sigma_{t}^{2}))$
6:        $\omega_{t+1} \leftarrow \alpha \omega_{t} + (1 - \alpha) \omega_{a}$
7:        $r_{t+1} \leftarrow \log[T_{l}(G(z_{t+1} \sim N(\mu_{t+1}, \sigma_{t+1}^{2})))]$
8:        $r_{a} \leftarrow \log[T_{l}(G(z_{a} \sim N(\mu_{a}, \sigma_{a}^{2})))]$
9:        // Obtain $z_{\omega}$ under different $\omega$:
10:        // $z_{\mu} \sim N(\mu_{t+1}, \sigma_{t}^{2})$, $z_{\sigma} \sim N(\mu_{t}, \sigma_{t+1}^{2})$
11:        $r_{\omega} \leftarrow \log[T_{l}(G(z_{\omega}))]$
12:        $r_{c} \leftarrow \max(\varepsilon, -p_{l})$
13:        $R_{\omega} \leftarrow w_{1} r_{t+1} + w_{2} r_{a} + w_{3} r_{\omega} + w_{4} r_{c}$
14:     end for
15:     Add $(\mu_{t}, \mu_{a}, \mu_{t+1}, R_{\mu}, \sigma_{t}, \sigma_{a}, \sigma_{t+1}, R_{\sigma})$ to $B$
16:     if $len(B) > max\_len$ then
17:        Sample a random mini-batch from $B$
18:        Calculate the actor loss and critic loss
19:        Update the actor and critic networks
20:     end if
21:  end for
22:  return $agent_{\mu}$, $agent_{\sigma}$
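The environment side of Algorithm 1 (lines 5 to 11 and line 15) can be sketched in plain Python as below. This is not the authors' implementation: the actor and critic networks are replaced by random actions, `toy_query` is a hypothetical stand-in for the black-box composition $T_{l}(G(z))$, the value of $\alpha$ is arbitrary, and the term $r_{c}$ is omitted because $p_{l}$ is defined elsewhere.

```python
import math
import random
from collections import deque


def blend(current, action, alpha):
    """Line 6: omega_{t+1} = alpha * omega_t + (1 - alpha) * omega_a."""
    return [alpha * c + (1.0 - alpha) * a for c, a in zip(current, action)]


def sample(mu, sigma, rng):
    """Draw z_i ~ N(mu_i, sigma_i^2); abs() keeps the scale non-negative."""
    return [rng.gauss(m, abs(s)) for m, s in zip(mu, sigma)]


def log_confidence(z, query):
    """log T_l(G(z)); `query` stands in for the black-box pipeline."""
    return math.log(max(query(z), 1e-12))  # clamp to avoid log(0)


def transition(mu_t, sigma_t, mu_a, sigma_a, alpha, query, rng):
    """Lines 6-11 for both agents, given already-chosen actions."""
    mu_next = blend(mu_t, mu_a, alpha)
    sigma_next = blend(sigma_t, sigma_a, alpha)
    rewards = {
        "r_next": log_confidence(sample(mu_next, sigma_next, rng), query),
        "r_a": log_confidence(sample(mu_a, sigma_a, rng), query),
        "r_mu": log_confidence(sample(mu_next, sigma_t, rng), query),
        "r_sigma": log_confidence(sample(mu_t, sigma_next, rng), query),
    }
    return mu_next, sigma_next, rewards


def toy_query(z):
    """Hypothetical confidence in (0, 1], highest when z is near the origin."""
    return math.exp(-sum(v * v for v in z) / len(z))


rng = random.Random(0)
n = 4
buffer = deque(maxlen=1000)  # replay buffer B
for _ in range(10):
    mu_t = [rng.gauss(0, 1) for _ in range(n)]
    sigma_t = [rng.gauss(0, 1) for _ in range(n)]
    # Random actions replace the actor networks in this sketch.
    mu_a = [rng.gauss(0, 1) for _ in range(n)]
    sigma_a = [rng.gauss(0, 1) for _ in range(n)]
    mu_next, sigma_next, rewards = transition(
        mu_t, sigma_t, mu_a, sigma_a, 0.5, toy_query, rng)
    buffer.append((mu_t, mu_a, mu_next, sigma_t, sigma_a, sigma_next, rewards))

batch = rng.sample(list(buffer), 4)  # mini-batch for an actor/critic update
```

Every reward in this sketch needs only the confidence returned for the target label, which matches the black-box assumption that the attacker sees model outputs but no parameters or gradients.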

IV Experiment

In this section, we primarily analyze the attack performance of DBB-MI on different datasets and target networks. Additionally, we analyze the distributional attack and investigate some factors that may affect the attack performance.

IV-A Experimental setting

IV-A1 Dataset

Four distinct face datasets that represent a variety of situations are used to evaluate the effectiveness and breadth of DBB-MI. Additionally, we conducted experiments on the MNIST dataset to assess the applicability of our approach to other types of datasets.

  • CelebFaces Attributes Dataset (CelebA) [29]. It contains 202,599 photos of 10,177 different celebrities.

  • FaceScrub [30]. It includes 106,863 images of 530 individuals with an even gender distribution.

  • PubFig83 [31]. It consists of 13,600 images of 83 individuals. These images were taken in regulated real-world environments with significant variations in lighting, expressions, and other attributes.

  • Flickr-Faces-HQ (FFHQ) [32]. It comprises 70,000 high-quality face images with significant age, expression, and ethnicity variations.

  • MNIST. It encompasses 70,000 handwritten digits (0 through 9), having different structures and features from facial datasets.

CelebA, FaceScrub, PubFig83, and MNIST are each divided into two parts: a public dataset and a private dataset. The public dataset is used to train the GAN, while the private dataset is used to train the target classification model. It should be emphasized that the public and private datasets share no identities or images. Thus, it can be assumed that the trained GAN does not directly contain any original private information. Additionally, since the public and private parts of the same dataset have similar statistical properties, FFHQ is used as an independent extra dataset to evaluate the performance of MI attacks under distribution shift. This allows for a comprehensive assessment of the robustness and generalization capabilities of MI attacks.

IV-A2 Target Models

Like previous studies [12, 11, 27, 25], we utilize three popular face recognition networks for evaluation, namely FaceNet64 [33], ResNet-152 [34], and VGG16 [35]. These networks are employed to assess the impact of MI attacks on models with different architectures. Using varied face recognition models gives a better evaluation of the generalization and robustness of DBB-MI.

IV-A3 Baselines

We select representative state-of-the-art white-box and black-box MI attacks as baselines for comparison. Specifically, we choose the Generative Model Inversion (GMI) attack [11] and the Knowledge-Enriched Distributional Model Inversion (KED-MI) attack [12] as white-box baselines. GMI is the first MI attack for deep networks, while KED-MI is a distributional MI attack. As advanced black-box baselines, we employ the Reinforcement Learning-based Black-box Model Inversion (RLB-MI) attack [25] and Model Inversion for deep learning Network (MIRROR) [24]. We also select the Boundary-Repelling Model Inversion (BERP-MI) attack [27], the only and most advanced label-only MI attack. These black-box MI attacks represent the current state-of-the-art (SOTA) in GAN-based MI attacks that directly search for latent code.

All models are trained on identical datasets, and the same evaluation models assess all experimental results to ensure fair comparisons. GMI, RLB-MI, MIRROR, and BERP-MI utilize the same GAN as DBB-MI. For KED-MI, the GAN is trained according to its specific requirements on the same dataset as that of DBB-MI. This allows a fair and objective comparison between KED-MI and DBB-MI.

Figure 5: The images reconstructed by different MI attacks under CelebA and VGG16. The top row displays the real images, the middle two rows show the images reconstructed by the white-box MI baselines, and the bottom four rows exhibit the images reconstructed by the black-box MI baselines and our method DBB-MI.

IV-A4 Implementation details

As in previous studies [12, 11, 27, 25], the same hyperparameters are used to train the GAN and the target network. For MADDPG, the important parameters are set as follows:

  • learning rate: 1e-3

  • discount factor: 0.99

  • target network update rate: $5 \times 10^{-3}$

  • experience replay buffer size: $1 \times 10^{6}$

  • batch size: 256

  • training episodes: $4 \times 10^{4}$
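For reference, the settings above can be gathered in one configuration object. The dictionary keys are hypothetical names, and the `soft_update` helper assumes that the "target network update rate" is the Polyak averaging coefficient commonly used in DDPG-style methods; the paper does not spell out the update rule.

```python
# Hyperparameters listed above, collected in one place (key names are ours).
MADDPG_CONFIG = {
    "learning_rate": 1e-3,
    "discount_factor": 0.99,
    "target_update_rate": 5e-3,
    "replay_buffer_size": 1_000_000,
    "batch_size": 256,
    "training_episodes": 40_000,
}


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


# One update step on two toy parameters.
target = soft_update([0.0, 1.0], [1.0, 1.0],
                     MADDPG_CONFIG["target_update_rate"])
```

With a rate of $5 \times 10^{-3}$, each step moves a target parameter only 0.5% of the way towards its online counterpart, so the target networks change slowly.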

IV-A5 Evaluation metrics

Like prior work [11], the effectiveness of MI attacks is evaluated using quantitative criteria, including attack accuracy (ACC) and K-nearest neighbor feature distance (KNN Dist). Furthermore, the Peak Signal-to-Noise Ratio (PSNR) is utilized to assess the resemblance between reconstructed and original images.

Attack Accuracy: This metric measures the probability of successfully reconstructing private data through an attack. The key difference from [12, 25] is that we consider an attack successful only when both the target model and the additional discriminative model agree that the generated image belongs to the target class. This stricter criterion makes the measured success rate more reliable: generated images must both deceive the target model and exhibit high-quality facial features, which reduces cases where noisy images are mistakenly counted as the target class.
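The agreement criterion can be sketched as follows. The function name and the toy predictions are illustrative assumptions; in practice the two prediction lists would come from the target model and the additional discriminative model.

```python
def attack_accuracy(target_preds, eval_preds, target_labels):
    """Fraction of reconstructions that both models assign to the target label."""
    hits = sum(
        1
        for t, e, y in zip(target_preds, eval_preds, target_labels)
        if t == y and e == y
    )
    return hits / len(target_labels)


# Four toy attacks: the third fools only the target model, and the fourth
# is assigned by both models to a label other than the target.
acc = attack_accuracy(
    target_preds=[3, 5, 7, 7],
    eval_preds=[3, 5, 2, 7],
    target_labels=[3, 5, 7, 1],
)
```

Under a target-model-only criterion the third attack would also count as a success, so the agreement rule can only lower the reported accuracy.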

KNN Dist: This metric measures the similarity of features between the generated reconstructed images and real private images. To calculate the KNN Dist, features are first extracted from the fully connected layer of the evaluation classifier for both the generated reconstructed images and the real private images. Then, their similarity in the feature space is assessed by calculating the L2 distance between these two sets of features.
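A minimal sketch of this metric for one reconstructed image is given below. It assumes the nearest-neighbor form, i.e. the shortest L2 distance to any private feature vector, and uses short lists in place of real fully connected layer features; whether the reported values use the plain or the squared L2 distance is not stated here.

```python
import math


def knn_dist(recon_feature, private_features):
    """L2 distance from a reconstructed feature to its nearest private feature."""
    return min(math.dist(recon_feature, f) for f in private_features)


# Toy 2-D features standing in for evaluation-classifier embeddings.
d = knn_dist([0.0, 0.0], [[3.0, 4.0], [6.0, 8.0]])
```

A smaller value means the reconstruction lies closer in feature space to some real private image.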

PSNR: This metric measures the difference between two images. It evaluates the quality and similarity between the reconstructed and real private images by calculating their PSNR value. A higher PSNR value indicates less difference between the two images, implying a higher similarity between the reconstructed and real private images.
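For concreteness, PSNR follows the standard definition $10 \log_{10}(MAX^{2} / MSE)$. The sketch below treats images as flat lists of pixel values and assumes an 8-bit range (`max_value=255`); the pixel range used in the experiments is not stated here.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two toy 4-pixel images that differ by one intensity level everywhere.
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```

Because PSNR is logarithmic, a gain of a few dB corresponds to a substantial drop in mean squared error.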

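TABLE I: The experimental results of MI attacks on different target models trained on CelebA. The symbols ↑ and ↓ denote that higher and lower scores give better attack performance, respectively. The best-performing attack metrics are marked in bold.
| Model | Type | Method | ACC ↑ | PSNR ↑ | KNN Dist ↓ |
|---|---|---|---|---|---|
| VGG16 | White-box | GMI | 0.194 | 12.3 | 1521.05 |
| VGG16 | White-box | KED-MI | 0.684 | 14.59 | 1258.65 |
| VGG16 | Black-box | MIRROR | 0.452 | 14.21 | 1358.20 |
| VGG16 | Black-box | BERP-MI | 0.562 | 13.36 | 1872.48 |
| VGG16 | Black-box | RLB-MI | 0.642 | 15.80 | 1262.34 |
| VGG16 | Black-box | DBB-MI | **0.858** | **20.66** | **1180.63** |
| FaceNet64 | White-box | GMI | 0.298 | 15.78 | 1584.24 |
| FaceNet64 | White-box | KED-MI | 0.766 | 16.35 | 1411.56 |
| FaceNet64 | Black-box | MIRROR | 0.528 | 15.09 | 1308.40 |
| FaceNet64 | Black-box | BERP-MI | 0.734 | 13.69 | 1685.29 |
| FaceNet64 | Black-box | RLB-MI | 0.804 | 16.27 | 1354.86 |
| FaceNet64 | Black-box | DBB-MI | **0.916** | **18.06** | **1091.15** |
| ResNet-152 | White-box | GMI | 0.340 | 15.36 | 1752.15 |
| ResNet-152 | White-box | KED-MI | 0.826 | 16.52 | 1130.05 |
| ResNet-152 | Black-box | MIRROR | 0.640 | 15.91 | 1254.90 |
| ResNet-152 | Black-box | BERP-MI | 0.754 | 13.17 | 1745.73 |
| ResNet-152 | Black-box | RLB-MI | 0.812 | 15.23 | 1308.69 |
| ResNet-152 | Black-box | DBB-MI | **0.898** | **17.37** | **1063.38** |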

IV-B Comparison with state-of-the-art MI attacks

IV-B1 Performance evaluation on different target models

Table I shows the experimental results of our method and the baselines under different target models. As seen in Table I, DBB-MI is clearly superior to state-of-the-art white-box and black-box MI attacks in ACC, KNN Dist, and PSNR. Using the target model VGG16 as an example, the ACC of DBB-MI (0.858) is 25.4% higher than that of KED-MI (0.684) and 33.6% higher than that of RLB-MI (0.642) in relative terms. This is because DBB-MI explores the latent space more fully by optimizing the latent distribution, and thus obtains more private data about the target. The results in Table I demonstrate that DBB-MI is more effective across different target models and poses more severe privacy leakage risks. This indicates that our method outperforms white-box distributional attacks and achieves SOTA black-box attack performance.
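The percentages quoted for VGG16 are relative gains over the baseline ACC values in Table I, which the short check below reproduces. The helper name `relative_gain` is ours.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# ACC values for VGG16 taken from Table I.
gain_vs_kedmi = relative_gain(0.858, 0.684)
gain_vs_rlbmi = relative_gain(0.858, 0.642)
```

In absolute terms, the same comparison corresponds to ACC gaps of 0.174 and 0.216, respectively.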

In addition, we compare the images reconstructed by DBB-MI with those reconstructed by the baselines. As displayed in Fig. 5, the images recovered by DBB-MI are closer to the original ones than those recovered by the baselines, with similar details and colors. This is attributed to the efficiency of DBB-MI in searching the latent space, which allows it to capture more private information. These results indicate that DBB-MI outperforms the state-of-the-art white-box and black-box MI attacks for various target models, both in the quantitative metrics and in visual quality.

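TABLE II: The experimental results of MI attacks on different datasets.
| Dataset | Type | Method | ACC ↑ | PSNR ↑ | KNN Dist ↓ |
|---|---|---|---|---|---|
| CelebA | White-box | GMI | 0.298 | 15.78 | 1584.24 |
| CelebA | White-box | KED-MI | 0.766 | 16.35 | 1411.56 |
| CelebA | Black-box | MIRROR | 0.528 | 15.09 | 1308.40 |
| CelebA | Black-box | BERP-MI | 0.734 | 13.69 | 1685.29 |
| CelebA | Black-box | RLB-MI | 0.804 | 16.27 | 1354.86 |
| CelebA | Black-box | DBB-MI | 0.916 | 18.06 | 1091.15 |
| FaceScrub | White-box | GMI | 0.080 | 17.05 | 2729.06 |
| FaceScrub | White-box | KED-MI | 0.355 | 20.31 | 2682.69 |
| FaceScrub | Black-box | MIRROR | 0.325 | 18.86 | 2710.11 |
| FaceScrub | Black-box | BERP-MI | 0.305 | 20.81 | 2684.91 |
| FaceScrub | Black-box | RLB-MI | 0.420 | 19.23 | 2693.85 |
| FaceScrub | Black-box | DBB-MI | 0.375 | 23.03 | 2661.82 |
| PubFig83 | White-box | GMI | 0.100 | 10.26 | 2580.71 |
| PubFig83 | White-box | KED-MI | 0.380 | 15.25 | 2363.12 |
| PubFig83 | Black-box | MIRROR | 0.300 | 13.25 | 2410.62 |
| PubFig83 | Black-box | BERP-MI | 0.400 | 13.10 | 2492.99 |
| PubFig83 | Black-box | RLB-MI | 0.400 | 16.37 | 2349.84 |
| PubFig83 | Black-box | DBB-MI | 0.560 | 17.04 | 2342.47 |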

IV-B2 Performance evaluation on different dataset

Table II presents the experimental results of DBB-MI and the baselines under different datasets. It can be observed from Table II that DBB-MI beats the baselines in almost all performance evaluation metrics. Taking CelebA as a case study, DBB-MI's ACC is roughly 19.6% and 13.9% higher, in relative terms, than that of KED-MI and RLB-MI, respectively.

The experimental outcomes obtained by all MI attacks on CelebA are superior to those on FaceScrub and Pubfig83. This is because CelebA contains more identity categories and data information, giving the trained model stronger classification capabilities and, therefore, more vulnerability to attacks. Additionally, DBB-MI outperforms all baselines on all datasets except FaceScrub. On FaceScrub, the ACC of DBB-MI is slightly lower than that of RLB-MI, but the PSNR and KNN Dist of DBB-MI are better than those of RLB-MI. Thus, DBB-MI outperforms RLB-MI in most performance evaluation metrics.

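TABLE III: The experimental results of MI attacks using the GAN trained on FFHQ.
| Dataset | Type | Method | ACC ↑ | PSNR ↑ | KNN Dist ↓ |
|---|---|---|---|---|---|
| CelebA | White-box | GMI | 0.114 | 15.35 | 1431.45 |
| CelebA | White-box | KED-MI | 0.408 | 17.52 | 1035.31 |
| CelebA | Black-box | MIRROR | 0.286 | 16.66 | 1242.91 |
| CelebA | Black-box | BERP-MI | 0.398 | 15.01 | 1331.25 |
| CelebA | Black-box | RLB-MI | 0.402 | 16.76 | 925.61 |
| CelebA | Black-box | DBB-MI | 0.532 | 16.04 | 1002.51 |
| FaceScrub | White-box | GMI | 0.150 | 11.33 | 2892.42 |
| FaceScrub | White-box | KED-MI | 0.315 | 15.28 | 2856.36 |
| FaceScrub | Black-box | MIRROR | 0.285 | 15.52 | 2878.36 |
| FaceScrub | Black-box | BERP-MI | 0.295 | 15.97 | 2859.94 |
| FaceScrub | Black-box | RLB-MI | 0.355 | 16.56 | 2866.43 |
| FaceScrub | Black-box | DBB-MI | 0.490 | 17.04 | 2817.85 |
| PubFig83 | White-box | GMI | 0.080 | 11.16 | 2684.99 |
| PubFig83 | White-box | KED-MI | 0.340 | 14.59 | 2415.76 |
| PubFig83 | Black-box | MIRROR | 0.300 | 14.27 | 2459.85 |
| PubFig83 | Black-box | BERP-MI | 0.380 | 15.36 | 2391.67 |
| PubFig83 | Black-box | RLB-MI | 0.360 | 16.05 | 2402.5 |
| PubFig83 | Black-box | DBB-MI | 0.700 | 16.55 | 2387.56 |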
Figure 6: The reconstruction results of MNIST.
(a) Initial Distribution
(b) Final Distribution
Figure 7: The initial and final latent space distribution. (a) depicts the initial latent distribution, and (b) shows the resulting latent distribution after optimization.
Figure 8: Distribution of different accuracy levels. The results are obtained by randomly attacking 200 target labels.

IV-B3 Performance evaluation on cross-dataset

In the previous experiments, the GAN was trained on a dataset with statistical properties and feature distributions similar to those of the dataset used to train the target model. However, obtaining a dataset with a distribution similar to that of the target dataset is challenging in real-world settings. Therefore, it is important to evaluate attacks whose GAN is trained on an extra dataset.

We train the GAN on the extra dataset, FFHQ. Meanwhile, we employ FaceNet64 trained on CelebA, FaceScrub, and PubFig83 as the target models and utilize FaceNet trained on these datasets as the evaluation models. Table III presents the experimental results of DBB-MI and the baselines in this cross-dataset setting. DBB-MI performs better than the baselines on most datasets and metrics. This can be linked to the comprehensive exploration of the latent space in DBB-MI, which results in the acquisition of more private data about the target. On CelebA, DBB-MI has the highest ACC, while its PSNR and KNN Dist are slightly worse than the best. This implies that DBB-MI still has room for performance improvement.

In summary, DBB-MI exhibits superior performance across diverse target models, datasets, and cross-dataset scenarios. In most cases, it outperforms state-of-the-art white-box and black-box MI attacks in ACC, KNN Dist, PSNR, and visual quality. These findings confirm the effectiveness of the latent distribution exploration in DBB-MI, which is vital for improving model security and privacy protection.

IV-B4 Performance evaluation on MNIST

To demonstrate that our approach is practical beyond face datasets, we also evaluated it on the MNIST dataset. For MNIST, we attacked models of different depths, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and ResNet-SIM. ResNet-SIM consists of two convolutional layers, one max-pooling layer, two residual blocks, and one fully connected layer, while the remaining network structures conform to [34]. We achieved a 100% attack success rate across these networks, and the features of different digits could be reconstructed. Specific reconstruction results are shown in Fig. 6.

IV-C Analysis of distributional attacks

IV-C1 Analysis of the changes of the latent distribution

Fig. 7a depicts the latent distribution before the attack, in which each point represents a specific dimension of the latent distribution. During an attack, each dimension of the latent distribution is optimized by MADDPG to explore the latent space effectively. Fig. 7b illustrates the latent distribution after the attack, in which each dimension displays a distinct pattern. This implies that distinct privacy features are captured in each dimension of the latent distribution. Moreover, it supports the rationale of DBB-MI, i.e., optimizing each dimension of the latent distribution independently so that the latent distribution approaches the true one.

IV-C2 Evaluating Distribution Accuracy

We drew 500 random samples from each optimized latent distribution to obtain reconstructed samples. The fraction of reconstructed samples that matched the target label, referred to as the distributional accuracy, measures the accuracy of the optimized distribution. 68% of all latent distributions have a distributional accuracy exceeding 0.5, indicating that, for these distributions, more than half of the randomly drawn samples can be deemed successful reconstructions. As displayed in Fig. 8, the optimized latent distributions perform strongly in exploring the target label information.
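The distributional accuracy described above can be sketched as follows. The classifier `toy_classify` is a hypothetical stand-in for running the generator and target model on a latent code, and the toy distribution parameters are arbitrary.

```python
import random


def distributional_accuracy(mu, sigma, classify, target_label, n_samples, rng):
    """Fraction of sampled latent codes whose reconstruction hits the target label."""
    hits = 0
    for _ in range(n_samples):
        z = [rng.gauss(m, abs(s)) for m, s in zip(mu, sigma)]
        if classify(z) == target_label:
            hits += 1
    return hits / n_samples


def toy_classify(z):
    """Hypothetical stand-in for T(G(z)): label 1 if the code sums to a positive value."""
    return 1 if sum(z) > 0 else 0


rng = random.Random(0)
acc = distributional_accuracy([0.5, 0.5], [1.0, 1.0], toy_classify, 1, 500, rng)
```

With 500 samples per distribution, as in the experiment, the estimate has a resolution of 0.002.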

Figure 9: Actual accuracy levels.

IV-C3 Evaluating actual attack performance under different distributional accuracy levels

To depict the real attack accuracy, we drew 10,000 random samples from all optimized latent distributions. These samples were then tested for top-1 accuracy, which indicates successful reconstruction, and for top-5 accuracy, which indicates that the target label ranks among the top 5 of all 1,000 classes. When the distributional accuracy exceeds 24%, the top-5 accuracy exceeds 50%. The observed top-1 accuracy closely aligns with the distributional accuracy, suggesting that the optimized distribution performs consistently across reconstruction tasks. More details are shown in Fig. 9.
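Top-1 and top-5 accuracy can both be computed with one helper, sketched below on toy score vectors. The function name and the three-class scores are illustrative; in the experiment each row would hold the 1,000 class scores returned for one reconstructed sample.

```python
def top_k_accuracy(score_rows, target_label, k):
    """Fraction of samples whose target label is among the k highest scores."""
    hits = 0
    for scores in score_rows:
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if target_label in ranked[:k]:
            hits += 1
    return hits / len(score_rows)


# Three toy samples over three classes; the target label is class 1.
rows = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
top1 = top_k_accuracy(rows, target_label=1, k=1)
top2 = top_k_accuracy(rows, target_label=1, k=2)
```

By construction, the accuracy can only stay equal or grow as `k` increases, so top-5 accuracy is always at least the top-1 accuracy.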

Figure 10: The relationship between distributional accuracy and reconstruction accuracy.

IV-C4 Evaluating sample reconstruction confidence under varying distributional accuracies

For testing, we selected samples from 10,000 random samplings of optimized latent distributions with varying accuracies. We evaluated the reconstructed samples using the target model to determine their confidences for the target label, as depicted in Fig. 10. At a distributional accuracy of 0.632, 50.59% of the reconstructed samples achieved a confidence level of 0.5, while 36.10% achieved a confidence level of 0.75. At a distributional accuracy of 0.962, 90.27% of the samples achieved a confidence level of 0.5, 78.58% achieved a confidence level of 0.75, and 58.40% achieved a confidence level beyond 0.9. Even at the lower distributional accuracy of 0.580, a significant proportion of the samples, specifically 47.51%, surpassed a confidence level of 0.5. Although the confidence of reconstructed samples improves as the distributional accuracy improves, certain samples still exhibit very low confidence levels. This may be attributed to the inadequate training of the GAN model employed in our study.
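The proportions reported above are fractions of samples whose target-label confidence reaches a threshold. The sketch below assumes an "at or above" comparison and uses made-up confidences; both are assumptions for illustration only.

```python
def fraction_at_or_above(confidences, threshold):
    """Share of samples whose target-label confidence reaches the threshold."""
    return sum(1 for c in confidences if c >= threshold) / len(confidences)


# Hypothetical target-label confidences for four reconstructed samples.
confidences = [0.2, 0.5, 0.8, 0.95]
share_050 = fraction_at_or_above(confidences, 0.5)
share_075 = fraction_at_or_above(confidences, 0.75)
share_090 = fraction_at_or_above(confidences, 0.9)
```

The shares are nested: every sample counted at 0.9 is also counted at 0.75 and 0.5, which is why the reported percentages decrease as the threshold rises.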

IV-D Analysis of factors affecting attack performance

Figure 11: The effect of the distribution dimensions on ACC. The ACC is obtained using the target model VGG16 trained on CelebA.

IV-D1 Evaluating the impact of latent distribution dimensions

This work primarily concerns improving the performance of MI attacks by optimizing the latent distribution in the high-dimensional latent space. Therefore, it is necessary to investigate the effect of the latent distribution dimension on the ACC of the MI attack. Fig. 11 displays the ACC obtained by DBB-MI under different latent distribution dimensions. As shown in this figure, the ACC of DBB-MI rises gradually as the latent distribution dimension increases. This suggests that searching higher-dimensional latent distributions explores the latent space more comprehensively. Hence, MI attacks need to choose an appropriate latent distribution dimension.

(a) Various Structures
(b) Various Datasets
Figure 12: The median number of iterations required for the generated images to first reach a specific test classification accuracy during the attack process. For (a), the dataset is defined as CelebA, and it presents experimental results for different network structures. For (b), the target network is defined as FaceNet64, and the GAN is trained on FFHQ and presents experimental results across various datasets.

IV-D2 Evaluating the impact of training episodes

To further investigate the factors influencing the performance of MI attacks, we assess the median number of iterations required for the generated images to first reach a specific test classification accuracy, under varying model complexity and dataset size. We first train various target models on CelebA. ResNet-152, the most complex target model, is the hardest to search: more iterations are required to reach the test accuracy obtained on FaceNet64 and VGG16, as shown in Fig. 12a. Fig. 12b exhibits the experimental results obtained by utilizing a GAN trained on FFHQ to attack FaceNet64 trained on different datasets. An MI attack exhibits the lowest search difficulty on PubFig83, where it quickly reaches a specific test accuracy, while it needs more iterations on the other datasets. Therefore, we can conclude that as the complexity of the model and the size of the dataset increase, the search difficulty of an MI attack increases, ultimately leading to a decrease in the attack performance.
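The quantity plotted in Fig. 12 can be sketched as the median, over attack runs, of the first iteration at which a run reaches a given accuracy. The function names, the 1-based iteration index, and the toy curves are assumptions for illustration.

```python
from statistics import median


def first_hit(acc_curve, threshold):
    """1-based iteration at which the accuracy first reaches the threshold."""
    for i, acc in enumerate(acc_curve, start=1):
        if acc >= threshold:
            return i
    return None  # threshold never reached in this run


def median_iterations(curves, threshold):
    """Median first-hit iteration over the runs that reach the threshold."""
    hits = [h for h in (first_hit(c, threshold) for c in curves) if h is not None]
    return median(hits) if hits else None


# Three toy accuracy curves, one per attack run.
curves = [[0.1, 0.6], [0.2, 0.3, 0.7], [0.9]]
result = median_iterations(curves, threshold=0.5)
```

Runs that never reach the threshold are skipped here; how such runs are treated in the reported medians is not stated.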

Figure 13: The reward variation of different agents under various episodes. IQL, VDN, and MADDPG represent three different RL agents.
TABLE IV: The experimental results of various reinforcement learning agents in MI attacks.
| Agent | ACC ↑ | PSNR ↑ | KNN Dist ↓ |
|---|---|---|---|
| IQL | 0.341 | 14.21 | 1728.39 |
| VDN | 0.462 | 16.35 | 1443.61 |
| MADDPG | 0.858 | 20.66 | 1180.63 |
Figure 14: Comparison of original and reconstructed images. In each row, the green box represents the original images with the specific label, and the red box denotes the reconstructed images under the same specific label using DBB-MI. The numbers below the reconstructed images represent the corresponding softmax scores given by the evaluation classifier, indicating that these reconstructed images, to some extent, reveal the privacy information of the specific label.

IV-D3 Evaluating various reinforcement learning agents

DBB-MI relies heavily on MADDPG to optimize the latent distribution of the GAN and thereby recover more private information about the target. To verify that MADDPG is a reasonable choice, we also search for the latent distribution with IQL [17], a form of fully competitive MARL, and VDN [15], a form of fully cooperative MARL. The VGG16 model trained on CelebA serves as the target network. Table IV lists the results of MI attacks using the three reinforcement learning agents and shows that MADDPG outperforms both IQL and VDN on every metric. For example, MADDPG’s ACC is roughly 150% higher than IQL’s (0.858 vs. 0.341) and 85.7% higher than VDN’s (0.858 vs. 0.462). This is why DBB-MI uses MADDPG to optimize the high-dimensional latent distribution of the GAN. The result also suggests that searching for suitable latent distributions in the latent space of GANs should be treated as a semi-competitive, semi-cooperative form of MARL.
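The relative ACC gains can be recomputed directly from the ACC column of Table IV. The short sketch below does only that arithmetic; the helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# ACC values copied from Table IV.
acc = {"IQL": 0.341, "VDN": 0.462, "MADDPG": 0.858}

for name in ("IQL", "VDN"):
    gain = relative_gain(acc["MADDPG"], acc[name])
    print(f"MADDPG vs {name}: {gain:.1f}% higher ACC")
# MADDPG vs IQL: 151.6% higher ACC
# MADDPG vs VDN: 85.7% higher ACC
```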

IV-D4 RL agent rewards

To further assess the performance difference between the agents in GAN-based MI attacks, we compare how their rewards change during training, as shown in Fig.13. Both VDN [15] and IQL [17] yield consistently low rewards that fluctuate around a baseline level. In contrast, MADDPG [19] converges to a higher reward. Therefore, MADDPG is more suitable for latent space search in GAN-based MI attacks.

IV-D5 Evaluating the diverse image reconstruction

A single label is expected to be associated with several images that depict the same subject in various states or conditions. DBB-MI can reconstruct several diverse images for a given label, as displayed in Fig.14. We sample multiple latent codes from the final optimized latent distribution to generate multiple images with different privacy attributes. Variations in facial expression, hair, lighting conditions, and other factors highlight the diverse private features within the same label. This capability matters for the study of privacy protection: it exposes how an attacker can reconstruct different states of a target, which in turn indicates what an effective defense against this attack must cover.
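The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions: the optimized latent distribution is taken to be a diagonal Gaussian with per-dimension mean `mu` and standard deviation `sigma`, the latent dimension of 512 and the numeric values are placeholders, and the generator itself is not modeled.

```python
import random
from typing import List, Sequence


def sample_latent_codes(
    mu: Sequence[float], sigma: Sequence[float], n: int, seed: int = 0
) -> List[List[float]]:
    """Draw n latent codes, each coordinate from N(mu_i, sigma_i^2) (assumed form)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(n)]


# Hypothetical optimized distribution for one label (placeholder values).
latent_dim = 512
mu = [0.0] * latent_dim
sigma = [0.1] * latent_dim

codes = sample_latent_codes(mu, sigma, n=5)
print(len(codes), len(codes[0]))  # prints: 5 512

# Each code would then be passed to the attacker's pretrained generator to
# produce one reconstructed image; the generator is outside this sketch.
```

Because every code is an independent draw from the same distribution, the resulting images share the label-specific features captured by the distribution while differing in the remaining attributes.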

V Conclusion

In this paper, we present a novel and effective Distributional Black-Box Model Inversion (DBB-MI) attack that does not require elaborate training of the GAN. In a black-box setting, DBB-MI systematically explores the latent space of the GAN with limited knowledge to identify an appropriate latent distribution, using a multi-agent reinforcement learning-based approach, and can thereby accurately reconstruct the private data of the target model. We assess the attack performance and generalization of DBB-MI through a series of experiments. The experimental results demonstrate that DBB-MI attains a level of performance comparable to the most advanced black-box attacks. These results further validate the efficacy of distributional attacks in comparison to state-of-the-art MI attacks based on optimizing a latent code.

References

  • [1] X. An, J. Deng, J. Guo, Z. Feng, X. Zhu, J. Yang, and T. Liu, “Killing two birds with one stone: Efficient and robust training of face recognition cnns by partial fc,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 4042–4051.
  • [2] Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang et al., “Planning-oriented autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 17 853–17 862.
  • [3] J. Li, K. Sun, B. S. Huff, A. M. Bierley, Y. Kim, F. Schaub, and K. Fawaz, “‘it’s up to the consumer to be smart’: Understanding the security and privacy attitudes of smart home users on reddit,” in Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2023, pp. 380–396.
  • [4] L. Zhou, G. Huang, Y. Mao, S. Wang, and M. Kaess, “Edplvo: Efficient direct point-line visual odometry,” in Proceedings of the International Conference on Robotics and Automation (ICRA), 2022, pp. 7559–7565.
  • [5] D. Cao, K. Wei, Y. Wu, J. Zhang, B. Feng, and J. Chen, “Fepn: A robust feature purification network to defend against adversarial examples,” Computers & Security, vol. 134, p. 103427, 2023.
  • [6] Z. Wei, J. Chen, M. Goldblum, Z. Wu, T. Goldstein, and Y.-G. Jiang, “Towards transferable adversarial attacks on vision transformers,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 36, no. 3, 2022, pp. 2668–2676.
  • [7] Y. Chen, C. Shen, Y. Shen, C. Wang, and Y. Zhang, “Amplifying membership exposure via data poisoning,” Advances in Neural Information Processing Systems (NIPS), vol. 35, pp. 29 830–29 844, 2022.
  • [8] A. Tejankar, M. Sanjabi, Q. Wang, S. Wang, H. Firooz, H. Pirsiavash, and L. Tan, “Defending against patch-based backdoor attacks on self-supervised learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 12 239–12 249.
  • [9] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart, “Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing,” in Proceedings of the 23rd USENIX security symposium (USENIX Security 14), 2014, pp. 17–32.
  • [10] Y. Yin, X. Zhang, H. Zhang, F. Li, Y. Yu, X. Cheng, and P. Hu, “Ginver: Generative model inversion attacks against collaborative inference,” in Proceedings of the ACM Web Conference (WWW), 2023, pp. 2122–2131.
  • [11] Y. Zhang, R. Jia, H. Pei, W. Wang, B. Li, and D. Song, “The secret revealer: Generative model-inversion attacks against deep neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 253–261.
  • [12] S. Chen, M. Kahla, R. Jia, and G.-J. Qi, “Knowledge-enriched distributional model inversion attacks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 16 178–16 187.
  • [13] X. Yuan, K. Chen, J. Zhang, W. Zhang, N. Yu, and Y. Zhang, “Pseudo label-guided model inversion attack via conditional generative adversarial network,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 37, no. 3, 2023, pp. 3349–3357.
  • [14] J. K. Gupta, M. Egorov, and M. Kochenderfer, “Cooperative multi-agent control using deep reinforcement learning,” in Proceedings of the Autonomous Agents and Multiagent Systems (AAMAS Workshops).   Springer, 2017, pp. 66–83.
  • [15] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on team reward,” in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2018, pp. 2085–2087.
  • [16] Y. Zhu and D. Zhao, “Online minimax q network learning for two-player zero-sum markov games,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 3, pp. 1228–1241, 2020.
  • [17] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PloS one, vol. 12, no. 4, p. e0172395, 2017.
  • [18] J. Hu and M. P. Wellman, “Nash q-learning for general-sum stochastic games,” Journal of machine learning research, vol. 4, no. Nov, pp. 1039–1069, 2003.
  • [19] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” Advances in Neural Information Processing Systems (NIPS), 2017.
  • [20] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS), 2015, pp. 1322–1333.
  • [21] Z. Yang, J. Zhang, E.-C. Chang, and Z. Liang, “Neural network inversion in adversarial setting via background knowledge alignment,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019, pp. 225–240.
  • [22] A. Salem, A. Bhattacharya, M. Backes, M. Fritz, and Y. Zhang, “Updates-leak: Data set inference and reconstruction attacks in online learning,” in Proceedings of the 29th USENIX security symposium (USENIX Security 20), 2020, pp. 1291–1308.
  • [23] Z. Zhang, X. Wang, J. Huang, and S. Zhang, “Analysis and utilization of hidden information in model inversion attacks,” IEEE Transactions on Information Forensics and Security, 2023.
  • [24] S. An, G. Tao, Q. Xu, Y. Liu, G. Shen, Y. Yao, J. Xu, and X. Zhang, “Mirror: Model inversion for deep learning network with high fidelity,” in Proceedings of the 29th Network and Distributed System Security Symposium (NDSS), 2022.
  • [25] G. Han, J. Choi, H. Lee, and J. Kim, “Reinforcement learning-based black-box model inversion attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 20 504–20 513.
  • [26] T. Zhu, D. Ye, S. Zhou, B. Liu, and W. Zhou, “Label-only model inversion attacks: Attack with the least information,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 991–1005, 2022.
  • [27] M. Kahla, S. Chen, H. A. Just, and R. Jia, “Label-only model inversion attacks via boundary repulsion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15 045–15 053.
  • [28] K.-C. Wang, Y. Fu, K. Li, A. Khisti, R. Zemel, and A. Makhzani, “Variational model inversion attacks,” Advances in Neural Information Processing Systems (NIPS), vol. 34, pp. 9706–9719, 2021.
  • [29] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3730–3738.
  • [30] H.-W. Ng and S. Winkler, “A data-driven approach to cleaning large face datasets,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), 2014, pp. 343–347.
  • [31] N. Pinto, Z. Stone, T. Zickler, and D. Cox, “Scaling up biologically-inspired computer vision: A case study in unconstrained face recognition on facebook,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.   IEEE, 2011, pp. 35–42.
  • [32] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410.
  • [33] Y. Cheng, J. Zhao, Z. Wang, Y. Xu, K. Jayashree, S. Shen, and J. Feng, “Know you at one glance: A compact vector representation for low-shot learning,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 1924–1932.
  • [34] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • [35] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.