MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias

Guorun Wang
Imperial College London
[email protected]
Lucia Specia
Imperial College London
[email protected]
Abstract

Text-to-image models are known to propagate social biases. For example, when prompted to generate images of people in certain professions, these models tend to systematically generate specific genders or ethnicities. In this paper, we show that this bias is already present in the text encoder of the model and introduce a Mixture-of-Experts approach by identifying text-encoded bias in the latent space and then creating a bias-identification gate. More specifically, we propose MoESD (Mixture of Experts Stable Diffusion) with BiAs (Bias Adapters) to mitigate gender bias. We also demonstrate that a special token is essential during the mitigation process. With experiments focusing on gender bias, we demonstrate that our approach successfully mitigates gender bias while maintaining image quality.



1 Introduction

In recent years, large language and vision models such as ChatGPT 4 OpenAI (2023), DALL·E Ramesh et al. (2021) and Stable Diffusion Rombach et al. (2022); Podell et al. (2023) have ushered in the era of AI-generated content. However, research has shown that these generative models often exhibit social biases during the content generation process, especially in text-to-image generation. For instance, models tend to generate more images of men when provided with prompts like “a successful CEO”, and more images of women when provided with prompts like “a paralegal” Friedrich et al. (2023); Luccioni et al. (2023).

To mitigate such biases, current methodologies can be broadly categorized into three debiasing paradigms:

  • Pre-processing the training data to remove bias before training.

  • Prompt-engineering to restrain the model generation at the deployment stage.

  • Enforcing fairness on model weights by introducing constraints on the learning objective during training.

In pre-processing methods, eliminating bias in the training corpus is a difficult challenge that offers no guarantees Hamidieh et al. (2023). For prompt engineering, although leveraging specific prompts to instruct the model (e.g. “a photo of a female plumber”) Friedrich et al. (2023) can work to avoid biases, this is not how the average user prompts these models, so it is not a solution in practice. For changing model weights, the resource-intensive nature of re-training models poses challenges, requiring vast amounts of data (89k text-image pairs in Esposito et al. (2023)) and full fine-tuning of the model.

Our work identifies existing biases in pre-trained models and effectively mitigates them by parameter-efficient fine-tuning, which only requires a small amount of data (1.5K) and parameters (5.6%). Our contributions can be summarized as follows:

  • We measure the gender skew in text to assess gender bias in embeddings.

  • We introduce Mixture of Experts (MoE) to Stable Diffusion and fine-tune the Bias Adapters (BiAs) to effectively mitigate identified gender biases.

  • We apply special tokens to aid the BiAs in better understanding the biased data and demonstrate that it is essential for mitigation.

  • We successfully mitigate the gender bias in Stable Diffusion while maintaining image quality.

2 Related Work

Biases in multimodal settings have attracted increasing attention. Several studies have evaluated multimodal models and discovered that they inherit and propagate many biases Agarwal et al. (2021); Cho et al. (2023). Prompt learning and engineering have been widely utilized in vision-language models and generative models Berg et al. (2022); Friedrich et al. (2023). Specifically, Fair Diffusion Friedrich et al. (2023) addresses biases through prompt engineering: users insert prompts to generate fair images with the assistance of specific guidance on the SEGA model Brack et al. (2023). Subsequently, image generation is guided towards a fairer outcome through manual semantic editing in the latent space of biased concepts.

Approaches to offset bias representation are also increasingly popular: Seth et al. (2023) employ additive residual image representations to mitigate biased representations, while Esposito et al. (2023) focus on fine-tuning the model to achieve fairness. Chuang et al. (2023) proposed a method to project out biased directions in text embeddings to create fair generative models. This approach leverages positive pairs of prompts to debias embeddings effectively. By generating embeddings of prompts such as “a photo of a [class name] with [spurious attribute]”, a calibrated projection matrix, as shown in Equation 1, is optimized. After projection, the embedding should only contain information about the “[class name]” with no spurious information (e.g., gender).

Equation 1 illustrates the regularization of the difference between the projected embeddings of the set of positive pairs $S$, where $(z_i, z_j)$ represents the embedding of prompt pair $(i,j)$ in $S$, which describes the same class but with different spurious attributes (e.g., gender). The loss function encourages the linear projection $P$ to be invariant to the difference between the spurious attributes (details in Appendix A.4).

$$\min_{P}\left\|P-P_{0}\right\|^{2}+\frac{\lambda}{\left|S\right|}\sum_{(i,j)\in S}\left\|Pz_{i}-Pz_{j}\right\|^{2} \tag{1}$$
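As a worked step that is not spelled out in the text, the objective in Equation 1 is quadratic in $P$, so its minimizer has a closed form. The shorthand $d_{ij} = z_i - z_j$ is introduced here for readability, and the first norm is assumed to be the Frobenius norm. Setting the gradient with respect to $P$ to zero gives:

```latex
\begin{aligned}
\nabla_P\left[\left\|P-P_0\right\|^2+\frac{\lambda}{|S|}\sum_{(i,j)\in S}\left\|P d_{ij}\right\|^2\right]
  &= 2\,(P-P_0)+\frac{2\lambda}{|S|}\sum_{(i,j)\in S}P\,d_{ij}d_{ij}^{T}=0 \\
\Rightarrow\quad P\left(I+\frac{\lambda}{|S|}\sum_{(i,j)\in S}d_{ij}d_{ij}^{T}\right) &= P_0 \\
\Rightarrow\quad P^{*} &= P_0\left(I+\frac{\lambda}{|S|}\sum_{(i,j)\in S}d_{ij}d_{ij}^{T}\right)^{-1}
\end{aligned}
```

The inverse exists because the bracketed matrix is the identity plus a positive semi-definite matrix. Larger values of $\lambda$ shrink the component of an embedding along the directions $d_{ij}$ more strongly.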

3 Method

In this section we describe our method. First, we explore the projection matrix from Chuang et al. (2023) (Equation 1) to assess inheriting gender biases in Stable Diffusion. Second, we introduce the Mixture of Experts (MoE) to Stable Diffusion and add fine-tuned BiAs (experts), which combine bias identification gates and special tokens to aid in understanding the biased data. This results in our MoESD-BiAs approach.

3.1 Identifying Biases in Stable Diffusion

Stable Diffusion, the text-to-image model used in our work, generates images from text by moving the diffusion process from pixel space to latent space. Given an image $x\in\mathbb{R}^{H\times W\times 3}$ in RGB space, the encoder $\mathcal{E}$ encodes $x$ into a latent representation $z=\mathcal{E}(x)$, and the decoder $D$ reconstructs the image from the latent, giving $x'=D(z)=D(\mathcal{E}(x))$, where $z\in\mathbb{R}^{h\times w\times c}$. A domain-specific encoder $\tau_{\theta}$ projects the language prompt $y$ to an intermediate representation $\tau_{\theta}(y)\in\mathbb{R}^{M\times d_{\tau}}$, which is then mapped to the intermediate layers of the U-Net via a cross-attention layer to condition the latent $z$. Based on image-conditioning pairs, the conditional latent diffusion model (LDM) is learned via Equation 2.

$$L_{LDM}:=\mathrm{E}_{\mathcal{E}(x),y,\epsilon\sim\mathcal{N}(0,1),t}\left[\left\|\epsilon-\epsilon_{\theta}(z_{t},t,\tau_{\theta}(y))\right\|_{2}^{2}\right] \tag{2}$$
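To make the objective in Equation 2 concrete, the following is a minimal single-sample sketch in plain Python. It assumes the standard forward noising step $z_t=\sqrt{\bar\alpha_t}\,z_0+\sqrt{1-\bar\alpha_t}\,\epsilon$, which the text does not state; `eps_theta`, `alpha_bar_t` and `ldm_loss_sample` are hypothetical names introduced only for illustration, and latents are flattened to lists.

```python
import math
import random


def ldm_loss_sample(z0, alpha_bar_t, eps_theta, t, cond, rng):
    """One Monte Carlo sample of the squared error inside Equation 2.

    z0          -- clean latent E(x), flattened to a list of floats
    alpha_bar_t -- cumulative noise-schedule term at step t (assumed schedule)
    eps_theta   -- hypothetical callable (z_t, t, cond) -> predicted noise
    cond        -- stand-in for the text conditioning tau_theta(y)
    """
    # Draw the target noise epsilon ~ N(0, 1).
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    # Noise the latent (assumed DDPM-style forward process).
    z_t = [
        math.sqrt(alpha_bar_t) * z + math.sqrt(1.0 - alpha_bar_t) * e
        for z, e in zip(z0, eps)
    ]
    # Squared L2 distance between true and predicted noise.
    pred = eps_theta(z_t, t, cond)
    return sum((e - p) ** 2 for e, p in zip(eps, pred))
```

Averaging this quantity over images, prompts, noise draws and timesteps approximates the expectation in Equation 2.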

However, when employing prompts like “a photo of a [occupation]”, the model has been shown to exhibit significant gender bias with stereotypes Friedrich et al. (2023); Chuang et al. (2023), which manifests throughout the entire process.

While research efforts have been dedicated to addressing biases in the iterative image denoising process within U-Net Ronneberger et al. (2015), the convolutional network component of Stable Diffusion Esposito et al. (2023), our investigation has revealed a distinct bias emerging after the text embedding stage, when utilizing the text encoder of different versions of Stable Diffusion. This means that bias is amplified throughout the process.

Appendix A.2 shows a t-SNE visualization of the dimensionality-reduced embeddings produced by the text encoders of two versions of Stable Diffusion (https://huggingface.co/runwayml/stable-diffusion-v1-5 and https://huggingface.co/stabilityai/stable-diffusion-2-1-base), with prompts like “The photo of the face of a [occupation]”, where the [occupation] values are the top 8 male-biased and top 8 female-biased occupations from Fair Diffusion Friedrich et al. (2023) (https://github.com/ml-research/Fair-Diffusion/blob/main/results_fairface_generated_1-5.txt). In both versions of Stable Diffusion, there is a clear boundary between the embeddings of male-biased and female-biased occupations. In other words, embeddings show gender biases regardless of whether they are encoded by CLIP or OpenCLIP Radford et al. (2021); Schuhmann et al. (2022); Cherti et al. (2023), indicating that the text encoder already contains gender biases, which then condition the U-Net. Thus, it is not enough to mitigate biases in the latent space of the U-Net; we need to address the text-encoded bias as well.

3.2 Measuring Text-Encoded Bias

Issues of Projecting Prompt Embeddings for Fairness

Although Chuang et al. (2023) claim great success in measuring bias in text embeddings and projecting it to achieve fairness, we have identified an issue. When attempting to switch the original prompt into another direction to achieve balance, the projection may take it into another direction that can introduce other biases. In Appendix A.6, we present failure cases of gender mitigation using the method of Chuang et al. (2023), which resulted in only male faces being generated.

Reframing the Projection to Assess Gender Bias

We conduct zero-shot and unsupervised classification on embeddings, using prompts like “The photo of the face of a [occupation]” and a pretrained Stable Diffusion text encoder. Our goal is to measure gender bias within these prompts and their embeddings. Altering the weights of the pretrained text encoder is risky, as it could lead to the model forgetting information unrelated to gender that it has learned from extensive data, so the Stable Diffusion text encoder is frozen. Another challenge is the lack of clear labels, as we only have prompts, embeddings, and statistics for each occupation generated by Fair Diffusion. These statistics are based on 250 images from the original Stable Diffusion and cannot be quantitatively analyzed or treated as a simple binary classification task due to the small sample size. Moreover, the embeddings are subject to change based on the hyperparameters of the pretrained model, meaning the statistics can only serve as a benchmark for testing our approach rather than for model learning. Therefore, our analysis relies solely on prompts and model weights.

As shown in Equation 1, $z$ is the prompt embedding and $P$ is the projection. A prompt embedding $z_{0}$ after applying the Calibration Matrix can be written as $P^{*}z_{0}=P_{0}\left(I+\frac{\lambda}{\left|S\right|}\sum_{(i,j)\in S}(z_{i}-z_{j})(z_{i}-z_{j})^{T}\right)^{-1}z_{0}$, derived from Equation 1.
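The closed-form projection above can be sketched in a few lines of dependency-free Python. This is only an illustration on toy vectors, not the authors' implementation: the helper names (`calibration_system`, `solve`, `calibrated_projection`) are hypothetical, $P_0$ defaults to the identity as a simplifying assumption, and a small Gaussian-elimination routine stands in for a proper linear-algebra library.

```python
def calibration_system(pairs, lam):
    """Build M = I + (lam / |S|) * sum over pairs of (zi - zj)(zi - zj)^T."""
    d = len(pairs[0][0])
    M = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    scale = lam / len(pairs)
    for zi, zj in pairs:
        diff = [a - b for a, b in zip(zi, zj)]
        for r in range(d):
            for c in range(d):
                M[r][c] += scale * diff[r] * diff[c]
    return M


def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (A[r][n] - s) / A[r][r]
    return y


def calibrated_projection(z0, pairs, lam, P0=None):
    """Return P* z0 = P0 M^{-1} z0; P0 is the identity if not given (assumption)."""
    y = solve(calibration_system(pairs, lam), z0)
    if P0 is None:
        return y
    return [sum(p * v for p, v in zip(row, y)) for row in P0]
```

With a single pair whose embeddings differ along one direction, a large `lam` maps both embeddings to nearly the same point, while a vector orthogonal to that direction is left unchanged.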

Here we first define the similarity between two prompts before and after applying the Calibration Matrix:

$$\Delta S(z_{0},z_{t},P_{0},P^{*})=\text{Similarity}(P_{0}z_{0},P_{0}z_{t})-\text{Similarity}(P^{*}z_{0},P_{0}z_{t}) \tag{3}$$

We then calculate the gender skew for prompt embedding $z_{0}$ as

$$\mathscr{G}(z_{0})=\Delta S(z_{0},z_{male},P_{0},P^{*})-\Delta S(z_{0},z_{female},P_{0},P^{*}) \tag{4}$$

We assume the gender skew is male when $\mathscr{G}(z_{0})>0$ and female when $\mathscr{G}(z_{0})<0$. If $\Delta S(z_{0},z_{t},P_{0},P^{*})$ is larger, then the disparity between the two prompts, before and after applying the Calibration Matrix, is also larger. This results in a greater divergence between the original embedding direction and the projected embeddings. Consequently, the strength of the redirected embeddings is increased, indicating a stronger gender preference before redirecting. In essence, if the redirection strength is substantial, it indicates a more forceful adjustment away from the original gender direction, which means higher gender skew.
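Equations 3 and 4 can be sketched as follows, taking the already-projected vectors as inputs: `orig_z0` stands for $P_0 z_0$, `proj_z0` for $P^{*}z_0$, and `z_male` / `z_female` for $P_0 z_{male}$ / $P_0 z_{female}$. The Pearson correlation is used as the similarity, matching the choice described in Section 3.3; the function names and the toy vectors in the usage note are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def delta_s(orig_z0, proj_z0, target, similarity=pearson):
    """Equation 3: similarity to the target before minus after calibration."""
    return similarity(orig_z0, target) - similarity(proj_z0, target)


def gender_skew(orig_z0, proj_z0, z_male, z_female, similarity=pearson):
    """Equation 4: positive values read as male skew, negative as female skew."""
    return (delta_s(orig_z0, proj_z0, z_male, similarity)
            - delta_s(orig_z0, proj_z0, z_female, similarity))
```

For example, with toy vectors `z_male = [1.0, 0.0, 0.2, 0.1]` and `z_female = [0.0, 1.0, 0.2, 0.1]`, an original embedding `[0.9, 0.1, 0.3, 0.0]` that calibration moves to `[0.5, 0.5, 0.3, 0.0]` yields a positive skew, since the projection reduced its similarity to the male reference.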

3.3 Defining the Bias Identification Gate

We utilize the Pearson Correlation Coefficient to measure similarity and check statistical approximation of genders in Fair Diffusion occupations, with further details provided in Appendix A.7. If the male count for a particular occupation surpasses half of the total, we assume this occupation is male-skewed; otherwise, it is female-skewed. To assess the effectiveness of our approach when designing the Bias Identification Gate, we compare the frequency-based labels with our calculations derived from Equation 3.2 to compute accuracy. We achieve an accuracy rate of 79%, indicating that our bias measurement approximation, achieved through task reframing, aligns with both intuition and statistical trends.
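The accuracy computation described above can be sketched as below. The frequency-based label follows the rule in the text (male-skewed if the male count exceeds half of the total, otherwise female-skewed). Treating a skew of exactly zero as female is an assumption made here, since the text only defines the strictly positive and strictly negative cases. All names are hypothetical.

```python
def frequency_label(male_count, total):
    """Label from image statistics: male-skewed if males exceed half of the total."""
    return "male" if male_count > total / 2 else "female"


def gate_label(skew):
    """Label from the gender skew; zero is treated as female (assumption)."""
    return "male" if skew > 0 else "female"


def gate_accuracy(records):
    """records: list of (male_count, total, skew) tuples, one per occupation."""
    hits = sum(frequency_label(m, t) == gate_label(s) for m, t, s in records)
    return hits / len(records)
```

With made-up records such as `[(200, 250, 0.3), (40, 250, -0.2), (130, 250, -0.05), (10, 250, -0.4)]`, three of the four labels agree, so `gate_accuracy` returns 0.75; the third record illustrates a borderline occupation where the two labels disagree.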

The accuracy is lower when it comes to occupations such as insurance agents (details in Appendix A.9.1), as these are most likely at the boundary between male and female. In Table 1, we show mitigation of biases even when classifications of these challenging occupations are incorrect.

3.4 BiAs Experts: Bias Adapter Experts

The next step is to set up the experts for the MoE approach. For that, we implement a way to personalize Stable Diffusion to guide our model towards fairness.

The intuition is to guide the model to generate more female images when the original embedding exhibits a male skew, and vice versa. To achieve this, we divide our training text-image pairs into two groups: male and female. We then fine-tune the model separately on each group to obtain male-biased and female-biased experts. When the gate is activated, it selects experts according to the following rules: a male skew calls the female-biased expert, and a female skew calls the male-biased expert. This process is called bias fine-tuning.

Figure 1: The Architecture of BiAs. We apply BiAs to the cross-attention of the U-Net. In the figure, the left BiAs represents the female expert, the right one represents the male expert, and the middle represents the original cross-attention. The conditional information is processed by the Bias Identification Gate to determine which experts to choose - either male, female, or none of them. In the end, the chosen BiAs will process the input and be added together with the original cross-attention. The detailed BiAs architecture includes BiAs of the Q, K, V, and output matrices (for simplicity, we omit them in the figure).

Adapters Hu et al. (2022); Houlsby et al. (2019); Karimi Mahabadi et al. (2021) provide a method to freeze the model and introduce a new, trainable weight matrix, which significantly reduces both the time and memory required for training.

We incorporate this method into creating bias experts, making the bias fine-tuning parameter-efficient. We freeze the parameters in the U-Net and add adapters on the cross-attention layers of the U-Net, as illustrated in Figure 1. We initialize the adapters following the principles of LoRA Hu et al. (2022): the weight parameters of the first matrix $W_{\text{down}}^{\prime}$ are drawn from a Gaussian distribution, while the parameters of the second matrix $W_{\text{up}}^{\prime}$ are initialized as a zero matrix. Consequently, the added pathway outputs zero at the start of training and does not impact the result, so the output initially matches that of the original model.
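The initialization described above can be illustrated with a small dependency-free sketch. It shows only the general LoRA-style pattern (Gaussian down-projection, zero up-projection, output added to the frozen layer), not the authors' code; the function names, the rank, and the standard deviation `std=0.02` are hypothetical choices.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def init_adapter(d_in, d_out, rank, std=0.02, seed=0):
    """Return (W_down, W_up): Gaussian down-projection, zero up-projection."""
    rng = random.Random(seed)
    W_down = [[rng.gauss(0.0, std) for _ in range(d_in)] for _ in range(rank)]
    W_up = [[0.0] * rank for _ in range(d_out)]
    return W_down, W_up


def adapted_forward(W_frozen, W_down, W_up, x):
    """Frozen layer output plus the low-rank adapter pathway."""
    base = matvec(W_frozen, x)
    delta = matvec(W_up, matvec(W_down, x))
    return [b + d for b, d in zip(base, delta)]
```

Because `W_up` starts at zero, `adapted_forward` returns exactly the frozen layer's output before any training step, which is the property the paragraph above relies on.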

Since the adapters are randomly initialized, they do not inherit biases from Stable Diffusion fine-tuning, which makes them more effective in achieving fairness, as detailed in Section 4.2.2. By using adapters as the bias experts, we achieve better fairness scores and reduce the trainable ratio to only 5.6%.

For bias fine-tuning, we generate biased images with specific male or female characteristics so that the bias expert will learn stereotypes from the data. We employ a technique from Dreambooth Ruiz et al. (2023), utilizing a special token to help BiAs to better understand the biased data. Dreambooth adds a special token, a unique identifier, to the prompt, and uses a few fine-tuning images of a subject as input to “personalize” the model. In our work, we use the special token to “personalize” the adapters of the biased information.

In theory, the special token should not impact the results, as it is only used to remind the experts about the bias. However, due to the context differences of prompt embeddings, minor differences in results might occur when different special tokens are used for different experts and prompts.

For fine-tuning data, we generate our training image-text pairs using Stable Diffusion-XL with the following prompt: “A [gender] + [race] + [occupation].” The prompts generate 1530 images with different genders, races and occupations to ensure variety, which are then used to fine-tune the model with the special token. The results demonstrate that the special token yields better performance than ordinary fine-tuning, as detailed in Section 4.2.2.

Our bias fine-tuning can be formalised in the following optimization process:

$$L_{bias}:=\mathrm{E}_{\mathcal{E}(x),y,\epsilon\sim\mathcal{N}(0,1),t}\left[\left\|\epsilon-\epsilon_{\theta}(z_{t},t,\tau_{\theta}(s))\right\|_{2}^{2}\right] \tag{5}$$

where $s$ is the prompt with the special token.

What distinguishes our work from that of Esposito et al. (2023) is that we only need to fine-tune a small set of data, rather than 89K prompts and 89K images, and fine-tune a small ratio of parameters, making it more training-efficient.

What distinguishes our work from Fair Diffusion Friedrich et al. (2023) is that we do not manually engage in prompt engineering (human intervention); in other words, we do not modify the content of the prompt. The special token we add can be any token. Moreover, even without the special token, our approach still mitigates gender bias.

3.5 Mixture of Experts

MoE (Mixture of Experts) Jacobs et al. (1991); Eigen et al. (2013) is an ensemble learning technique that improves performance by weighting the predictions of different experts through gate mechanisms.

In a classic MoE system, each expert is independent and demonstrates high performance within its area of expertise. A gating mechanism is learned to adjust the weight assigned to each expert based on the input data. In contrast, in our approach, the gates and experts, defined in Sections 3.3 and 3.4, are not trained jointly. We utilize pre-trained models, which can be flexibly replaced by other pre-trained models.

We show our MoE architecture in Figure 2. What makes our gating mechanism different from traditional MoE is that we do not learn it from the data, and for the experts, we utilize pre-trained models.

To summarise, our approach is as follows: we first conduct zero-shot and unsupervised classification. We do not use the labels (the gender statistics of the 250 images for each occupation generated by the vanilla Stable Diffusion, provided by Fair Diffusion) to train the model, but only for hyperparameter selection. In other words, we only use the prompt and pretrained CLIP encoder itself. At inference, the model takes the original prompt and special token as inputs. It judges the gender skew the prompt shows, with the male skew leading to a higher probability of calling the female expert, and vice versa.

Figure 2: Architecture of our MoE system. We modify Rombach et al. (2022) by adding our Bias Identification Gate and Bias Adapter Experts. See Section 3.5 for details.

4 Experiments

4.1 Setup

We first use the statistical data of different occupations generated by Fair Diffusion (Stable Diffusion 1.5) to select the best hyperparameters for both versions of Stable Diffusion. If the general statistical count shows more males than females, we assume that it has a male skew, and vice versa. We found that 4000 is the optimal value of the hyperparameter $\lambda$ in Equation 1 and the subsequent derivation for version 1.5, and 100 is optimal for version 2.1. There is a large difference in the best hyperparameter between the two versions, which we attribute to the differences in the text encoder between the two versions of Stable Diffusion, which introduce different embedding gender biases in the latent space.

Next, we use Stable Diffusion XL to generate fine-tuning image data and then fine-tune our MoESD with BiAs experts by adding a special token to the original prompts (set to “sks” as in Ruiz et al. (2023)). The prompt is “A photo of the sks face of the [occupation].”

After fine-tuning, we proceed to the inference steps: There are three experts in the system: one original, one male BiAs expert, and one female BiAs expert. When inputting a prompt, we add the same special token (“sks”) as in the fine-tuning stages. The system judges the skew, and if the skew is male, we allocate 10% to the male expert, 50% to the female expert, and 40% to the original expert. Conversely, if the skew is female, we allocate 10% to the female expert, 50% to the male expert, and 40% to the original expert. These are the optimal hyperparameters for conservative mitigation within a limited search range in our experiments since we do not want the BiAs to completely change the original weights.
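The allocation rule above can be sketched as a small routing function followed by a weighted sum of expert outputs. This is one plausible reading of the description, not the authors' implementation: the expert outputs are stand-in vectors, the names are hypothetical, and falling back to the original expert alone when the skew is exactly zero is an assumption.

```python
def expert_weights(skew):
    """Map the gender skew to mixing weights for the three experts."""
    if skew > 0:   # male skew: favor the female BiAs expert
        return {"original": 0.4, "male": 0.1, "female": 0.5}
    if skew < 0:   # female skew: favor the male BiAs expert
        return {"original": 0.4, "male": 0.5, "female": 0.1}
    # No detected skew: use only the original expert (assumption).
    return {"original": 1.0, "male": 0.0, "female": 0.0}


def mix_outputs(outputs, weights):
    """Weighted sum of per-expert output vectors keyed by expert name."""
    dim = len(outputs["original"])
    return [
        sum(weights[name] * outputs[name][k] for name in weights)
        for k in range(dim)
    ]
```

In this sketch the weights always sum to one, so the mixed output stays on the same scale as the original cross-attention output, which is in line with the stated aim of not letting the BiAs completely override the original weights.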

In this manner, we generate 100 images based on the same prompt for each of the 153 occupations.

We then employ BLIP2 Li et al. (2023) on a VQA task to conduct fairness evaluation and the LAION-Aesthetics linear classifier Schuhmann (2022) to perform aesthetic evaluation, as we discuss in the following sections. We also employ human evaluation for the aesthetic and image-description relevance evaluation.

4.2 Fairness Evaluation

4.2.1 Metric

Stable Diffusion learns a conditional distribution $\hat{P}(X|Z=z)$, where $z$ represents the embedding of the prompt. The biased nature of the dataset used to train the generative model, as well as its architecture, can impact the distribution $\hat{P}$. To quantify the bias in generative models, recent studies Choi et al. (2020); Teo et al. (2024); Chuang et al. (2023) propose using statistical parity. Specifically, given a classifier $h:X\rightarrow A$ for the gender attribute, the discrepancy of the generative distribution $\hat{P}$ can be defined by comparing two distributions: the empirical and the uniform.

We use this metric to evaluate the images generated by Stable Diffusion, where $X$ is the set of images for each occupation $o$ from Fair Diffusion, the classifier $h$ is the BLIP2 model with prompts to predict gender, $A$ is the set containing “male” and “female”, and the expectation is estimated with empirical samples. We assume the model to be fairer when the empirical distributions are closer to uniform distributions, since a fair model minimizes the discrepancy by ensuring that each attribute $a\in A$ has a similar probability (uniformly distributed).

Equation 6 incorporates ideal fair expectations into the estimation of empirical samples for each occupation. While this metric may not consider the overall gender ratio and may sometimes favor a higher proportion of males/females in all occupations, it aims to detect gender bias within each occupation. The standard deviation helps identify occupations that deviate significantly from the fair boundary.

$$\frac{1}{\left|O\right|}\sum_{a\in A,x\in O}\left|\mathrm{E}_{x\sim\hat{P}}\left[\mathbb{1}_{h(x)=a}\right]-\frac{1}{|A|}\right| \tag{6}$$

In the context of binary gender, Equation 6 gives the same value for each attribute $a\in A$, male or female. Therefore, for simplicity, we report the male attribute in the results.
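For the binary case reported here, Equation 6 reduces to the average distance between each occupation's male fraction and 0.5. The sketch below computes that score together with its standard deviation across occupations. The input format and function name are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def fairness_score(male_fraction_by_occupation, n_attributes=2):
    """Return (score, std) of |male fraction - 1/|A|| across occupations.

    male_fraction_by_occupation -- dict mapping an occupation name to the
    fraction of its generated images classified as male (hypothetical format).
    """
    uniform = 1.0 / n_attributes
    gaps = [abs(p - uniform) for p in male_fraction_by_occupation.values()]
    # Population standard deviation across occupations (assumption).
    return mean(gaps), pstdev(gaps)
```

For instance, male fractions of 0.9, 0.5 and 0.2 for three made-up occupations give gaps of 0.4, 0.0 and 0.3, so the score is their mean, about 0.233. A perfectly balanced model scores 0, and lower is better.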

4.2.2 Results

We compare our method to the original Stable Diffusion, the Biased Prompt method proposed by Chuang et al. (2023), and Fair Diffusion versions 1.5 and 2.1. Due to the variability in results related to hyperparameters reported by Chuang et al. (2023), we select the best result from different hyperparameters for their method in both versions: 0.05 for version 1.5 and 500 for version 2 (as reported in their paper). The results are shown in Table 1.

| Group | Method | SD 1.5 Fairness Score | SD 1.5 STD | SD 2.1 Fairness Score | SD 2.1 STD |
|---|---|---|---|---|---|
| Benchmarks | Vanilla | 0.281 | 0.167 | 0.326 | 0.146 |
| Benchmarks | Bias Prompts Chuang et al. (2023) | 0.279 ($\lambda$ = 0.05) | 0.169 | 0.255 ($\lambda$ = 500) | 0.143 |
| Benchmarks | Fair Diffusion Friedrich et al. (2023) | 0.074 | 0.093 | 0.070 | 0.093 |
| Ours | MoESD | 0.293 | 0.151 | 0.344 | 0.144 |
| Ours | MoESD-BiAs | 0.169 | 0.124 | 0.274 | 0.147 |
| Ours | MoESD (special token) | 0.141 | 0.1 | 0.147 | 0.108 |
| Ours | MoESD-BiAs (special token) | 0.135 | 0.103 | 0.136 | 0.107 |
Table 1: Fairness Evaluation of Models. For each method, we show the Fairness Score and Standard Deviation (lower values are better) for two versions of Stable Diffusion. Our MoESD-BiAs (special token) yields the best performance in all cases except for Fair Diffusion with manual guidance Friedrich et al. (2023).

MoESD-BiAs (with special token) performs better than all others except for Fair Diffusion Friedrich et al. (2023). This is expected, since we assume that guidance from manually edited prompts is SOTA in terms of fairness, as it will force the model to choose the gender specified. In our method, we do not explicitly give any gender information in the prompt (e.g. male or female). We show our mitigation examples in Appendix A.1.

Model Version

For most methods, the fairness score on Stable Diffusion Version 2.1 is higher (worse) than on Version 1.5, which indicates that Version 1.5 performs better in terms of fairness; the exceptions are Fair Diffusion and Bias Prompts. Other methods show a noticeable gap between the two versions (e.g., 0.281 vs. 0.326 for vanilla), whereas our method with the special token performs similarly well on both (0.135 vs. 0.136).

Adapter

BiAs appears to perform better than full fine-tuning regardless of whether we use the special token, which suggests that a small set of parameters (5%) is enough to mitigate bias. We attribute this to the fact that adapters allow more targeted parameter updates, reducing the risk of vanishing or exploding gradients and preventing catastrophic forgetting of the general knowledge in the well-pretrained Stable Diffusion weights. Moreover, because they are randomly initialized, our adapters are more effective at mitigating bias with relatively little data and only a few fine-tuning steps.

Special Token

The special token contributes substantially to mitigating bias since it helps the model (the bias experts) to remember the fine-tuning process, which is especially effective for the randomly initialized adapters. Without the special token, MoESD performs slightly worse than the vanilla model (0.293 vs. 0.281 on Version 1.5). We attribute this to the fact that a small amount of fine-tuning cannot shift the biases of a large base model. The special token mitigates this, as seen from the MoESD (special token) results.

4.3 Aesthetic Evaluation

4.3.1 Metric

Due to differences in training data, the style and quality of generated images vary. While the images generated by our model achieve better fairness scores than those of most other methods, we also need to verify that their aesthetic quality does not degrade.

Linear Aesthetic Evaluation

Simulacra Aesthetic Captions (SAC) Pressman et al. (2022) is a dataset of over 238,000 synthetic images generated with AI models from over forty thousand user-submitted prompts. Users rated the images on their aesthetic value from 1 to 10 in response to the question “How much do you like this image on a scale from 1 to 10?”. LAION-Aesthetics Schuhmann (2022) trains a linear model on 5000 image-rating pairs from the SAC dataset, which predicts a numeric aesthetic score between 1 and 10. We use this linear model to evaluate the image sets of all methods and compare their aesthetic scores.

Human Evaluation

We randomly select 30 out of 153 occupations and then, for each, randomly select one out of 100 generated images. We take the images from Fair Diffusion Friedrich et al. (2023), Bias Prompts ($\lambda$ = 0.05) Chuang et al. (2023), and our best method (MoESD-BiAs (special token)), and compare them with the vanilla images, all implemented on Stable Diffusion Version 1.5. This gives 90 pairs of images (30 pairs compare Fair Diffusion with vanilla, 30 pairs compare Bias Prompts ($\lambda$ = 0.05) with vanilla, and 30 pairs compare ours with vanilla), and for each pair users choose which image is better. The instruction is “Choose the image that has better quality (with ‘same’ being an option), where the main criterion is that the person in the image looks like a normal person and there are no unrealistic items in the background”. More details of the survey are listed in Appendix A.10.

It is important to note that in some pairs, the vanilla model and ours produce the same image because there is a probability that the bias expert in our model is not activated. In those cases, the quality remains the same. The same happens in some instances of Fair Diffusion, where the gender prompt engineering may not lead to different results. For Bias Prompts, the best performance on Stable Diffusion Version 1.5 occurs with a small $\lambda$ value, so the pairs should generally be similar.

As the final result, we calculate the average ratio at which users judge each model to perform no worse than the vanilla model.
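One plausible way to compute this non-worse ratio is sketched below. It assumes that each pair collects one vote per rater, with the hypothetical labels "method", "vanilla", and "same", and that the ratio is first computed per pair and then averaged over pairs; the source does not specify the exact averaging order.

```python
def non_worse_ratio(votes_per_pair):
    """Average, over image pairs, of the share of raters who did not prefer vanilla.

    votes_per_pair is a list of vote lists, one per image pair; each vote is
    "method", "vanilla", or "same" (hypothetical label names).
    """
    ratios = []
    for votes in votes_per_pair:
        non_worse = sum(1 for vote in votes if vote in ("method", "same"))
        ratios.append(non_worse / len(votes))
    return sum(ratios) / len(ratios)


# Toy votes (hypothetical, not survey data from the paper).
toy_votes = [
    ["method", "same", "vanilla", "vanilla"],  # 2 of 4 raters: non-worse
    ["same", "same", "method", "vanilla"],     # 3 of 4 raters: non-worse
]
ratio = non_worse_ratio(toy_votes)
```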

4.3.2 Results

Linear Aesthetic Evaluation

To better evaluate our fine-tuning results, we compare all the methods on the aesthetic level to determine whether fine-tuning affects abilities other than fairness. The results in Table 2 show that our method maintains image quality according to this metric.

| Group | Method | SD 1.5 Aesthetic Score | SD 2.1 Aesthetic Score |
|---|---|---|---|
| Benchmarks | Vanilla | 5.131 | 5.298 |
| Benchmarks | Bias Prompts Chuang et al. (2023) | 5.132 ($\lambda$ = 0.05) | 5.285 ($\lambda$ = 500) |
| Benchmarks | Fair Diffusion Friedrich et al. (2023) | 5.154 | 5.318 |
| Ours | MoESD | 5.423 | 5.439 |
| Ours | MoESD-BiAs | 5.406 | 5.483 |
| Ours | MoESD (special token) | 5.348 | 5.452 |
| Ours | MoESD-BiAs (special token) | 5.290 | 5.455 |

Table 2: Aesthetic Evaluation of Models. For each method, we show the Aesthetic Score (higher values are better) for two versions of Stable Diffusion (SD). Our method yields the best performance in both versions.

Compared to the vanilla approach, MoESD-BiAs (special token) has a higher aesthetic score, even though it may blur faces at times. We speculate that the higher aesthetic score may be due to images with more colorful and complex backgrounds, which are favored by the aesthetic classifier. Additionally, we observed that the vanilla method sometimes produces images containing words, which receive very low aesthetic scores, as also noted in the LAION-Aesthetics example (http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html). Finally, the score for version 2.1 is higher than for version 1.5, suggesting that version 2.1 acquired a better sense of aesthetics during pre-training.

Our method with the special token sometimes performs worse than our method without the special token, suggesting that the special token may be occasionally harmful to aesthetics.

Human Evaluation

We present our human evaluation results in Table 3. Our method was the same as or better than the vanilla method in 59% of the cases, which ranks it below the other methods we evaluated. This reflects the fact that our images are sometimes blurry, which is a very visible artifact and leads our human evaluators to (rightly) prefer vanilla images in many cases. This lower quality may be due to the limited size of the training data and the small number of trained parameters. It indicates that more work is needed to preserve the quality of the generated images while mitigating biases.

| Method | Non-worse Ratio |
|---|---|
| Fair Diffusion Friedrich et al. (2023) | 59.3% |
| Bias Prompts ($\lambda$ = 0.05) Chuang et al. (2023) | 85.16% |
| MoESD-BiAs (special token) | 54.58% |

Table 3: Average Non-Worse Ratio for 3 Methods.

4.4 Image-Description Relevance Evaluation

We also ran an additional human evaluation of the quality of our generated images to further understand the results in the previous section. We took the same 30 images generated by our method for the aesthetic evaluation and assessed how well each image matches its description (the prompt given to Stable Diffusion). The instruction is “How well does the image matches the image description. 10 stands for fully relevant and 0 stands for not relevant. Image description: A photo of the face of the [occupation name].”

Out of the 30 images, 10 images do not activate the bias experts, in other words, there are no changes to the vanilla weight, while 20 images activate the bias expert.

We then calculated the mean and median relevance scores for each image and conducted an analysis for all 30 images. Details of the survey are listed in Appendix A.10.

The results are shown in Table 4. We observe a small degradation in quality when our bias mitigation method is activated. However, this degradation is acceptable, as the score difference is less than 1 point on the 10-point scale.

| Method | Mean | Median |
|---|---|---|
| Non-Activated BiAs (10) | 7.17 | 7.5 |
| Activated BiAs (20) | 6.48 | 6.8 |
| Aggregated (30) | 6.71 | 7.03 |

Table 4: Mean and Median of Human Evaluation. We evaluate how well the image matches the image description. The number in parentheses represents the number of images.
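As a consistency check on Table 4, and assuming the aggregated mean is the image-count-weighted average of the two group means, the reported values agree:

$$\frac{10 \times 7.17 + 20 \times 6.48}{30} = \frac{201.3}{30} = 6.71$$

The aggregated median cannot be recovered from the group medians in the same way, since it depends on the per-image scores.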

5 Conclusions

In this paper, we introduce MoE to Stable Diffusion. We first assess existing biases in the prompt embedding and then fine-tune the BiAs with special tokens to aid in understanding the biased data. The resulting MoESD-BiAs system mitigates gender bias substantially while largely maintaining image quality.

6 Limitations

Although our method has made significant strides in mitigating bias while maintaining aesthetic quality, we must admit the limitations of our work.

First, we cannot achieve the same fairness score as manual editing of the gender attributes, which leaves room for improvement. Additionally, we did not evaluate biases related to race, as measuring bias in prompt embeddings is much more difficult for race than for gender. Moreover, we only address the case of binary gender. We did not consider sexual minorities in various contexts, which is a much more complex task. We hope to explore this further in future work.

Second, our zero-shot, unsupervised prompt bias identification and hyperparameter selection are based on the statistical counts from Fair Diffusion Friedrich et al. (2023), so they may not be entirely accurate. Moreover, we only achieve 79% accuracy in identifying bias from the prompt, and the gate fails on challenging occupations such as insurance agent.

Third, although we achieve good performance in aesthetics, our method loses some fine-grained facial details due to the quality of the fine-tuning images. Fortunately, users can swap our experts for ones they fine-tune themselves, so this can be improved through dedicated fine-tuning.

7 Ethical Considerations

Our work focuses on addressing social bias, specifically gender bias, and its impact extends beyond scientific research. Text-to-image models are used across a wide range of industries and societies, and our method marks a step toward reducing gender bias in these models.

However, by introducing the BiAs approach to mitigate bias, there is a risk that people might misuse the weights to generate more biased content.

Acknowledgements

We would like to express gratitude to all who participated in the human evaluation, which has significantly enhanced the quality of this research.

References

  • Agarwal et al. (2021) Sandhini Agarwal, Gretchen Krueger, Jack Clark, Alec Radford, Jong Wook Kim, and Miles Brundage. 2021. Evaluating clip: towards characterization of broader capabilities and downstream implications. arXiv preprint arXiv:2108.02818.
  • Berg et al. (2022) Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, and Max Bain. 2022. A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 806–822. Association for Computational Linguistics.
  • Brack et al. (2023) Manuel Brack, Felix Friedrich, Dominik Hintersdorf, Lukas Struppek, Patrick Schramowski, and Kristian Kersting. 2023. Sega: Instructing text-to-image models using semantic guidance. In Thirty-seventh Conference on Neural Information Processing Systems.
  • Cherti et al. (2023) Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. 2023. Reproducible scaling laws for contrastive language-image learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2818–2829.
  • Cho et al. (2023) Jaemin Cho, Abhay Zala, and Mohit Bansal. 2023. Dall-eval: Probing the reasoning skills and social biases of text-to-image generation models. In ICCV.
  • Choi et al. (2020) Kristy Choi, Aditya Grover, Trisha Singh, Rui Shu, and Stefano Ermon. 2020. Fair generative modeling via weak supervision. In International Conference on Machine Learning, pages 1887–1898. PMLR.
  • Chuang et al. (2023) Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, and Stefanie Jegelka. 2023. Debiasing vision-language models via biased prompts. arXiv preprint arXiv:2302.00070.
  • Eigen et al. (2013) David Eigen, Marc’Aurelio Ranzato, and Ilya Sutskever. 2013. Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314.
  • Esposito et al. (2023) Piero Esposito, Parmida Atighehchian, Anastasis Germanidis, and Deepti Ghadiyaram. 2023. Mitigating stereotypical biases in text to image generative systems. arXiv preprint arXiv:2310.06904.
  • Friedrich et al. (2023) Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, and Kristian Kersting. 2023. Fair diffusion: Instructing text-to-image generation models on fairness. arXiv preprint arXiv:2302.10893.
  • Hamidieh et al. (2023) Kimia Hamidieh, Haoran Zhang, Thomas Hartvigsen, and Marzyeh Ghassemi. 2023. Identifying implicit social biases in vision-language models.
  • Houlsby et al. (2019) Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799. PMLR.
  • Hu et al. (2022) Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations.
  • Jacobs et al. (1991) Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. 1991. Adaptive mixtures of local experts. Neural computation, 3(1):79–87.
  • Karimi Mahabadi et al. (2021) Rabeeh Karimi Mahabadi, James Henderson, and Sebastian Ruder. 2021. Compacter: Efficient low-rank hypercomplex adapter layers. Advances in Neural Information Processing Systems, 34:1022–1035.
  • Li et al. (2023) Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.
  • Luccioni et al. (2023) Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. 2023. Stable bias: Evaluating societal representations in diffusion models. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
  • OpenAI (2023) OpenAI. 2023. Gpt-4 technical report. ArXiv, abs/2303.08774.
  • Podell et al. (2023) Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. 2023. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952.
  • Pressman et al. (2022) John David Pressman, Katherine Crowson, and Simulacra Captions Contributors. 2022. Simulacra aesthetic captions. Technical Report Version 1.0, Stability AI. https://github.com/JD-P/simulacra-aesthetic-captions.
  • Radford et al. (2021) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR.
  • Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
  • Ramesh et al. (2021) Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR.
  • Reimers and Gurevych (2019) Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
  • Rombach et al. (2022) Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer.
  • Ruiz et al. (2023) Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2023. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500–22510.
  • Schuhmann (2022) Christoph Schuhmann. 2022. LAION-AESTHETICS.
  • Schuhmann et al. (2022) Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade W Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa R Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. 2022. LAION-5b: An open large-scale dataset for training next generation image-text models. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
  • Seth et al. (2023) Ashish Seth, Mayur Hemani, and Chirag Agarwal. 2023. Dear: Debiasing vision-language models with additive residuals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6820–6829.
  • Teo et al. (2024) Christopher Teo, Milad Abdollahzadeh, and Ngai-Man Cheung. 2024. On measuring fairness in generative models. Advances in Neural Information Processing Systems, 36.
  • Van der Maaten and Hinton (2008) Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-sne. Journal of machine learning research, 9(11).

Appendix A Appendix

A.1 Mitigation Visualization

We compare our method with vanilla Stable Diffusion in Figure 3 to demonstrate our gender bias mitigation.

Figure 3: Successful Mitigation of Gender Bias. From left to right are the occupations of aerospace engineer, metal worker, plumber, executive assistant, nurse, and fitness instructor. Each column is generated by the same prompt and seed. The left three are extremely male-biased occupations, and the right three are extremely female-biased occupations. The BiAs Expert successfully leads to a fairer generation.

A.2 Text-Encoded Bias from Prompts

We use T-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the text-encoded bias from prompts, as shown in Figure 4.

(a) CLIP ViT-L/14 of Stable Diffusion Version 1.5
(b) OpenCLIP-ViT/H of Stable Diffusion Version 2.1
Figure 4: T-Distributed Stochastic Neighbor Embedding (t-SNE) Van der Maaten and Hinton (2008) dimension reduction and visualization of prompts encoded by text encoder of Stable Diffusion Version 1.5 and 2.1 Rombach et al. (2022). There is a clear boundary between two gender-bias occupation embeddings in both versions of Stable Diffusion.

A.3 Different Methods for Bias-Identification Gate

We compare various methods for the Bias-Identification Gate, and our approach stands out as the most effective in identifying bias in the prompt according to Equation 3.2. As a baseline, we classify directly without the intervention of the Calibration Matrix Projection by simply comparing the similarity of $z_0$ to $z_{male}$ and $z_{female}$ to determine whether $z_0$ is closer to $z_{male}$ or $z_{female}$. The rule is shown below:

$$\mathscr{G}'(z_0) = Similarity'(z_0, z_{male}) - Similarity'(z_0, z_{female}) \tag{7}$$
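The uncalibrated rule in Equation 7 can be sketched as follows, using cosine similarity as one possible choice of $Similarity'$. The function names, the toy 2-D embeddings, and the convention that a positive score means "male" (with ties going to "female") are illustrative assumptions rather than details given in the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gate_score(z0, z_male, z_female, similarity=cosine_similarity):
    """Equation 7: similarity to the male anchor minus similarity to the female anchor."""
    return similarity(z0, z_male) - similarity(z0, z_female)


def predicted_bias(z0, z_male, z_female):
    # Assumption: a positive score is read as male-biased, otherwise female-biased.
    return "male" if gate_score(z0, z_male, z_female) > 0 else "female"


# Toy 2-D embeddings (hypothetical; real prompt embeddings are high-dimensional).
z_male = [1.0, 0.0]
z_female = [0.0, 1.0]
z0 = [0.8, 0.6]
score = gate_score(z0, z_male, z_female)
label = predicted_bias(z0, z_male, z_female)
```

Swapping `similarity` for a distance-based function would need a sign flip, since a smaller distance means a closer match.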

We also use third-party models (T5 Raffel et al. (2020) and Sentence Transformer Reimers and Gurevych (2019)) as monitoring models, classify with their embedding similarity, and compare the results with our method.

For T5, we perform the QA task for classification, as follows:

Question = “Answer the following question with ‘male’ or ‘female’. Is the face more likely to be male or female?”

Context = “A photo of the face of the ” + [occupation]

For Sentence Transformer, we calculate the similarity for two different sets of prompts, perform the classification with each, and report the higher accuracy:

(1) Query = “A photo of the face of the ” + [occupation]

Docs = “A photo of the face of the male”, “A photo of the face of the female”

(2) Query = [occupation]

Docs = [“male”, “female”]

Once again, our identification method yields the best results (Table 5).

| Method | Accuracy |
|---|---|
| Ours | 79% |
| Cosine Similarity (CLIP) | 72% |
| Euclidean Distance (CLIP) | 27% |
| Manhattan Distance (CLIP) | 28% |
| Jaccard Similarity (CLIP) | 43% |
| Pearson Correlation Coefficient (CLIP) | 72.5% |
| T5 Prompt Classification | 57% |
| Sentence Transformer Embedding | 70% |

Table 5: Accuracy for Gate-Identification in Different Methods. Methods labeled “(CLIP)” use the embeddings from the CLIP encoder obtained from Stable Diffusion.

A.4 More Details of Chuang et al. (2023)

The detailed proof is shown in the original work. For simplicity in our work, the derivation from Equation 1 is presented as follows. Equations 8 and 9 demonstrate that the calibrated projection matrix and prompt embedding have convenient closed-form solutions.

$$P^{*}=P_{0}\left(I+\frac{\lambda}{|S|}\sum_{(i,j)\in S}(z_{i}-z_{j})(z_{i}-z_{j})^{T}\right)^{-1} \tag{8}$$

$$z^{*}=\underbrace{\left(I+\frac{\lambda}{|S|}\sum_{(i,j)\in S}(z_{i}-z_{j})(z_{i}-z_{j})^{T}\right)^{-1}}_{\text{Calibration Matrix}} z_{0} \tag{9}$$
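The closed form in Equation 9 can be sketched numerically with NumPy. This is a minimal illustration, assuming $S$ is given as a list of embedding pairs $(z_i, z_j)$; the function names and the toy 2-D vectors are hypothetical, and real prompt embeddings are much higher-dimensional.

```python
import numpy as np


def calibration_matrix(pairs, lam):
    """(I + lam/|S| * sum_{(i,j) in S} (z_i - z_j)(z_i - z_j)^T)^{-1}, as in Equation 9."""
    dim = pairs[0][0].shape[0]
    scatter = np.zeros((dim, dim))
    for z_i, z_j in pairs:
        diff = (z_i - z_j).reshape(-1, 1)
        scatter += diff @ diff.T  # outer product of the pair difference
    return np.linalg.inv(np.eye(dim) + (lam / len(pairs)) * scatter)


def calibrate(z0, pairs, lam):
    """Apply the calibration matrix to a prompt embedding z0."""
    return calibration_matrix(pairs, lam) @ z0


# Toy example (hypothetical): one pair that differs only along the first axis.
pairs = [(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))]
z0 = np.array([3.0, 2.0])
z_star = calibrate(z0, pairs, lam=2.0)
```

In this toy case the matrix being inverted is diag(9, 1), so the component of `z0` along the pair difference shrinks by a factor of 9 while the orthogonal component is unchanged; larger values of `lam` shrink it further.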

A.5 Parameter $\lambda$ for Bias-Identification

From Equations 3.2 and 8, we can observe that the parameter $\lambda$ is crucial for Bias Identification, so the accuracy of the result depends on it. Plotting the number of correct predictions against $\lambda$ (Figures 5 and 6), we observe that 4000 is optimal for Stable Diffusion 1.5, while 100 is optimal for Stable Diffusion 2.1.

Figure 5: The number of correct predictions for Stable Diffusion 1.5.
Figure 6: The number of correct predictions for Stable Diffusion 2.1.

A.6 Defect of Bias Prompts

As shown in Figure 7, the ideal mitigation should lie midway between male and female; however, the actual mitigation occurs in another dimension. Although the calibrated embedding may assign the same meaning to male and female prompts, that shared meaning is not necessarily neutral and could still contain some gender attributes.

Figure 7: Latent Representation of Prompt Embedding.

As a result, the method of Chuang et al. (2023) exhibits a defect that we have observed: as the parameter $\lambda$ increases, the original characteristics are lost and male characteristics become more and more prominent. Figure 8 presents several sets of images, where each set is generated with the same prompt and seed but with different $\lambda$, to illustrate this “male guidance”.

Figure 8: Defect of Chuang et al. (2023). The male characteristics become more and more prominent, and in some cases, the gender even switches.

A.7 Occupation list

We use the occupation list from Fair Diffusion Friedrich et al. (2023) instead of that of Chuang et al. (2023), as the Fair Diffusion list contains more occupations and includes more challenging ones. All occupations are listed in Table 6, where we mark which occupations our Bias-Identification method predicts correctly and incorrectly.

Right
accountant, aerospace engineer, air conditioning installer, architect, artist,
author, bartender, bus driver, butcher, career counselor, carpenter,
carpet installer, cashier, ceo, childcare worker, civil engineer, clergy, coach,
community manager, compliance officer, computer programmer,
computer support specialist, construction worker, cook, correctional officer,
customer service representative, dental assistant, dental hygienist, dentist,
detective, director, dishwasher, drywall installer, electrical engineer,
electrician, engineer, event planner, executive assistant, farmer,
fast food worker, financial advisor, financial analyst, financial manager,
fitness instructor, groundskeeper, hairdresser, head cook, health technician,
host, hostess, housekeeper, industrial engineer, interior designer, interviewer,
it specialist, jailer, janitor, language pathologist, librarian, logistician,
machinery mechanic, machinist, maid, manager, manicurist, market research analyst,
massage therapist, mechanic, mechanical engineer, medical records specialist,
mental health counselor, metal worker, mover, musician, network administrator,
nurse, nursing assistant, nutritionist, occupational therapist, office clerk,
painter, paralegal, payroll clerk, pharmacist, pharmacy technician, pilot,
plane mechanic, plumber, police officer, postal worker, printing press operator,
programmer, purchasing agent, radiologic technician, receptionist, repair worker,
roofer, sales manager, salesperson, scientist, security guard, sheet metal worker,
singer, social assistant, social worker, software developer, stocker, taxi driver,
teacher, teaching assistant, teller, therapist, tractor operator, truck driver,
tutor, waiter, waitress, web developer, welder, wholesale buyer, writer
Wrong
aide, baker, claims appraiser, cleaner, clerk, computer systems analyst,
courier, credit counselor, data entry keyer, designer, dispatcher, doctor,
facilities manager, file clerk, firefighter, graphic designer, insurance agent,
inventory clerk, laboratory technician, lawyer, marketing manager, office worker,
photographer, physical therapist, producer, psychologist, public relations specialist,
real estate broker, school bus driver, supervisor, underwriter, veterinarian
Table 6: Occupation Set, with right and wrong predictions for our Bias-Identification method.
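Counting the entries in Table 6 gives 121 correctly predicted occupations and 32 incorrectly predicted ones, which is consistent with the 153 occupations and the 79% accuracy reported for our gate:

$$\frac{121}{121 + 32} = \frac{121}{153} \approx 0.791$$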

A.8 BLIP2 VQA for Fairness Evaluation

For each picture, we use the following question to conduct the VQA evaluation and count the number of males and females in 100 pictures for each occupation.

Question: “Answer the following question with ‘male’ or ‘female’ or ‘people not present’ only. Is this person on this file male or female?”

For images without people, BLIP2 sometimes answers “unknown” instead of “people not present”, so those cases are not reflected in the “people not present” count. However, this does not matter since we only use the male and female counts for the fairness score.

A.9 Failure cases

A.9.1 Right and Wrong Case for Bias Identification

Table 6 lists the right and wrong cases for Bias Identification. We can observe that the wrong cases tend to be difficult ones, including very neutral occupations and occupations containing two or more words.

A.9.2 Failure Generation for our Method

As mentioned above, some of our generated images blur the faces. This could be further improved through fine-tuning.

Figure 9: Unsuccessful Generation of Face. Although we achieve gender fairness, we lose some aesthetic details of faces.

A.10 Human Evaluation Details

For the aesthetic evaluation, we collected 29 responses, and for the evaluation of how well the image matches the image description, we collected 27 responses. All surveys were conducted anonymously, and the experiment was double-blinded. All users were informed that all collected data would be used for scientific research only. By clicking the submission button, they acknowledged that the data collection is anonymous and consented to it.