Search | arXiv e-print repository

"Patriarchy Hurts Men Too." Does Your Model Agree? A Discussion on Fairness Assumptions

Abstract: The pipeline of a fair ML practitioner is generally divided into three phases: 1) Selecting a fairness measure. 2) Choosing a model that minimizes this measure. 3) Maximizing the model's performance on the data. In the context of group fairness, this approach often obscures implicit assumptions about how bias is introduced into the data. For instance, in binary classification, it is often assumed… ▽ More The pipeline of a fair ML practitioner is generally divided into three phases: 1) Selecting a fairness measure. 2) Choosing a model that minimizes this measure. 3) Maximizing the model's performance on the data. In the context of group fairness, this approach often obscures implicit assumptions about how bias is introduced into the data. For instance, in binary classification, it is often assumed that the best model, with equal fairness, is the one with better performance. However, this belief already imposes specific properties on the process that introduced bias. More precisely, we are already assuming that the biasing process is a monotonic function of the fair scores, dependent solely on the sensitive attribute. We formally prove this claim regarding several implicit fairness assumptions. This leads, in our view, to two possible conclusions: either the behavior of the biasing process is more complex than mere monotonicity, which means we need to identify and reject our implicit assumptions in order to develop models capable of tackling more complex situations; or the bias introduced in the data behaves predictably, implying that many of the developed models are superfluous. △ Less

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.16431 [pdf, other]

FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Authors: Ewoenam Kwaku Tokpo, Toon Calders

Abstract: Despite the evolution of language models, they continue to portray harmful societal biases and stereotypes inadvertently learned from training data. These inherent biases often result in detrimental effects in various applications. Counterfactual Data Augmentation (CDA), which seeks to balance demographic attributes in training data, has been a widely adopted approach to mitigate bias in natural l… ▽ More Despite the evolution of language models, they continue to portray harmful societal biases and stereotypes inadvertently learned from training data. These inherent biases often result in detrimental effects in various applications. Counterfactual Data Augmentation (CDA), which seeks to balance demographic attributes in training data, has been a widely adopted approach to mitigate bias in natural language processing. However, many existing CDA approaches rely on word substitution techniques using manually compiled word-pair dictionaries. These techniques often lead to out-of-context substitutions, resulting in potential quality issues. The advancement of model-based techniques, on the other hand, has been challenged by the need for parallel training data. Works in this area resort to manually generated parallel data that are expensive to collect and are consequently limited in scale. This paper proposes FairFlow, an automated approach to generating parallel data for training counterfactual text generator models that limits the need for human intervention. Furthermore, we show that FairFlow significantly overcomes the limitations of dictionary-based word-substitution approaches whilst maintaining good performance. △ Less

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2406.16606 [pdf, other]

Cherry on the Cake: Fairness is NOT an Optimization Problem

Authors: Marco Favier, Toon Calders

Abstract: Fair cake-cutting is a mathematical subfield that studies the problem of fairly dividing a resource among a number of participants. The so-called ``cake,'' as an object, represents any resource that can be distributed among players. This concept is connected to supervised multi-label classification: any dataset can be thought of as a cake that needs to be distributed, where each label is a player… ▽ More Fair cake-cutting is a mathematical subfield that studies the problem of fairly dividing a resource among a number of participants. The so-called ``cake,'' as an object, represents any resource that can be distributed among players. This concept is connected to supervised multi-label classification: any dataset can be thought of as a cake that needs to be distributed, where each label is a player that receives its share of the dataset. In particular, any efficient cake-cutting solution for the dataset is equivalent to an optimal decision function. Although we are not the first to demonstrate this connection, the important ramifications of this parallel seem to have been partially forgotten. We revisit these classical results and demonstrate how this connection can be prolifically used for fairness in machine learning problems. Understanding the set of achievable fair decisions is a fundamental step in finding optimal fair solutions and satisfying fairness requirements. By employing the tools of cake-cutting theory, we have been able to describe the behavior of optimal fair decisions, which, counterintuitively, often exhibit quite unfair properties. Specifically, in order to satisfy fairness constraints, it is sometimes preferable, in the name of optimality, to purposefully make mistakes and deny giving the positive label to deserving individuals in a community in favor of less worthy individuals within the same community. This practice is known in the literature as cherry-picking and has been described as ``blatantly unfair.'' △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2403.14282 [pdf, other]

doi 10.1007/s10994-023-06401-1

How to be fair? A study of label and selection bias

Authors: Marco Favier, Toon Calders, Sam Pinxteren, Jonathan Meyer

Abstract: It is widely accepted that biased data leads to biased and thus potentially unfair models. Therefore, several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques whose aim is to learn models that are fair by design. Despite the myriad of mitigation techniques developed in the past decade, however, it is still poorly understood under what circum… ▽ More It is widely accepted that biased data leads to biased and thus potentially unfair models. Therefore, several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques whose aim is to learn models that are fair by design. Despite the myriad of mitigation techniques developed in the past decade, however, it is still poorly understood under what circumstances which methods work. Recently, Wick et al. showed, with experiments on synthetic data, that there exist situations in which bias mitigation techniques lead to more accurate models when measured on unbiased data. Nevertheless, in the absence of a thorough mathematical analysis, it remains unclear which techniques are effective under what circumstances. We propose to address this problem by establishing relationships between the type of bias and the effectiveness of a mitigation technique, where we categorize the mitigation techniques by the bias measure they optimize. In this paper we illustrate this principle for label and selection bias on the one hand, and demographic parity and ``We're All Equal'' on the other hand. Our theoretical analysis allows to explain the results of Wick et al. and we also show that there are situations where minimizing fairness measures does not result in the fairest possible distribution. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Journal ref: Machine Learning 112.12 (2023): 5081-5104

arXiv:2401.13391 [pdf, other]

Reranking individuals: The effect of fair classification within-groups

Authors: Sofie Goethals, Toon Calders

Abstract: Artificial Intelligence (AI) finds widespread application across various domains, but it sparks concerns about fairness in its deployment. The prevailing discourse in classification often emphasizes outcome-based metrics comparing sensitive subgroups without a nuanced consideration of the differential impacts within subgroups. Bias mitigation techniques not only affect the ranking of pairs of inst… ▽ More Artificial Intelligence (AI) finds widespread application across various domains, but it sparks concerns about fairness in its deployment. The prevailing discourse in classification often emphasizes outcome-based metrics comparing sensitive subgroups without a nuanced consideration of the differential impacts within subgroups. Bias mitigation techniques not only affect the ranking of pairs of instances across sensitive groups, but often also significantly affect the ranking of instances within these groups. Such changes are hard to explain and raise concerns regarding the validity of the intervention. Unfortunately, these effects remain under the radar in the accuracy-fairness evaluation framework that is usually applied. Additionally, we illustrate the effect of several popular bias mitigation methods, and how their output often does not reflect real-world scenarios. △ Less

Submitted 22 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2311.03186 [pdf, other]

Model-based Counterfactual Generator for Gender Bias Mitigation

Authors: Ewoenam Kwaku Tokpo, Toon Calders

Abstract: Counterfactual Data Augmentation (CDA) has been one of the preferred techniques for mitigating gender bias in natural language models. CDA techniques have mostly employed word substitution based on dictionaries. Although such dictionary-based CDA techniques have been shown to significantly improve the mitigation of gender bias, in this paper, we highlight some limitations of such dictionary-based… ▽ More Counterfactual Data Augmentation (CDA) has been one of the preferred techniques for mitigating gender bias in natural language models. CDA techniques have mostly employed word substitution based on dictionaries. Although such dictionary-based CDA techniques have been shown to significantly improve the mitigation of gender bias, in this paper, we highlight some limitations of such dictionary-based counterfactual data augmentation techniques, such as susceptibility to ungrammatical compositions, and lack of generalization outside the set of predefined dictionary words. Model-based solutions can alleviate these problems, yet the lack of qualitative parallel training data hinders development in this direction. Therefore, we propose a combination of data processing techniques and a bi-objective training regime to develop a model-based solution for generating counterfactuals to mitigate gender bias. We implemented our proposed solution and performed an empirical evaluation which shows how our model alleviates the shortcomings of dictionary-based solutions. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2301.12855 [pdf, other]

How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Authors: Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt, Toon Calders

Abstract: To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of… ▽ More To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of this. In this work, we design a probe to investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We discover that instead of resolving gender bias, intrinsic mitigation techniques and metrics are able to hide it in such a way that significant gender information is retained in the embeddings. Furthermore, we show that each mitigation technique is able to hide the bias from some of the intrinsic bias measures but not all, and each intrinsic bias measure can be fooled by some mitigation techniques, but not all. We confirm experimentally, that none of the intrinsic mitigation techniques used without any other fairness intervention is able to consistently impact extrinsic bias. We recommend that intrinsic bias mitigation techniques should be combined with other fairness interventions for downstream tasks. △ Less

Submitted 30 January, 2023; originally announced January 2023.

arXiv:2201.08643 [pdf, other]

Text Style Transfer for Bias Mitigation using Masked Language Modeling

Authors: Ewoenam Kwaku Tokpo, Toon Calders

Abstract: It is well known that textual data on the internet and other digital platforms contain significant levels of bias and stereotypes. Although many such texts contain stereotypes and biases that inherently exist in natural language for reasons that are not necessarily malicious, there are crucial reasons to mitigate these biases. For one, these texts are being used as training corpus to train languag… ▽ More It is well known that textual data on the internet and other digital platforms contain significant levels of bias and stereotypes. Although many such texts contain stereotypes and biases that inherently exist in natural language for reasons that are not necessarily malicious, there are crucial reasons to mitigate these biases. For one, these texts are being used as training corpus to train language models for salient applications like cv-screening, search engines, and chatbots; such applications are turning out to produce discriminatory results. Also, several research findings have concluded that biased texts have significant effects on the target demographic groups. For instance, masculine-worded job advertisements tend to be less appealing to female applicants. In this paper, we present a text style transfer model that can be used to automatically debias textual data. Our style transfer model improves on the limitations of many existing style transfer techniques such as loss of content information. Our model solves such issues by combining latent content encoding with explicit keyword replacement. We will show that this technique produces better content preservation whilst maintaining good style transfer accuracy. △ Less

Submitted 21 January, 2022; originally announced January 2022.

Comments: 9 pages, 3 figures, 5 tables

arXiv:2112.07447 [pdf, other]

Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models

Authors: Pieter Delobelle, Ewoenam Kwaku Tokpo, Toon Calders, Bettina Berendt

Abstract: An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify `bias' and `fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pretrained language models and experimentall… ▽ More An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify `bias' and `fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pretrained language models and experimentally evaluate compatibility, including both biases in language models as in their downstream tasks. We do this by a mixture of traditional literature survey and correlation analysis, as well as by running empirical evaluations. We find that many metrics are not compatible and highly depend on (i) templates, (ii) attribute and target seeds and (iii) the choice of embeddings. These results indicate that fairness or bias evaluation remains challenging for contextualized language models, if not at least highly subjective. To improve future comparisons and fairness evaluations, we recommend avoiding embedding-based metrics and focusing on fairness evaluations in downstream tasks. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: 15 pages, 4 figures, 3 tables

arXiv:1902.06743 [pdf, other]

doi 10.1145/2656261

Finding Robust Itemsets Under Subsampling

Authors: Nikolaj Tatti, Fabian Moerchen, Toon Calders

Abstract: Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability. The robustness of a property is the probability that this property holds… ▽ More Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties: if an itemset is closed, free, non-derivable or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and in contrast to noise tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-$k$ approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets. △ Less

Submitted 23 April, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

Comments: Journal version. The previous version is the conference version (DOI: 10.1109/ICDM.2011.69)

arXiv:1809.05650 [pdf, other]

Detecting and Explaining Drifts in Yearly Grant Applications

Authors: Stephen Pauwels, Toon Calders

Abstract: During the lifetime of a Business Process changes can be made to the workflow, the required resources, required documents, . . . . Different traces from the same Business Process within a single log file can thus differ substantially due to these changes. We propose a method that is able to detect concept drift in multivariate log files with a dozen attributes. We test our approach on the BPI Chal… ▽ More During the lifetime of a Business Process changes can be made to the workflow, the required resources, required documents, . . . . Different traces from the same Business Process within a single log file can thus differ substantially due to these changes. We propose a method that is able to detect concept drift in multivariate log files with a dozen attributes. We test our approach on the BPI Challenge 2018 data con- sisting of applications for EU direct payment from farmers in Germany where we use it to detect Concept Drift. In contrast to other methods our algorithm does not require the manual selection of the features used to detect drift. Our method first creates a model that captures the re- lations between attributes and between events of different time steps. This model is then used to score every event and trace. These scores can be used to detect outlying cases and concept drift. Thanks to the decomposability of the score we are able to perform detailed root-cause analysis. △ Less

Submitted 16 October, 2018; v1 submitted 15 September, 2018; originally announced September 2018.

Comments: BPI Challenge 2018 - Academic Report

arXiv:1805.07107 [pdf, other]

Extending Dynamic Bayesian Networks for Anomaly Detection in Complex Logs

Authors: Stephen Pauwels, Toon Calders

Abstract: Checking various log files from different processes can be a tedious task as these logs contain lots of events, each with a (possibly large) number of attributes. We developed a way to automatically model log files and detect outlier traces in the data. For that we extend Dynamic Bayesian Networks to model the normal behavior found in log files. We introduce a new algorithm that is able to learn a… ▽ More Checking various log files from different processes can be a tedious task as these logs contain lots of events, each with a (possibly large) number of attributes. We developed a way to automatically model log files and detect outlier traces in the data. For that we extend Dynamic Bayesian Networks to model the normal behavior found in log files. We introduce a new algorithm that is able to learn a model of a log file starting from the data itself. The model is capable of scoring traces even when new values or new combinations of values appear in the log file. △ Less

Submitted 17 August, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

arXiv:1603.00091 [pdf, other]

PROMETHEE is Not Quadratic: An O(qn log(n)) Algorithm

Authors: Toon Calders, Dimitri Van Assche

Abstract: It is generally believed that the preference ranking method PROMETHEE has a quadratic time complexity. In this paper, however, we present an exact algorithm that computes PROMETHEE's net flow scores in time O(qn log(n)), where q represents the number of criteria and n the number of alternatives. The method is based on first sorting the alternatives after which the unicriterion flow scores of all a… ▽ More It is generally believed that the preference ranking method PROMETHEE has a quadratic time complexity. In this paper, however, we present an exact algorithm that computes PROMETHEE's net flow scores in time O(qn log(n)), where q represents the number of criteria and n the number of alternatives. The method is based on first sorting the alternatives after which the unicriterion flow scores of all alternatives can be computed in one scan over the sorted list of alternatives while maintaining a sliding window. This method works with the linear and level criterion preference functions. The algorithm we present is exact and, due to the sub-quadratic time complexity, vastly extends the applicability of the PROMETHEE method. Experiments show that with the new algorithm, PROMETHEE can scale up to millions of tuples. △ Less

Submitted 29 February, 2016; originally announced March 2016.

Comments: 16 pages, 2 figures

arXiv:1512.08150 [pdf, other]

doi 10.1145/2820783.2820840

Towards Distributed Convoy Pattern Mining

Authors: Faisal Orakzai, Thomas Devogele, Toon Calders

Abstract: Mining movement data to reveal interesting behavioral patterns has gained attention in recent years. One such pattern is the convoy pattern which consists of at least m objects moving together for at least k consecutive time instants where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however do not scale to real-life dataset sizes. Therefore a distributed… ▽ More Mining movement data to reveal interesting behavioral patterns has gained attention in recent years. One such pattern is the convoy pattern which consists of at least m objects moving together for at least k consecutive time instants where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however do not scale to real-life dataset sizes. Therefore a distributed algorithm for convoy mining is inevitable. In this paper, we discuss the problem of convoy mining and analyze different data partitioning strategies to pave the way for a generic distributed convoy pattern mining algorithm. △ Less

Submitted 26 December, 2015; originally announced December 2015.

Comments: SIGSPATIAL'15 November 03-06, 2015, Bellevue, WA, USA

arXiv:cs/0206004 [pdf, ps, other]

Mining All Non-Derivable Frequent Itemsets

Authors: Toon Calders, Bart Goethals

Abstract: Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently several proposals have been made to construct a concise representation of the frequent itemsets, inst… ▽ More Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently several proposals have been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The main goal of this paper is to identify redundancies in the set of all frequent itemsets and to exploit these redundancies in order to reduce the result of a mining operation. We present deduction rules to derive tight bounds on the support of candidate itemsets. We show how the deduction rules allow for constructing a minimal representation for all frequent itemsets. We also present connections between our proposal and recent proposals for concise representations and we give the results of experiments on real-life datasets that show the effectiveness of the deduction rules. In fact, the experiments even show that in many cases, first mining the concise representation, and then creating the frequent itemsets from this representation outperforms existing frequent set mining algorithms. △ Less

Submitted 3 June, 2002; originally announced June 2002.

Comments: 3 figures

ACM Class: H.2.8

Showing 1–15 of 15 results for author: Calders, T