Showing 1–15 of 15 results for author: Calders, T

Searching in archive cs.
  1. arXiv:2408.00330

    cs.LG

    "Patriarchy Hurts Men Too." Does Your Model Agree? A Discussion on Fairness Assumptions

    Authors: Marco Favier, Toon Calders

    Abstract: The pipeline of a fair ML practitioner is generally divided into three phases: 1) Selecting a fairness measure. 2) Choosing a model that minimizes this measure. 3) Maximizing the model's performance on the data. In the context of group fairness, this approach often obscures implicit assumptions about how bias is introduced into the data. For instance, in binary classification, it is often assumed…

    Submitted 1 August, 2024; originally announced August 2024.

  2. arXiv:2407.16431

    cs.CL

    FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

    Authors: Ewoenam Kwaku Tokpo, Toon Calders

    Abstract: Despite the evolution of language models, they continue to portray harmful societal biases and stereotypes inadvertently learned from training data. These inherent biases often result in detrimental effects in various applications. Counterfactual Data Augmentation (CDA), which seeks to balance demographic attributes in training data, has been a widely adopted approach to mitigate bias in natural l…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2406.16606

    cs.LG cs.CY cs.GT

    Cherry on the Cake: Fairness is NOT an Optimization Problem

    Authors: Marco Favier, Toon Calders

    Abstract: Fair cake-cutting is a mathematical subfield that studies the problem of fairly dividing a resource among a number of participants. The so-called "cake," as an object, represents any resource that can be distributed among players. This concept is connected to supervised multi-label classification: any dataset can be thought of as a cake that needs to be distributed, where each label is a player…

    Submitted 24 June, 2024; originally announced June 2024.

  4. How to be fair? A study of label and selection bias

    Authors: Marco Favier, Toon Calders, Sam Pinxteren, Jonathan Meyer

    Abstract: It is widely accepted that biased data leads to biased and thus potentially unfair models. Therefore, several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques whose aim is to learn models that are fair by design. Despite the myriad of mitigation techniques developed in the past decade, however, it is still poorly understood under what circum…

    Submitted 21 March, 2024; originally announced March 2024.

    Journal ref: Machine Learning 112.12 (2023): 5081-5104

  5. arXiv:2401.13391

    cs.LG

    Reranking individuals: The effect of fair classification within-groups

    Authors: Sofie Goethals, Toon Calders

    Abstract: Artificial Intelligence (AI) finds widespread application across various domains, but it sparks concerns about fairness in its deployment. The prevailing discourse in classification often emphasizes outcome-based metrics comparing sensitive subgroups without a nuanced consideration of the differential impacts within subgroups. Bias mitigation techniques not only affect the ranking of pairs of inst…

    Submitted 22 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  6. arXiv:2311.03186

    cs.CL

    Model-based Counterfactual Generator for Gender Bias Mitigation

    Authors: Ewoenam Kwaku Tokpo, Toon Calders

    Abstract: Counterfactual Data Augmentation (CDA) has been one of the preferred techniques for mitigating gender bias in natural language models. CDA techniques have mostly employed word substitution based on dictionaries. Although such dictionary-based CDA techniques have been shown to significantly improve the mitigation of gender bias, in this paper, we highlight some limitations of such dictionary-based…

    Submitted 6 November, 2023; originally announced November 2023.

  7. arXiv:2301.12855

    cs.CL

    How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

    Authors: Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt, Toon Calders

    Abstract: To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of…

    Submitted 30 January, 2023; originally announced January 2023.

  8. arXiv:2201.08643

    cs.CL

    Text Style Transfer for Bias Mitigation using Masked Language Modeling

    Authors: Ewoenam Kwaku Tokpo, Toon Calders

    Abstract: It is well known that textual data on the internet and other digital platforms contain significant levels of bias and stereotypes. Although many such texts contain stereotypes and biases that inherently exist in natural language for reasons that are not necessarily malicious, there are crucial reasons to mitigate these biases. For one, these texts are being used as training corpus to train languag…

    Submitted 21 January, 2022; originally announced January 2022.

    Comments: 9 pages, 3 figures, 5 tables

  9. arXiv:2112.07447

    cs.CL cs.CY cs.LG

    Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models

    Authors: Pieter Delobelle, Ewoenam Kwaku Tokpo, Toon Calders, Bettina Berendt

    Abstract: An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify 'bias' and 'fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metrics for pretrained language models and experimentall…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: 15 pages, 4 figures, 3 tables

  10. arXiv:1902.06743

    cs.DB cs.IR

    Finding Robust Itemsets Under Subsampling

    Authors: Nikolaj Tatti, Fabian Moerchen, Toon Calders

    Abstract: Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability. The robustness of a property is the probability that this property holds…

    Submitted 23 April, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: Journal version. The previous version is the conference version (DOI: 10.1109/ICDM.2011.69)

  11. arXiv:1809.05650

    cs.AI cs.LG

    Detecting and Explaining Drifts in Yearly Grant Applications

    Authors: Stephen Pauwels, Toon Calders

    Abstract: During the lifetime of a Business Process changes can be made to the workflow, the required resources, required documents, etc. Different traces from the same Business Process within a single log file can thus differ substantially due to these changes. We propose a method that is able to detect concept drift in multivariate log files with a dozen attributes. We test our approach on the BPI Chal…

    Submitted 16 October, 2018; v1 submitted 15 September, 2018; originally announced September 2018.

    Comments: BPI Challenge 2018 - Academic Report

  12. arXiv:1805.07107

    cs.AI cs.LG

    Extending Dynamic Bayesian Networks for Anomaly Detection in Complex Logs

    Authors: Stephen Pauwels, Toon Calders

    Abstract: Checking various log files from different processes can be a tedious task as these logs contain lots of events, each with a (possibly large) number of attributes. We developed a way to automatically model log files and detect outlier traces in the data. For that we extend Dynamic Bayesian Networks to model the normal behavior found in log files. We introduce a new algorithm that is able to learn a…

    Submitted 17 August, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

  13. arXiv:1603.00091

    cs.DS cs.CC

    PROMETHEE is Not Quadratic: An O(qn log(n)) Algorithm

    Authors: Toon Calders, Dimitri Van Assche

    Abstract: It is generally believed that the preference ranking method PROMETHEE has a quadratic time complexity. In this paper, however, we present an exact algorithm that computes PROMETHEE's net flow scores in time O(qn log(n)), where q represents the number of criteria and n the number of alternatives. The method is based on first sorting the alternatives after which the unicriterion flow scores of all a…

    Submitted 29 February, 2016; originally announced March 2016.

    Comments: 16 pages, 2 figures

  14. Towards Distributed Convoy Pattern Mining

    Authors: Faisal Orakzai, Thomas Devogele, Toon Calders

    Abstract: Mining movement data to reveal interesting behavioral patterns has gained attention in recent years. One such pattern is the convoy pattern, which consists of at least m objects moving together for at least k consecutive time instants, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however, do not scale to real-life dataset sizes. Therefore a distributed…

    Submitted 26 December, 2015; originally announced December 2015.

    Comments: SIGSPATIAL'15 November 03-06, 2015, Bellevue, WA, USA

  15. arXiv:cs/0206004

    cs.DB cs.AI

    Mining All Non-Derivable Frequent Itemsets

    Authors: Toon Calders, Bart Goethals

    Abstract: Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently several proposals have been made to construct a concise representation of the frequent itemsets, inst…

    Submitted 3 June, 2002; originally announced June 2002.

    Comments: 3 figures

    ACM Class: H.2.8