Search | arXiv e-print repository

The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking

Authors: Ali Vardasbi, Maarten de Rijke, Fernando Diaz, Mostafa Dehghani

Abstract: When learning to rank from user interactions, search and recommender systems must address biases in user behavior to provide a high-quality ranking. One type of bias that has recently been studied in the ranking literature is when sensitive attributes, such as gender, have an impact on a user's judgment about an item's utility. For example, in a search for an expertise area, some users may be bias… ▽ More When learning to rank from user interactions, search and recommender systems must address biases in user behavior to provide a high-quality ranking. One type of bias that has recently been studied in the ranking literature is when sensitive attributes, such as gender, have an impact on a user's judgment about an item's utility. For example, in a search for an expertise area, some users may be biased towards clicking on male candidates over female candidates. We call this type of bias group membership bias. Increasingly, we seek rankings that are fair to individuals and sensitive groups. Merit-based fairness measures rely on the estimated utility of the items. With group membership bias, the utility of the sensitive groups is under-estimated, hence, without correcting for this bias, a supposedly fair ranking is not truly fair. In this paper, first, we analyze the impact of group membership bias on ranking quality as well as merit-based fairness metrics and show that group membership bias can hurt both ranking and fairness. Then, we provide a correction method for group bias that is based on the assumption that the utility score of items in different groups comes from the same distribution. This assumption has two potential issues of sparsity and equality-instead-of-equity; we use an amortized approach to address these. We show that our correction method can consistently compensate for the negative impact of group membership bias on ranking quality and fairness metrics. △ Less

Submitted 29 April, 2024; v1 submitted 5 August, 2023; originally announced August 2023.

arXiv:2305.02914 [pdf, ps, other]

doi 10.1145/3539618.3594247

Recent Advances in the Foundations and Applications of Unbiased Learning to Rank

Authors: Shashank Gupta, Philipp Hager, Jin Huang, Ali Vardasbi, Harrie Oosterhuis

Abstract: Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations along with several applications of its methods. The tutorial is divided into four parts: Firstly, we give an ov… ▽ More Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations along with several applications of its methods. The tutorial is divided into four parts: Firstly, we give an overview of the different forms of bias that can be addressed with ULTR methods. Secondly, we present a comprehensive discussion of the latest estimation techniques in the ULTR field. Thirdly, we survey published results of ULTR in real-world applications. Fourthly, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications. This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new ULTR solutions or utilizing them in real-world applications. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: SIGIR 2023 - Tutorial

arXiv:2305.00857 [pdf, other]

doi 10.1145/3539618.3591745

On the Impact of Outlier Bias on User Clicks

Authors: Fatemeh Sarvi, Ali Vardasbi, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke

Abstract: User interaction data is an important source of supervision in counterfactual learning to rank (CLTR). Such data suffers from presentation bias. Much work in unbiased learning to rank (ULTR) focuses on position bias, i.e., items at higher ranks are more likely to be examined and clicked. Inter-item dependencies also influence examination probabilities, with outlier items in a ranking as an importa… ▽ More User interaction data is an important source of supervision in counterfactual learning to rank (CLTR). Such data suffers from presentation bias. Much work in unbiased learning to rank (ULTR) focuses on position bias, i.e., items at higher ranks are more likely to be examined and clicked. Inter-item dependencies also influence examination probabilities, with outlier items in a ranking as an important example. Outliers are defined as items that observably deviate from the rest and therefore stand out in the ranking. In this paper, we identify and introduce the bias brought about by outlier items: users tend to click more on outlier items and their close neighbors. To this end, we first conduct a controlled experiment to study the effect of outliers on user clicks. Next, to examine whether the findings from our controlled experiment generalize to naturalistic situations, we explore real-world click logs from an e-commerce platform. We show that, in both scenarios, users tend to click significantly more on outlier items than on non-outlier items in the same rankings. We show that this tendency holds for all positions, i.e., for any specific position, an item receives more interactions when presented as an outlier as opposed to a non-outlier item. We conclude from our analysis that the effect of outliers on clicks is a type of bias that should be addressed in ULTR. We therefore propose an outlier-aware click model that accounts for both outlier and position bias, called outlier-aware position-based model ( OPBM). We estimate click propensities based on OPBM ; through extensive experiments performed on both real-world e-commerce data and semi-synthetic data, we verify the effectiveness of our outlier-aware click model. Our results show the superiority of OPBM against baselines in terms of ranking performance and true relevance estimation. △ Less

Submitted 1 May, 2023; originally announced May 2023.

Comments: Accepted at SIGIR'23, Full Paper Track

arXiv:2304.12776 [pdf, other]

State Spaces Aren't Enough: Machine Translation Needs Attention

Authors: Ali Vardasbi, Telmo Pessoa Pires, Robin M. Schmidt, Stephan Peitz

Abstract: Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Tra… ▽ More Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism. △ Less

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2208.09529 [pdf, other]

Intersection of Parallels as an Early Stopping Criterion

Authors: Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani

Abstract: A common way to avoid overfitting in supervised learning is early stopping, where a held-out set is used for iterative evaluation during training to find a sweet spot in the number of training steps that gives maximum generalization. However, such a method requires a disjoint validation set, thus part of the labeled data from the training set is usually left out for this purpose, which is not idea… ▽ More A common way to avoid overfitting in supervised learning is early stopping, where a held-out set is used for iterative evaluation during training to find a sweet spot in the number of training steps that gives maximum generalization. However, such a method requires a disjoint validation set, thus part of the labeled data from the training set is usually left out for this purpose, which is not ideal when training data is scarce. Furthermore, when the training labels are noisy, the performance of the model over a validation set may not be an accurate proxy for generalization. In this paper, we propose a method to spot an early stopping point in the training iterations without the need for a validation set. We first show that in the overparameterized regime the randomly initialized weights of a linear model converge to the same direction during training. Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting. In order to detect intersection, we use the cosine distance between the weights of the parallel models during training iterations. Noticing that the final layer of a NN is a linear map of pre-last layer activations to output logits, we build on our criterion for linear models and propose an extension to multi-layer networks, using the new notion of counterfactual weights. We conduct experiments on two areas that early stopping has noticeable impact on preventing overfitting of a NN: (i) learning from noisy labels; and (ii) learning to rank in IR. Our experiments on four widely used datasets confirm the effectiveness of our method for generalization. For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against in almost all of the tested cases. △ Less

Submitted 19 August, 2022; originally announced August 2022.

Comments: CIKM 2022

arXiv:2204.13765 [pdf, other]

doi 10.1145/3477495.3532045

Probabilistic Permutation Graph Search: Black-Box Optimization for Fairness in Ranking

Authors: Ali Vardasbi, Fatemeh Sarvi, Maarten de Rijke

Abstract: There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. PL optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations. In particular, it can be used for optimizing fairness measures. However, though effective for queries with a moderate number of repeating sessions, PL optimization has… ▽ More There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. PL optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations. In particular, it can be used for optimizing fairness measures. However, though effective for queries with a moderate number of repeating sessions, PL optimization has room for improvement for queries with a small number of repeating sessions. In this paper, we present a novel way of representing permutation distributions, based on the notion of permutation graphs. Similar to PL, our distribution representation, called PPG, can be used for black-box optimization of fairness. Different from PL, where pointwise logits are used as the distribution parameters, in PPG pairwise inversion probabilities together with a reference permutation construct the distribution. As such, the reference permutation can be set to the best sampled permutation regarding the objective function, making PPG suitable for both deterministic and stochastic rankings. Our experiments show that PPG, while comparable to PL for larger session repetitions (i.e., stochastic ranking), improves over PL for optimizing fairness metrics for queries with one session (i.e., deterministic ranking). Additionally, when accurate utility estimations are available, e.g., in tabular models, the performance of PPG in fairness optimization is significantly boosted compared to lower quality utility estimations from a learning to rank model, leading to a large performance gap with PL. Finally, the pairwise probabilities make it possible to impose pairwise constraints such as "item $d_1$ should always be ranked higher than item $d_2$." Such constraints can be used to simultaneously optimize the fairness metric and control another objective such as ranking performance. △ Less

Submitted 28 April, 2022; originally announced April 2022.

Comments: SIGIR 2022

arXiv:2109.05929 [pdf, other]

doi 10.1145/3459637.3482493

Cross-Market Product Recommendation

Authors: Hamed Bonab, Mohammad Aliannejadi, Ali Vardasbi, Evangelos Kanoulas, James Allan

Abstract: We study the problem of recommending relevant products to users in relatively resource-scarce markets by leveraging data from similar, richer in resource auxiliary markets. We hypothesize that data from one market can be used to improve performance in another. Only a few studies have been conducted in this area, partly due to the lack of publicly available experimental data. To this end, we collec… ▽ More We study the problem of recommending relevant products to users in relatively resource-scarce markets by leveraging data from similar, richer in resource auxiliary markets. We hypothesize that data from one market can be used to improve performance in another. Only a few studies have been conducted in this area, partly due to the lack of publicly available experimental data. To this end, we collect and release XMarket, a large dataset covering 18 local markets on 16 different product categories, featuring 52.5 million user-item interactions. We introduce and formalize the problem of cross-market product recommendation, i.e., market adaptation. We explore different market-adaptation techniques inspired by state-of-the-art domain-adaptation and meta-learning approaches and propose a novel neural approach for market adaptation, named FOREC. Our model follows a three-step procedure -- pre-training, forking, and fine-tuning -- in order to fully utilize the data from an auxiliary market as well as the target market. We conduct extensive experiments studying the impact of market adaptation on different pairs of markets. Our proposed approach demonstrates robust effectiveness, consistently improving the performance on target markets compared to competitive baselines selected for our analysis. In particular, FOREC improves on average 24% and up to 50% in terms of nDCG@10, compared to the NMF baseline. Our analysis and experiments suggest specific future directions in this research area. We release our data and code for academic purposes. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted in CIKM 2021

arXiv:2108.08538 [pdf, other]

doi 10.1145/3459637.3482275

Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank

Authors: Ali Vardasbi, Maarten de Rijke, Ilya Markov

Abstract: In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine correction (AC) is a generalization of IPS that cor… ▽ More In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine correction (AC) is a generalization of IPS that corrects for position bias and trust bias. IPS and AC provably remove bias, conditioned on an accurate estimation of the bias parameters. Estimating the bias parameters, in turn, requires an accurate estimation of the relevance probabilities. This cyclic dependency introduces practical limitations in terms of sensitivity, convergence and efficiency. We propose a new correction method for position and trust bias in CLTR in which, unlike the existing methods, the correction does not rely on relevance estimation. Our proposed method, mixture-based correction (MBC), is based on the assumption that the distribution of the CTRs over the items being ranked is a mixture of two distributions: the distribution of CTRs for relevant items and the distribution of CTRs for non-relevant items. We prove that our method is unbiased. The validity of our proof is not conditioned on accurate bias parameter estimation. Our experiments show that MBC, when used in different bias settings and accompanied by different LTR algorithms, outperforms AC, the state-of-the-art method for correcting position and trust bias, in some settings, while performing on par in other settings. Furthermore, MBC is orders of magnitude more efficient than AC in terms of the training time. △ Less

Submitted 19 August, 2021; originally announced August 2021.

Comments: CIKM 2021

arXiv:2008.10242 [pdf, other]

doi 10.1145/3340531.3412031

When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank

Authors: Ali Vardasbi, Harrie Oosterhuis, Maarten de Rijke

Abstract: Besides position bias, which has been well-studied, trust bias is another type of bias prevalent in user interactions with rankings: users are more likely to click incorrectly w.r.t. their preferences on highly ranked items because they trust the ranking system. While previous work has observed this behavior in users, we prove that existing Counterfactual Learning to Rank (CLTR) methods do not rem… ▽ More Besides position bias, which has been well-studied, trust bias is another type of bias prevalent in user interactions with rankings: users are more likely to click incorrectly w.r.t. their preferences on highly ranked items because they trust the ranking system. While previous work has observed this behavior in users, we prove that existing Counterfactual Learning to Rank (CLTR) methods do not remove this bias, including methods specifically designed to mitigate this type of bias. Moreover, we prove that Inverse Propensity Scoring (IPS) is principally unable to correct for trust bias under non-trivial circumstances. Our main contribution is a new estimator based on affine corrections: it both reweights clicks and penalizes items displayed on ranks with high trust bias. Our estimator is the first estimator that is proven to remove the effect of both trust bias and position bias. Furthermore, we show that our estimator is a generalization of the existing CLTR framework: if no trust bias is present, it reduces to the original IPS estimator. Our semi-synthetic experiments indicate that by removing the effect of trust bias in addition to position bias, CLTR can approximate the optimal ranking system even closer than previously possible. △ Less

Submitted 9 September, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: CIKM 2020

arXiv:2005.11938 [pdf, other]

doi 10.1145/3397271.3401299

Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank

Authors: Ali Vardasbi, Maarten de Rijke, Ilya Markov

Abstract: Unbiased CLTR requires click propensities to compensate for the difference between user clicks and true relevance of search results via IPS. Current propensity estimation methods assume that user click behavior follows the PBM and estimate click propensities based on this assumption. However, in reality, user clicks often follow the CM, where users scan search results from top to bottom and where… ▽ More Unbiased CLTR requires click propensities to compensate for the difference between user clicks and true relevance of search results via IPS. Current propensity estimation methods assume that user click behavior follows the PBM and estimate click propensities based on this assumption. However, in reality, user clicks often follow the CM, where users scan search results from top to bottom and where each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn, hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR performance close to the full-information performance in case the user clicks follow the CM, while PBM-based CLTR has a significant gap towards the full-information. The opposite is true if the user clicks follow PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks. △ Less

Submitted 25 May, 2020; originally announced May 2020.

Comments: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20)

arXiv:1708.01891 [pdf]

A Behavioral Analysis on the Reselection of Seed Nodes in Independent Cascade Based Influence Maximization

Authors: Ali Vardasbi, Heshaam Faili, Masoud Asadpour

Abstract: Influence maximization serves as the main goal of a variety of social network activities such as viral marketing and campaign advertising. The independent cascade model for the influence spread assumes a one-time chance for each activated node to influence its neighbors. This reasonable assumption cannot be bypassed, since otherwise the influence probabilities of the nodes, modeled by the edge wei… ▽ More Influence maximization serves as the main goal of a variety of social network activities such as viral marketing and campaign advertising. The independent cascade model for the influence spread assumes a one-time chance for each activated node to influence its neighbors. This reasonable assumption cannot be bypassed, since otherwise the influence probabilities of the nodes, modeled by the edge weights, would be altered. On the other hand, the manually activated seed set nodes can be reselected without violating the model parameters or assumptions. The reselection of a seed set node, simply means paying extra budget to a previously paid node in order for it to retry its influential skills on its uninfluenced neighbors. This view divides the influence maximization process into two cases: the simple case where the reselection of the nodes is not considered and the reselection case. In this study we will analyze the behavior of real world networks on the difference between these two influence maximization cases. First we will show that the difference between the simple and the reselection cases constitutes a wide spectrum of networks ranging from the reselection-independent ones, where the reselection case has no noticeable advantage to the simple case, to the reselection-friendly ones, where the influence spread in the reselection case is twice the one in the simple case. Then we will correlate this dynamic to other influence maximization dynamics of the network. Finally, a significant entanglement between this dynamic and the network structure is shown and verified by the experiments. In other words, a series of conditions on the network structure is specified whose fulfilment is a sign for a reselection-friendly network. As a result of this entanglement, reselection-friendly networks can be spotted without performing the time consuming influence maximization algorithms. △ Less

Submitted 6 August, 2017; originally announced August 2017.

Showing 1–11 of 11 results for author: Vardasbi, A