Unlocking Fair Use in the Generative AI Supply Chain: A Systematized Literature Review

Abstract: Through a systematization of generative AI (GenAI) stakeholder goals and expectations, this work seeks to uncover what value different stakeholders see in their contributions to the GenAI supply line. This valuation enables us to understand whether the fair use advocated by GenAI companies to train models advances the copyright law objective of promoting science and arts. While assessing the validity and efficacy of the fair use argument, we uncover research gaps and potential avenues for future work for researchers and policymakers to address.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2312.07348 [pdf, other]

doi 10.1145/3613904.3642260

"It doesn't tell me anything about how my data is used": User Perceptions of Data Collection Purposes

Authors: Lin Kyi, Abraham Mhaidli, Cristiana Santos, Franziska Roesner, Asia Biega

Abstract: Data collection purposes and their descriptions are presented on almost all privacy notices under the GDPR, yet there is a lack of research focusing on how effective they are at informing users about data practices. We fill this gap by investigating users' perceptions of data collection purposes and their descriptions, a crucial aspect of informed consent. We conducted 23 semi-structured interviews with European users to investigate user perceptions of six common purposes (Strictly Necessary, Statistics and Analytics, Performance and Functionality, Marketing and Advertising, Personalized Advertising, and Personalized Content) and identified elements of an effective purpose name and description. We found that most purpose descriptions do not contain the information users wish to know, and that participants preferred some purpose names over others due to their perceived transparency or ease of understanding. Based on these findings, we suggest how the framing of purposes can be improved toward meaningful informed consent.

Submitted 6 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted for publication at the 2024 ACM Conference on Human Factors in Computing Systems (CHI'24)

arXiv:2309.13933 [pdf, other]

Fairness and Bias in Algorithmic Hiring: a Multidisciplinary Survey

Authors: Alessandro Fabris, Nina Baranowska, Matthew J. Dennis, David Graus, Philipp Hacker, Jorge Saldivar, Frederik Zuiderveen Borgesius, Asia J. Biega

Abstract: Employers are adopting algorithmic hiring technology throughout the recruitment pipeline. Algorithmic fairness is especially applicable in this domain due to its high stakes and structural inequalities. Unfortunately, most work in this space provides partial treatment, often constrained by two competing narratives, optimistically focused on replacing biased recruiter decisions or pessimistically pointing to the automation of discrimination. Whether, and more importantly what types of, algorithmic hiring can be less biased and more beneficial to society than low-tech alternatives currently remains unanswered, to the detriment of trustworthiness. This multidisciplinary survey caters to practitioners and researchers with a balanced and integrated coverage of systems, biases, measures, mitigation strategies, datasets, and legal aspects of algorithmic hiring and fairness. Our work supports a contextualized understanding and governance of this technology by highlighting current opportunities and limitations, providing recommendations for future work to ensure shared benefits for all stakeholders.

Submitted 8 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.00939 [pdf, other]

Data Repurposing through Compatibility: A Computational Perspective

Authors: Asia J. Biega

Abstract: Reuse of data in new contexts beyond the purposes for which it was originally collected has contributed to technological innovation and reducing the consent burden on data subjects. One of the legal mechanisms that makes such reuse possible is purpose compatibility assessment. In this paper, I offer an in-depth analysis of this mechanism through a computational lens. I moreover consider what should qualify as repurposing apart from using data for a completely new task, and argue that typical purpose formulations are an impediment to meaningful repurposing. Overall, the paper positions compatibility assessment as a constructive practice beyond an ineffective standard.

Submitted 2 September, 2023; originally announced September 2023.

Comments: To appear in the Special Issue of the Journal of Institutional and Theoretical Economics on "Machine Learning and the Law". Written for the Symposium on Machine Learning and the Law of the Max Planck Institute for Research on Collective Goods: https://www.coll.mpg.de/329557/segovia?c=67659

arXiv:2305.05608 [pdf, other]

doi 10.1145/3539618.3591933

The Role of Relevance in Fair Ranking

Authors: Aparna Balagopalan, Abigail Z. Jacobs, Asia Biega

Abstract: Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates in hiring platforms, or sellers in a marketplace. In order to do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because these constructs are typically not directly observable, platforms must instead resort to using proxy scores such as relevance and infer them from behavioral signals such as searcher clicks. Yet, it remains an open question whether relevance fulfills its role as such a worthiness score in high-stakes fair rankings. In this paper, we combine perspectives and tools from the social sciences, information retrieval, and fairness in machine learning to derive a set of desired criteria that relevance scores should satisfy in order to meaningfully guide fairness interventions. We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data. We assess the impact of these violations on the estimated system fairness and analyze whether existing fairness interventions may mitigate the identified issues. Our analyses and results surface the pressing need for new approaches to relevance collection and generation that are suitable for use in fair ranking.

Submitted 6 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Published in SIGIR 2023

arXiv:2302.14661 [pdf, other]

Explainability as a Requirement for Hardware: Introducing Explainable Hardware (XHW)

Authors: Timo Speith, Julian Speith, Steffen Becker, Yixin Zou, Asia Biega, Christof Paar

Abstract: In today's age of digital technology, ethical concerns regarding computing systems are increasing. While the focus of such concerns currently is on requirements for software, this article spotlights the hardware domain, specifically microchips. For example, the opaqueness of modern microchips raises security issues, as malicious actors can manipulate them, jeopardizing system integrity. As a consequence, governments invest substantially to facilitate a secure microchip supply chain. To combat the opaqueness of hardware, this article introduces the concept of Explainable Hardware (XHW). Inspired by and building on previous work on Explainable AI (XAI) and explainable software systems, we develop a framework for achieving XHW comprising relevant stakeholders, requirements they might have concerning hardware, and possible explainability approaches to meet these requirements. Through an exploratory survey among 18 hardware experts, we showcase applications of the framework and discover potential research gaps. Our work lays the foundation for future work and structured debates on XHW.

Submitted 25 April, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

arXiv:2208.14137 [pdf, other]

On the Trade-Off between Actionable Explanations and the Right to be Forgotten

Authors: Martin Pawelczyk, Tobias Leemann, Asia Biega, Gjergji Kasneci

Abstract: As machine learning (ML) models are increasingly being deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten", which gives users the right to have their data deleted. Another key principle is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates of the predictive model. For the setting of differentiable models, we suggest a framework to identify a minimal subset of critical training points which, when removed, maximize the fraction of invalidated recourses. Using our framework, we empirically show that the removal of as few as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Thus, our work raises fundamental questions about the compatibility of "the right to an actionable explanation" in the context of the "right to be forgotten", while also providing constructive insights on the determining factors of recourse robustness.

Submitted 11 October, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: ICLR 2023 camera ready version

Journal ref: 11th International Conference on Learning Representations (ICLR) 2023

arXiv:2203.08199 [pdf, ps, other]

doi 10.1145/3491102.3501857

(Re)Politicizing Digital Well-Being: Beyond User Engagements

Authors: Niall Docherty, Asia J. Biega

Abstract: The psychological costs of the attention economy are often considered through the binary of harmful design and healthy use, with digital well-being chiefly characterised as a matter of personal responsibility. This article adopts an interdisciplinary approach to highlight the empirical, ideological, and political limits of embedding this individualised perspective in computational discourses and designs of digital well-being measurement. We will reveal well-being to be a culturally specific and environmentally conditioned concept and will problematize user engagement as a universal proxy for well-being. Instead, the contributing factors of user well-being will be located in environing social, cultural, and political conditions far beyond the control of individual users alone. In doing so, we hope to reinvigorate the issue of digital well-being measurement as a nexus point of political concern, through which multiple disciplines can study experiences of digital ill as symptomatic of wider social inequalities and (capitalist) relations of power.

Submitted 15 March, 2022; originally announced March 2022.

Comments: Published in Proceedings of CHI '22

arXiv:2110.07701 [pdf, other]

Exposing Query Identification for Search Transparency

Authors: Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega

Abstract: Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Exposing queries are useful for quantifying various issues of search bias, privacy, data protection, security, and search engine optimization. Exact identification of exposing queries in a given system is computationally expensive, especially in dynamic contexts such as web search. We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems: dense dual-encoder models and traditional BM25 models. We then propose how this approach can be improved through metric learning over the retrieval embedding space. We further derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI. Overall, our work contributes a novel conception of transparency in search systems and computational means of achieving it.

Submitted 11 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2108.05152 [pdf, other]

Estimation of Fair Ranking Metrics with Incomplete Judgments

Authors: Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yılmaz

Abstract: There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large-scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.

Submitted 11 August, 2021; originally announced August 2021.

Comments: Published in Proceedings of the Web Conference 2021 (WWW '21)

arXiv:2108.05135 [pdf, other]

Overview of the TREC 2020 Fair Ranking Track

Authors: Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sergey Feldman, Sebastian Kohlmeier

Abstract: This paper provides an overview of the NIST TREC 2020 Fair Ranking track. For 2020, we again adopted an academic search task, where we have a corpus of academic article abstracts and queries submitted to a production academic search engine. The central goal of the Fair Ranking track is to provide fair exposure to different groups of authors (a group fairness framing). We recognize that there may be multiple group definitions (e.g. based on demographics, stature, topic) and hoped for the systems to be robust to these. We expected participants to develop systems that optimize for fairness and relevance for arbitrary group definitions, and did not reveal the exact group definitions until after the evaluation runs were submitted. The track contains two tasks, reranking and retrieval, with a shared evaluation.

Submitted 11 August, 2021; originally announced August 2021.

Comments: Published in The Twenty-Ninth Text REtrieval Conference Proceedings (TREC 2020). arXiv admin note: substantial text overlap with arXiv:2003.11650

arXiv:2107.08096 [pdf, other]

Learning to Limit Data Collection via Scaling Laws: A Computational Interpretation for the Legal Principle of Data Minimization

Authors: Divya Shanmugam, Samira Shabanian, Fernando Diaz, Michèle Finck, Asia Biega

Abstract: Modern machine learning systems are increasingly characterized by extensive personal data collection, despite the diminishing returns and increasing societal costs of such practices. Yet, data minimisation is one of the core data protection principles enshrined in the European Union's General Data Protection Regulation ('GDPR') and requires that only personal data that is adequate, relevant and limited to what is necessary is processed. However, the principle has seen limited adoption due to the lack of technical interpretation. In this work, we build on literature in machine learning and law to propose FIDO, a Framework for Inhibiting Data Overcollection. FIDO learns to limit data collection based on an interpretation of data minimization tied to system performance. Concretely, FIDO provides a data collection stopping criterion by iteratively updating an estimate of the performance curve, or the relationship between dataset size and performance, as data is acquired. FIDO estimates the performance curve via a piecewise power law technique that models distinct phases of an algorithm's performance throughout data collection separately. Empirical experiments show that the framework produces accurate performance curves and data collection stopping criteria across datasets and feature acquisition algorithms. We further demonstrate that many other families of curves systematically overestimate the return on additional data. Results and analysis from our investigation offer deeper insights into the relevant considerations when designing a data minimization framework, including the impacts of active feature acquisition on individual users and the feasibility of user-specific data minimization. We conclude with practical recommendations for the implementation of data minimization.

Submitted 12 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

Comments: To appear at ACM Conference on Fairness, Accountability, and Transparency, 2022

arXiv:2101.06203 [pdf]

doi 10.26116/techreg.2021.004

Reviving Purpose Limitation and Data Minimisation in Data-Driven Systems

Authors: Asia J. Biega, Michèle Finck

Abstract: This paper determines whether the two core data protection principles of data minimisation and purpose limitation can be meaningfully implemented in data-driven systems. While contemporary data processing practices appear to stand at odds with these principles, we demonstrate that systems could technically use much less data than they currently do. This observation is a starting point for our detailed techno-legal analysis uncovering obstacles that stand in the way of meaningful implementation and compliance as well as exemplifying unexpected trade-offs which emerge where data protection law is applied in practice. Our analysis seeks to inform debates about the impact of data protection on the development of artificial intelligence in the European Union, offering practical action points for data controllers, regulators, and researchers.

Submitted 16 December, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: In Technology and Regulation (2021): https://doi.org/10.26116/techreg.2021.004

Journal ref: Technology and Regulation 2021 (August), 44-61

arXiv:2005.13718 [pdf, other]

Operationalizing the Legal Principle of Data Minimization for Personalization

Authors: Asia J. Biega, Peter Potash, Hal Daumé III, Fernando Diaz, Michèle Finck

Abstract: Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation')". To date, the legal and computational definitions of 'purpose limitation' and 'data minimization' remain largely unclear. In particular, the interpretation of these principles is an open issue for information access systems that optimize for user experience through personalization and do not strictly require personal data collection for the delivery of basic service. In this paper, we identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization. The focus of our empirical study in the domain of recommender systems is on providing foundational insights about the (i) feasibility of different data minimization definitions, (ii) robustness of different recommendation algorithms to minimization, and (iii) performance of different minimization strategies. We find that the performance decrease incurred by data minimization might not be substantial, but that it might disparately impact different users, a finding which has implications for the viability of different formal minimization definitions. Overall, our analysis uncovers the complexities of the data minimization problem in the context of personalization and maps the remaining computational and regulatory challenges.

Submitted 27 May, 2020; originally announced May 2020.

Comments: SIGIR 2020 paper: In Proc. of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

arXiv:2004.13157 [pdf, other]

doi 10.1145/3340531.3411962

Evaluating Stochastic Rankings with Expected Exposure

Authors: Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, Ben Carterette

Abstract: We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.

Submitted 20 October, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA
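
Illustrative note: the definition of expected exposure in the abstract above (average attention an item receives over repeated samples of rankings for the same query) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. The function name `expected_exposure`, the `patience` parameter, and the geometric attention model (attention at 0-based rank k equals patience**k) are assumptions chosen for illustration; the abstract only says that user models are borrowed from existing retrieval metrics.

```python
from collections import defaultdict


def expected_exposure(sampled_rankings, patience=0.5):
    """Average per-item exposure over a list of sampled rankings.

    ASSUMPTION: a geometric browsing model in which the item at
    0-based rank k receives attention patience**k. This is a
    hypothetical stand-in for whatever user model is actually used.
    """
    if not sampled_rankings:
        raise ValueError("need at least one sampled ranking")
    totals = defaultdict(float)
    for ranking in sampled_rankings:
        for rank, item in enumerate(ranking):
            totals[item] += patience ** rank
    n = len(sampled_rankings)
    return {item: total / n for item, total in totals.items()}


# A deterministic ranker always shows "a" above "b".
deterministic = expected_exposure([["a", "b"], ["a", "b"]])

# A stochastic ranker alternates the order of the two items.
stochastic = expected_exposure([["a", "b"], ["b", "a"]])
```

Under these assumptions, if "a" and "b" have the same relevance grade, the deterministic ranker gives them unequal exposure, while the alternating ranker gives both the same average exposure, which is the situation the principle of equal expected exposure asks for.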

arXiv:2004.02023 [pdf, other]

doi 10.1007/978-3-030-45442-5_14

Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Authors: Asia J. Biega, Jana Schmidt, Rishiraj Saha Roy

Abstract: Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.

Submitted 3 June, 2021; v1 submitted 4 April, 2020; originally announced April 2020.

Comments: ECIR 2020 Short Paper

arXiv:2003.11650 [pdf, other]

Overview of the TREC 2019 Fair Ranking Track

Authors: Asia J. Biega, Fernando Diaz, Michael D. Ekstrand, Sebastian Kohlmeier

Abstract: The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers in addition to classic notions of relevance. As part of the benchmark, we defined standardized fairness metrics with evaluation protocols and released a dataset for the fair ranking problem. The 2019 task focused on reranking academic paper abstracts given a query. The objective was to fairly represent relevant authors from several groups that were unknown at the system submission time. Thus, the track emphasized the development of systems which have robust performance across a variety of group definitions. Participants were provided with query log data (queries, documents, and relevance) from Semantic Scholar. This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process, as well as a comparison of the performance of submitted systems.

Submitted 25 March, 2020; originally announced March 2020.

Comments: Published in The Twenty-Eighth Text REtrieval Conference Proceedings (TREC 2019)

arXiv:1912.09910 [pdf, other]

Report on the First HIPstIR Workshop on the Future of Information Retrieval

Authors: Laura Dietz, Bhaskar Mitra, Jeremy Pickens, Hana Anber, Sandeep Avula, Asia Biega, Adrian Boteanu, Shubham Chatterjee, Jeff Dalton, Shiri Dori-Hacohen, John Foley, Henry Feild, Ben Gamari, Rosie Jones, Pallika Kanani, Sumanta Kashyapi, Widad Machmouchi, Matthew Mitsui, Steve Nole, Alexandre Tachard Passos, Jordan Ramsdell, Adam Roegiest, David Smith, Alessandro Sordoni

Abstract: The vision of HIPstIR is that early-stage information retrieval (IR) researchers get together to develop a future for non-mainstream ideas and research agendas in IR. The first iteration of this vision materialized in the form of a three-day workshop in Portsmouth, New Hampshire attended by 24 researchers across academia and industry. Attendees pre-submitted one or more topics that they want to pitch at the meeting. Then over the three days during the workshop, we self-organized into groups and worked on six specific proposals of common interest. In this report, we present an overview of the workshop and brief summaries of the six proposals that resulted from the workshop.

Submitted 20 December, 2019; originally announced December 2019.

arXiv:1805.01788 [pdf, other]

doi 10.1145/3209978.3210063

Equity of Attention: Amortizing Individual Fairness in Rankings

Authors: Asia J. Biega, Krishna P. Gummadi, Gerhard Weikum

Abstract: Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.

Submitted 4 May, 2018; originally announced May 2018.

Comments: Accepted to SIGIR 2018

Showing 1–19 of 19 results for author: Biega, A