Showing 1–13 of 13 results for author: Lum, K

  1. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  2. arXiv:2406.11757  [pdf, other]

    cs.AI cs.CL cs.CY cs.HC

    STAR: SocioTechnical Approach to Red Teaming Language Models

    Authors: Laura Weidinger, John Mellor, Bernat Guillen Pegueroles, Nahema Marchal, Ravin Kumar, Kristian Lum, Canfer Akbulut, Mark Diaz, Stevie Bergman, Mikel Rodriguez, Verena Rieser, William Isaac

    Abstract: This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming safety of large language models. STAR makes two key contributions: it enhances steerability by generating parameterised instructions for human red teamers, leading to improved coverage of the risk surface. Parameterised instructions also provide more detailed insights into model failur…

    Submitted 6 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, 5 pages appendix. * denotes equal contribution

  3. arXiv:2406.03198  [pdf, other]

    cs.CL cs.HC cs.LG stat.AP stat.ML

    The Impossibility of Fair LLMs

    Authors: Jacy Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Alexander D'Amour, Chenhao Tan

    Abstract: The need for fair AI is increasingly clear in the era of general-purpose systems such as ChatGPT, Gemini, and other large language models (LLMs). However, the increasing complexity of human-AI interaction and its social impacts have raised questions of how fairness standards could be applied. Here, we review the technical frameworks that machine learning researchers have used to evaluate fairness,…

    Submitted 28 May, 2024; originally announced June 2024.

    Comments: Presented at the 1st Human-Centered Evaluation and Auditing of Language Models (HEAL) workshop at CHI 2024

  4. arXiv:2402.12649  [pdf, other]

    cs.CL stat.AP

    Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation

    Authors: Kristian Lum, Jacy Reese Anthis, Chirag Nagpal, Alexander D'Amour

    Abstract: Bias benchmarks are a popular method for studying the negative impacts of bias in LLMs, yet there has been little empirical investigation of whether these benchmarks are actually indicative of how harm may manifest in the real world. In this work, we study the correspondence between such decontextualized "trick tests" and evaluations that are more grounded in Realistic Use and Tangible…

    Submitted 19 February, 2024; originally announced February 2024.

  5. arXiv:2211.08667  [pdf, other]

    cs.SI

    County-level Algorithmic Audit of Racial Bias in Twitter's Home Timeline

    Authors: Luca Belli, Kyra Yee, Uthaipon Tantipongpipat, Aaron Gonzales, Kristian Lum, Moritz Hardt

    Abstract: We report on the outcome of an audit of Twitter's Home Timeline ranking system. The goal of the audit was to determine if authors from some racial groups experience systematically higher impression counts for their Tweets than others. A central obstacle for any such audit is that Twitter does not ordinarily collect or associate racial information with its users, thus prohibiting an analysis at the…

    Submitted 10 February, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  6. arXiv:2209.05000  [pdf, other]

    cs.IR cs.SI

    Random Isn't Always Fair: Candidate Set Imbalance and Exposure Inequality in Recommender Systems

    Authors: Amanda Bower, Kristian Lum, Tomo Lazovich, Kyra Yee, Luca Belli

    Abstract: Traditionally, recommender systems operate by returning a user a set of items, ranked in order of estimated relevance to that user. In recent years, methods relying on stochastic ordering have been developed to create "fairer" rankings that reduce inequality in who or what is shown to users. Complete randomization -- ordering candidate items randomly, independent of estimated relevance -- is large…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: 12 pages

  7. arXiv:2205.14867  [pdf, other]

    cs.CY cs.LG

    Measuring and mitigating voting access disparities: a study of race and polling locations in Florida and North Carolina

    Authors: Mohsen Abbasi, Suresh Venkatasubramanian, Sorelle A. Friedler, Kristian Lum, Calvin Barrett

    Abstract: Voter suppression and associated racial disparities in access to voting are long-standing civil rights concerns in the United States. Barriers to voting have taken many forms over the decades. A history of violent explicit discouragement has shifted to more subtle access limitations that can include long lines and wait times, long travel times to reach a polling station, and other logistical barri…

    Submitted 30 May, 2022; originally announced May 2022.

  8. Flipping the Script on Criminal Justice Risk Assessment: An actuarial model for assessing the risk the federal sentencing system poses to defendants

    Authors: Mikaela Meyer, Aaron Horowitz, Erica Marshall, Kristian Lum

    Abstract: In the criminal justice system, algorithmic risk assessment instruments are used to predict the risk a defendant poses to society; examples include the risk of recidivating or the risk of failing to appear at future court dates. However, defendants are also at risk of harm from the criminal justice system. To date, there exists no risk assessment instrument that considers the risk the system poses…

    Submitted 13 July, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Conference on Fairness, Accountability, and Transparency (FAccT 2022)

  9. De-biasing "bias" measurement

    Authors: Kristian Lum, Yunfeng Zhang, Amanda Bower

    Abstract: When a model's performance differs across socially or culturally relevant groups--like race, gender, or the intersections of many such groups--it is often called "biased." While much of the work in algorithmic fairness over the last several years has focused on developing various definitions of model fairness (the absence of group-wise model performance disparities) and eliminating such "bias," mu…

    Submitted 29 June, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Journal ref: 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22), June 21--24, 2022, Seoul, Republic of Korea

  10. Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics

    Authors: Tomo Lazovich, Luca Belli, Aaron Gonzales, Amanda Bower, Uthaipon Tantipongpipat, Kristian Lum, Ferenc Huszar, Rumman Chowdhury

    Abstract: The harmful impacts of algorithmic decision systems have recently come into focus, with many examples of systems such as machine learning (ML) models amplifying existing societal biases. Most metrics attempting to quantify disparities resulting from ML algorithms focus on differences between groups, dividing users based on demographic identities and comparing model performance or overall outcomes…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: 11 pages, 7 figures

  11. arXiv:2109.08245  [pdf, other]

    cs.SI

    The 2021 RecSys Challenge Dataset: Fairness is not optional

    Authors: Luca Belli, Alykhan Tejani, Frank Portman, Alexandre Lung-Yut-Fong, Ben Chamberlain, Yuanpu Xie, Kristian Lum, Jonathan Hunt, Michael Bronstein, Vito Walter Anelli, Saikishore Kalloori, Bruce Ferwerda, Wenzhe Shi

    Abstract: After the success of the RecSys 2020 Challenge, we describe a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~1B data points, a 5-fold increase), but for the first time it takes into consideration fairness aspects of the challenge. Unlike many static datasets, a lot of effort went into making sure that the dat…

    Submitted 21 September, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

  12. arXiv:2106.05498  [pdf, ps, other]

    cs.CY

    It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks

    Authors: Michelle Bao, Angela Zhou, Samantha Zottola, Brian Brubach, Sarah Desmarais, Aaron Horowitz, Kristian Lum, Suresh Venkatasubramanian

    Abstract: Risk assessment instrument (RAI) datasets, particularly ProPublica's COMPAS dataset, are commonly used in algorithmic fairness papers due to benchmarking practices of comparing algorithms on datasets used in prior work. In many cases, this data is used as a benchmark to demonstrate good performance without accounting for the complexities of criminal justice (CJ) processes. However, we show that pr…

    Submitted 28 April, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Datasets and Benchmarks

  13. arXiv:1610.08077  [pdf, other]

    stat.ML cs.LG

    A statistical framework for fair predictive algorithms

    Authors: Kristian Lum, James Johndrow

    Abstract: Predictive modeling is increasingly being employed to assist human decision-makers. One purported advantage of replacing human judgment with computer models in high stakes settings-- such as sentencing, hiring, policing, college admissions, and parole decisions-- is the perceived "neutrality" of computers. It is argued that because computer models do not hold personal prejudice, the predictions th…

    Submitted 25 October, 2016; originally announced October 2016.