-
LongLaMP: A Benchmark for Personalized Long-form Text Generation
Authors:
Ishita Kumar,
Snigdha Viswanathan,
Sushrita Yerra,
Alireza Salemi,
Ryan A. Rossi,
Franck Dernoncourt,
Hanieh Deilamsalehy,
Xiang Chen,
Ruiyi Zhang,
Shubham Agarwal,
Nedim Lipka,
Hamed Zamani
Abstract:
Long-text generation is seemingly ubiquitous in real-world applications of large language models such as generating an email or writing a review. Despite the fundamental importance and prevalence of long-text generation in many practical applications, existing work on personalized generation has focused on the generation of very short text. To overcome these limitations, we study the problem of pe…
▽ More
Long-text generation is seemingly ubiquitous in real-world applications of large language models such as generating an email or writing a review. Despite the fundamental importance and prevalence of long-text generation in many practical applications, existing work on personalized generation has focused on the generation of very short text. To overcome these limitations, we study the problem of personalized long-text generation, that is, generating long-text that is personalized for a specific user while being practically useful for the vast majority of real-world applications that naturally require the generation of longer text. In this work, we demonstrate the importance of user-specific personalization for long-text generation tasks and develop the Long-text Language Model Personalization (LongLaMP) Benchmark. LongLaMP provides a comprehensive and diverse evaluation framework for personalized long-text generation. Extensive experiments on LongLaMP for zero-shot and fine-tuned language tasks demonstrate the effectiveness of the proposed benchmark and its utility for developing and evaluating techniques for personalized long-text generation across a wide variety of long-text generation tasks. The results highlight the importance of personalization across a wide variety of long-text generation tasks. Finally, we release the benchmark for others to use for this important problem.
△ Less
Submitted 26 June, 2024;
originally announced July 2024.
-
To Pool or Not To Pool: Analyzing the Regularizing Effects of Group-Fair Training on Shared Models
Authors:
Cyrus Cousins,
I. Elizabeth Kumar,
Suresh Venkatasubramanian
Abstract:
In fair machine learning, one source of performance disparities between groups is over-fitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis cl…
▽ More
In fair machine learning, one source of performance disparities between groups is over-fitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis class, which contains the family of models likely to perform well with respect to a fair learning objective (e.g., a power-mean). Our simulations demonstrate these bounds improve over a naive method, as expected by theory, with particularly significant improvement for smaller group sizes.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
Coarse-Grained Configurational Polymer Fingerprints for Property Prediction using Machine Learning
Authors:
Ishan Kumar,
Prateek K Jha
Abstract:
In this work, we present a method to generate a configurational level fingerprint for polymers using the Bead-Spring-Model. Unlike some of the previous fingerprinting approaches that employ monomer-level information where atomistic descriptors are computed using quantum chemistry calculations, this approach incorporates configurational information from a coarse-grained model of a long polymer chai…
▽ More
In this work, we present a method to generate a configurational level fingerprint for polymers using the Bead-Spring-Model. Unlike some of the previous fingerprinting approaches that employ monomer-level information where atomistic descriptors are computed using quantum chemistry calculations, this approach incorporates configurational information from a coarse-grained model of a long polymer chain. The proposed approach may be advantageous for the study of behavior resulting from large molecular weights. To create this fingerprint, we make use of two kinds of descriptors. First, we calculate certain geometric descriptors like Re2, Rg2 etc. and label them as Calculated Descriptors. Second, we generate a set of data-driven descriptors using an unsupervised autoencoder model and call them Learnt Descriptors. Using a combination of both of them, we are able to learn mappings from the structure to various properties of the polymer chain by training ML models. We test our fingerprint to predict the probability of occurrence of a configuration at equilibrium, which is approximated by a simple linear relationship between the instantaneous internal energy and equilibrium average internal energy.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
CausalCite: A Causal Formulation of Paper Citations
Authors:
Ishan Kumar,
Zhijing Jin,
Ehsan Mokhtarian,
Siyuan Guo,
Yuen Chen,
Mrinmaya Sachan,
Bernhard Schölkopf
Abstract:
Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a…
▽ More
Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from large language models (LLMs), extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.
△ Less
Submitted 27 May, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
High-Fidelity Audio Compression with Improved RVQGAN
Authors:
Rithesh Kumar,
Prem Seetharaman,
Alejandro Luebs,
Ishaan Kumar,
Kundan Kumar
Abstract:
Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz…
▽ More
Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz audio into tokens at just 8kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms, and find our method outperforms them significantly. We provide thorough ablations for every design choice, as well as open-source code and trained model weights. We hope our work can lay the foundation for the next generation of high-fidelity audio modeling.
△ Less
Submitted 26 October, 2023; v1 submitted 10 June, 2023;
originally announced June 2023.
-
Hunter: Using Change Point Detection to Hunt for Performance Regressions
Authors:
Matt Fleming,
Piotr Kołaczkowski,
Ishita Kumar,
Shaunak Das,
Sean McCarthy,
Pushkala Pattabhiraman,
Henrik Ingo
Abstract:
Change point detection has recently gained popularity as a method of detecting performance changes in software due to its ability to cope with noisy data. In this paper we present Hunter, an open source tool that automatically detects performance regressions and improvements in time-series data. Hunter uses a modified E-divisive means algorithm to identify statistically significant changes in norm…
▽ More
Change point detection has recently gained popularity as a method of detecting performance changes in software due to its ability to cope with noisy data. In this paper we present Hunter, an open source tool that automatically detects performance regressions and improvements in time-series data. Hunter uses a modified E-divisive means algorithm to identify statistically significant changes in normally-distributed performance metrics. We describe the changes we made to the E-divisive means algorithm along with their motivation. The main change we adopted was to replace the significance test using randomized permutations with a Student's t-test, as we discovered that the randomized approach did not produce deterministic results, at least not with a reasonable number of iterations. In addition we've made tweaks that allow us to find change points the original algorithm would not, such as two nearby changes. For evaluation, we developed a method to generate real timeseries, but with artificially injected changes in latency. We used these data sets to compare Hunter against two other well known algorithms, PELT and DYNP. Finally, we conclude with lessons we've learned supporting Hunter across teams with individual responsibility for the performance of their project.
△ Less
Submitted 8 January, 2023;
originally announced January 2023.
-
Equalizing Credit Opportunity in Algorithms: Aligning Algorithmic Fairness Research with U.S. Fair Lending Regulation
Authors:
I. Elizabeth Kumar,
Keegan E. Hines,
John P. Dickerson
Abstract:
Credit is an essential component of financial wellbeing in America, and unequal access to it is a large factor in the economic disparities between demographic groups that exist today. Today, machine learning algorithms, sometimes trained on alternative data, are increasingly being used to determine access to credit, yet research has shown that machine learning can encode many different versions of…
▽ More
Credit is an essential component of financial wellbeing in America, and unequal access to it is a large factor in the economic disparities between demographic groups that exist today. Today, machine learning algorithms, sometimes trained on alternative data, are increasingly being used to determine access to credit, yet research has shown that machine learning can encode many different versions of "unfairness," thus raising the concern that banks and other financial institutions could -- potentially unwittingly -- engage in illegal discrimination through the use of this technology. In the US, there are laws in place to make sure discrimination does not happen in lending and agencies charged with enforcing them. However, conversations around fair credit models in computer science and in policy are often misaligned: fair machine learning research often lacks legal and practical considerations specific to existing fair lending policy, and regulators have yet to issue new guidance on how, if at all, credit risk models should be utilizing practices and techniques from the research community. This paper aims to better align these sides of the conversation. We describe the current state of credit discrimination regulation in the United States, contextualize results from fair ML research to identify the specific fairness concerns raised by the use of machine learning in lending, and discuss regulatory opportunities to address these concerns.
△ Less
Submitted 5 October, 2022;
originally announced October 2022.
-
The Fallacy of AI Functionality
Authors:
Inioluwa Deborah Raji,
I. Elizabeth Kumar,
Aaron Horowitz,
Andrew D. Selbst
Abstract:
Deployed AI systems often do not work. They can be constructed haphazardly, deployed indiscriminately, and promoted deceptively. However, despite this reality, scholars, the press, and policymakers pay too little attention to functionality. This leads to technical and policy solutions focused on "ethical" or value-aligned deployments, often skipping over the prior question of whether a given syste…
▽ More
Deployed AI systems often do not work. They can be constructed haphazardly, deployed indiscriminately, and promoted deceptively. However, despite this reality, scholars, the press, and policymakers pay too little attention to functionality. This leads to technical and policy solutions focused on "ethical" or value-aligned deployments, often skipping over the prior question of whether a given system functions, or provides any benefits at all. To describe the harms of various types of functionality failures, we analyze a set of case studies to create a taxonomy of known AI functionality issues. We then point to policy and organizational responses that are often overlooked and become more readily available once functionality is drawn into focus. We argue that functionality is a meaningful AI policy challenge, operating as a necessary first step towards protecting affected communities from algorithmic harm.
△ Less
Submitted 1 July, 2022; v1 submitted 19 June, 2022;
originally announced June 2022.
-
Epistemic values in feature importance methods: Lessons from feminist epistemology
Authors:
Leif Hancox-Li,
I. Elizabeth Kumar
Abstract:
As the public seeks greater accountability and transparency from machine learning algorithms, the research literature on methods to explain algorithms and their outputs has rapidly expanded. Feature importance methods form a popular class of explanation methods. In this paper, we apply the lens of feminist epistemology to recent feature importance research. We investigate what epistemic values are…
▽ More
As the public seeks greater accountability and transparency from machine learning algorithms, the research literature on methods to explain algorithms and their outputs has rapidly expanded. Feature importance methods form a popular class of explanation methods. In this paper, we apply the lens of feminist epistemology to recent feature importance research. We investigate what epistemic values are implicitly embedded in feature importance methods and how or whether they are in conflict with feminist epistemology. We offer some suggestions on how to conduct research on explanations that respects feminist epistemic values, taking into account the importance of social context, the epistemic privileges of subjugated knowers, and adopting more interactional ways of knowing.
△ Less
Submitted 29 January, 2021;
originally announced January 2021.
-
Problems with Shapley-value-based explanations as feature importance measures
Authors:
I. Elizabeth Kumar,
Suresh Venkatasubramanian,
Carlos Scheidegger,
Sorelle Friedler
Abstract:
Game-theoretic formulations of feature importance have become popular as a way to "explain" machine learning models. These methods define a cooperative game between the features of a model and distribute influence among these input elements using some form of the game's unique Shapley values. Justification for these methods rests on two pillars: their desirable mathematical properties, and their a…
▽ More
Game-theoretic formulations of feature importance have become popular as a way to "explain" machine learning models. These methods define a cooperative game between the features of a model and distribute influence among these input elements using some form of the game's unique Shapley values. Justification for these methods rests on two pillars: their desirable mathematical properties, and their applicability to specific motivations for explanations. We show that mathematical problems arise when Shapley values are used for feature importance and that the solutions to mitigate these necessarily induce further complexity, such as the need for causal reasoning. We also draw on additional literature to argue that Shapley values do not provide explanations which suit human-centric goals of explainability.
△ Less
Submitted 30 June, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.