Showing 1–28 of 28 results for author: Narasimhan, H

Searching in archive cs.
  1. arXiv:2406.00060  [pdf, other]

    cs.CL cs.LG

    Cascade-Aware Training of Language Models

    Authors: Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

    Abstract: Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employs smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the…

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 22 pages, 13 figures

  2. arXiv:2405.19261  [pdf, other]

    cs.CL cs.AI cs.LG

    Faster Cascades via Speculative Decoding

    Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in p…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2404.10136  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Model Cascades: Token-level uncertainty and beyond

    Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning c…

    Submitted 15 April, 2024; originally announced April 2024.

  4. arXiv:2403.04182  [pdf, other]

    cs.CL cs.AI

    Metric-aware LLM inference for regression and scoring

    Authors: Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM's underlying distribution. Building on prior work on Minimum Bayes Risk Decoding, we show that this inference strategy can be suboptimal for a range of regression and scoring tasks, and associated evaluation metrics. As a remedy, we propose…

    Submitted 4 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 15 pages

  5. arXiv:2309.08825  [pdf, other]

    cs.LG cs.AI

    Distributionally Robust Post-hoc Classifiers under Prior Shifts

    Authors: Jiaheng Wei, Harikrishna Narasimhan, Ehsan Amid, Wen-Sheng Chu, Yang Liu, Abhishek Kumar

    Abstract: The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Camera ready version, accepted at ICLR 2023

  6. arXiv:2307.02764  [pdf, other]

    cs.LG stat.ML

    When Does Confidence-Based Cascade Deferral Suffice?

    Authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

    Abstract: Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite…

    Submitted 23 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  7. arXiv:2301.12386  [pdf, other]

    cs.LG

    Plugin estimators for selective classification with out-of-distribution detection

    Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar

    Abstract: Real-world classifiers can benefit from the option of abstaining from predicting on samples where they have low confidence. Such abstention is particularly useful on samples which are close to the learned decision boundary, or which are outliers with respect to the training sample. These settings have been the subject of extensive but disjoint study in the selective classification (SC) and out-of-…

    Submitted 24 July, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

  8. arXiv:2210.09695  [pdf, other]

    stat.ML cs.LG

    Consistent Multiclass Algorithms for Complex Metrics and Constraints

    Authors: Harikrishna Narasimhan, Harish G. Ramaswamy, Shiv Kumar Tavker, Drona Khurana, Praneeth Netrapalli, Shivani Agarwal

    Abstract: We present consistent algorithms for multiclass learning with complex performance metrics and constraints, where the objective and constraints are defined by arbitrary functions of the confusion matrix. This setting includes many common performance metrics such as the multiclass G-mean and micro F1-measure, and constraints such as those on the classifier's precision and recall and more recent meas…

    Submitted 18 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

  9. arXiv:2206.06479  [pdf, other]

    cs.LG

    Robust Distillation for Worst-class Performance

    Authors: Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

    Abstract: Knowledge distillation has proven to be an effective technique in improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may…

    Submitted 13 June, 2022; originally announced June 2022.

  10. arXiv:2107.10960  [pdf, other]

    cs.LG stat.ML

    Implicit Rate-Constrained Optimization of Non-decomposable Objectives

    Authors: Abhishek Kumar, Harikrishna Narasimhan, Andrew Cotter

    Abstract: We consider a popular family of constrained optimization problems arising in machine learning that involve optimizing a non-decomposable evaluation metric with a certain thresholded form, while constraining another metric of interest. Examples of such problems include optimizing the false negative rate at a fixed false positive rate, optimizing precision at a fixed recall, optimizing the area unde…

    Submitted 28 July, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: ICML 2021; Code available at https://github.com/google-research/google-research/tree/master/implicit_constrained_optimization

  11. arXiv:2107.04641  [pdf, other]

    cs.LG cs.AI

    Training Over-parameterized Models with Non-decomposable Objectives

    Authors: Harikrishna Narasimhan, Aditya Krishna Menon

    Abstract: Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the t…

    Submitted 9 July, 2021; originally announced July 2021.

  12. arXiv:2106.02654  [pdf, other]

    cs.LG cs.AI stat.ML

    Churn Reduction via Distillation

    Authors: Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh

    Abstract: In real-world systems, models are frequently updated as more data becomes available, and in addition to achieving high accuracy, the goal is to also maintain a low difference in predictions compared to the base model (i.e. predictive "churn"). If model retraining results in vastly different behavior, then it could cause negative effects in downstream systems, especially if this churn can be avoide…

    Submitted 14 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Journal ref: ICLR 2022

  13. arXiv:2102.09492  [pdf, other]

    cs.LG stat.ML

    Optimizing Black-box Metrics with Iterative Example Weighting

    Authors: Gaurush Hiranandani, Jatin Mathur, Harikrishna Narasimhan, Mahdi Milani Fard, Oluwasanmi Koyejo

    Abstract: We consider learning to optimize a classification metric defined by a black-box function of the confusion matrix. Such black-box learning settings are ubiquitous, for example, when the learner only has query access to the metric of interest, or in noisy-label and domain adaptation applications where the learner must evaluate the metric via performance evaluation using a small validation sample. Ou…

    Submitted 23 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: The paper to appear at ICML 2021. This version includes the camera-ready edits. 42 pages, 2 figures, and 7 tables

  14. arXiv:2102.06849  [pdf, other]

    cs.LG cs.AI stat.ML

    Distilling Double Descent

    Authors: Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

    Abstract: Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset. The most common explanations for why distillation "works" are predicated on the assumption that the student is provided with soft labels, e.g. probabilities or confidences, from the teacher model. In this work, we show, that,…

    Submitted 12 February, 2021; originally announced February 2021.

  15. arXiv:2011.01516  [pdf, other]

    stat.ML cs.LG

    Quadratic Metric Elicitation for Fairness and Beyond

    Authors: Gaurush Hiranandani, Jatin Mathur, Harikrishna Narasimhan, Oluwasanmi Koyejo

    Abstract: Metric elicitation is a recent framework for eliciting classification performance metrics that best reflect implicit user preferences based on the task and context. However, available elicitation strategies have been limited to linear (or quasi-linear) functions of predictive rates, which can be practically restrictive for many applications including fairness. This paper develops a strategy for el…

    Submitted 21 August, 2022; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: The paper to appear at UAI 2022. This version includes the camera-ready edits. Paper 48 pages, 11 figures, and 5 tables

  16. arXiv:2006.12732  [pdf, other]

    stat.ML cs.LG

    Fair Performance Metric Elicitation

    Authors: Gaurush Hiranandani, Harikrishna Narasimhan, Oluwasanmi Koyejo

    Abstract: What is a fair performance metric? We consider the choice of fairness metrics through the lens of metric elicitation -- a principled framework for selecting performance metrics that best reflect implicit preferences. The use of metric elicitation enables a practitioner to tune the performance and fairness metrics to the task, context, and population at hand. Specifically, we propose a novel strate…

    Submitted 3 November, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: The paper to appear at NeurIPS 2020. This version includes the camera-ready edits. 31 pages, 6 figures, and 2 tables

  17. arXiv:2002.09343  [pdf, ps, other]

    cs.LG stat.ML

    Robust Optimization for Fairness with Noisy Protected Groups

    Authors: Serena Wang, Wenshuo Guo, Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Michael I. Jordan

    Abstract: Many existing fairness criteria for machine learning involve equalizing some metric across protected groups such as race or gender. However, practitioners trying to audit or enforce such group-based criteria can easily face the problem of noisy or biased protected group information. First, we study the consequences of naively relying on noisy protected group labels: we provide an upper bound on th…

    Submitted 10 November, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: To appear at 34th Conference on Neural Information Processing Systems (NeurIPS 2020); first two authors contributed equally to this work

  18. arXiv:2002.08605  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimizing Black-box Metrics with Adaptive Surrogates

    Authors: Qijia Jiang, Olaoluwa Adigun, Harikrishna Narasimhan, Mahdi Milani Fard, Maya Gupta

    Abstract: We address the problem of training models with black-box and hard-to-optimize metrics by expressing the metric as a monotonic function of a small number of easy-to-optimize surrogates. We pose the training problem as an optimization over a relaxed surrogate space, which we solve by estimating local gradients for the metric and performing inexact convex projections. We analyze gradient estimates ba…

    Submitted 20 February, 2020; originally announced February 2020.

  19. arXiv:1909.02939  [pdf, other]

    cs.LG cs.GT stat.ML

    Optimizing Generalized Rate Metrics through Game Equilibrium

    Authors: Harikrishna Narasimhan, Andrew Cotter, Maya Gupta

    Abstract: We present a general framework for solving a large class of learning problems with non-linear functions of classification rates. This includes problems where one wishes to optimize a non-decomposable performance metric such as the F-measure or G-mean, and constrained training problems where the classifier needs to satisfy non-linear rate constraints such as predictive parity fairness, distribution…

    Submitted 6 September, 2019; originally announced September 2019.

  20. arXiv:1906.05330  [pdf, other]

    cs.LG stat.ML

    Pairwise Fairness for Ranking and Regression

    Authors: Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Serena Wang

    Abstract: We present pairwise fairness metrics for ranking models and regression models that form analogues of statistical fairness notions such as equal opportunity, equal accuracy, and statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using existin…

    Submitted 7 January, 2020; v1 submitted 12 June, 2019; originally announced June 2019.

  21. arXiv:1805.10582  [pdf, other]

    stat.ML cs.AI cs.LG

    Metric-Optimized Example Weights

    Authors: Sen Zhao, Mahdi Milani Fard, Harikrishna Narasimhan, Maya Gupta

    Abstract: Real-world machine learning applications often have complex test metrics, and may have training and test data that are not identically distributed. Motivated by known connections between complex test metrics and cost-weighted learning, we propose addressing these issues by using a weighted loss function with a standard loss, where the weights on the training examples are learned to optimize the te…

    Submitted 15 June, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: Proceedings of the 36th International Conference on Machine Learning (ICML'19)

  22. arXiv:1706.03459  [pdf, other]

    cs.GT cs.AI cs.LG

    Optimal Auctions through Deep Learning: Advances in Differentiable Economics

    Authors: Paul Dütting, Zhe Feng, Harikrishna Narasimhan, David C. Parkes, Sai Srivatsa Ravindranath

    Abstract: Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981, but more than 40 years later a full analytical understanding of the optimal design still remains elusive for settings with two or more items. In this work, we initiate the exploration of the use of tools from deep learning…

    Submitted 14 October, 2022; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: An extended abstract appeared in ICML'19, along with a short Research Highlight in the Communications of the ACM

  23. arXiv:1605.04337  [pdf, other]

    cs.LG stat.ML

    Support Vector Algorithms for Optimizing the Partial Area Under the ROC Curve

    Authors: Harikrishna Narasimhan, Shivani Agarwal

    Abstract: The area under the ROC curve (AUC) is a widely used performance measure in machine learning. Increasingly, however, in several applications, ranging from ranking to biometric screening to medicine, performance is measured not in terms of the full area under the ROC curve, but in terms of the partial area under the ROC curve between two false positive rates. In this paper, we develop support…

    Submitted 13 May, 2016; originally announced May 2016.

  24. arXiv:1605.04135  [pdf, other]

    stat.ML cs.AI cs.IR cs.LG

    Online Optimization Methods for the Quantification Problem

    Authors: Purushottam Kar, Shuai Li, Harikrishna Narasimhan, Sanjay Chawla, Fabrizio Sebastiani

    Abstract: The estimation of class prevalence, i.e., the fraction of a population that belongs to a certain class, is a very useful tool in data analytics and learning, and finds applications in many domains such as sentiment analysis, epidemiology, etc. For example, in sentiment analysis, the objective is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather es…

    Submitted 13 June, 2016; v1 submitted 13 May, 2016; originally announced May 2016.

    Comments: 26 pages, 6 figures. A short version of this manuscript will appear in the proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2016

    Journal ref: Final version published in Proceedings of the 22nd ACM Conference on Knowledge Discovery and Data Mining (KDD 2016), San Francisco, US, 2016, pp. 1625-1634

  25. arXiv:1505.06813  [pdf, other]

    stat.ML cs.LG

    Surrogate Functions for Maximizing Precision at the Top

    Authors: Purushottam Kar, Harikrishna Narasimhan, Prateek Jain

    Abstract: The problem of maximizing precision at the top of a ranked list, often dubbed Precision@k (prec@k), finds relevance in myriad learning applications such as ranking, multi-label classification, and learning with severe label imbalance. However, despite its popularity, there exist significant gaps in our understanding of this problem and its associated performance measure. The most notable of thes…

    Submitted 26 May, 2015; originally announced May 2015.

    Comments: To appear in the proceedings of the 32nd International Conference on Machine Learning (ICML 2015)

    Journal ref: Journal of Machine Learning Research, W&CP 37 (2015)

  26. arXiv:1505.06812  [pdf, other]

    stat.ML cs.LG

    Optimizing Non-decomposable Performance Measures: A Tale of Two Classes

    Authors: Harikrishna Narasimhan, Purushottam Kar, Prateek Jain

    Abstract: Modern classification problems frequently present mild to severe label imbalance as well as specific requirements on classification characteristics, and require optimizing performance measures that are non-decomposable over the dataset, such as F-measure. Such measures have spurred much interest and pose specific challenges to learning algorithms since their non-additive nature precludes a direct…

    Submitted 26 May, 2015; originally announced May 2015.

    Comments: To appear in proceedings of the 32nd International Conference on Machine Learning (ICML 2015)

    Journal ref: Journal of Machine Learning Research, W&CP 37 (2015)

  27. arXiv:1501.00287  [pdf, ps, other]

    cs.LG stat.ML

    Consistent Classification Algorithms for Multi-class Non-Decomposable Performance Metrics

    Authors: Harish G. Ramaswamy, Harikrishna Narasimhan, Shivani Agarwal

    Abstract: We study consistency of learning algorithms for a multi-class performance metric that is a non-decomposable function of the confusion matrix of a classifier and cannot be expressed as a sum of losses on individual data points; examples of such performance metrics include the macro F-measure popular in information retrieval and the G-mean metric used in class-imbalanced problems. While there has be…

    Submitted 1 January, 2015; originally announced January 2015.

  28. arXiv:1410.6776  [pdf, other]

    cs.LG stat.ML

    Online and Stochastic Gradient Methods for Non-decomposable Loss Functions

    Authors: Purushottam Kar, Harikrishna Narasimhan, Prateek Jain

    Abstract: Modern applications in sensitive domains such as biometrics and medicine frequently require the use of non-decomposable loss functions such as precision@k, F-measure, etc. Compared to point loss functions such as hinge-loss, these offer much more fine-grained control over prediction, but at the same time present novel challenges in terms of algorithm design and analysis. In this work we initiate a…

    Submitted 24 October, 2014; originally announced October 2014.

    Comments: 25 pages, 3 figures, To appear in the proceedings of the 28th Annual Conference on Neural Information Processing Systems, NIPS 2014