Showing 1–50 of 87 results for author: Ravikumar, P

Searching in archive cs.
  1. arXiv:2407.02694  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    LLM-Select: Feature Selection with Large Language Models

    Authors: Daniel P. Jeong, Zachary C. Lipton, Pradeep Ravikumar

    Abstract: In this paper, we demonstrate a surprising capability of large language models (LLMs): given only input feature names and a description of a prediction task, they are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Remarkably, these models exhibit this capacity across various query mechanisms. For example, we zero-shot prompt an LLM…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Preprint

  2. arXiv:2406.18400  [pdf, other]

    cs.CL cs.LG stat.ML

    Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

    Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

    Abstract: Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2403.03867  [pdf, other]

    cs.CL cs.LG stat.ML

    On the Origins of Linear Representations in Large Language Models

    Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch

    Abstract: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to show that the next token prediction ob…

    Submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2402.09236  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

    Authors: Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

    Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 36 pages

  5. arXiv:2402.00645  [pdf, other]

    stat.ML cs.LG

    Spectrally Transformed Kernel Regression

    Authors: Runtian Zhai, Rattana Pukdee, Roger Jin, Maria-Florina Balcan, Pradeep Ravikumar

    Abstract: Unlabeled data is a key component of modern machine learning. In general, the role of unlabeled data is to impose a form of smoothness, usually from the similarity information encoded in a base kernel, such as the $ε$-neighbor kernel or the adjacency matrix of a graph. This work revisits the classical idea of spectrally transformed kernel regression (STKR), and provides a new class of general and…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: ICLR 2024 spotlight. 36 pages

  6. arXiv:2311.18048  [pdf, other]

    cs.LG cs.CE eess.SY stat.ME

    An Interventional Perspective on Identifiability in Gaussian LTI Systems with Independent Component Analysis

    Authors: Goutham Rajendran, Patrik Reizinger, Wieland Brendel, Pradeep Ravikumar

    Abstract: We investigate the relationship between system identification and intervention design in dynamical systems. While previous research demonstrated how identifiable representation learning methods, such as Independent Component Analysis (ICA), can reveal cause-effect relationships, it relied on a passive perspective without considering how to collect data. Our work shows that in Gaussian Linear Time-…

    Submitted 16 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CLeaR 2024 camera-ready. Code available at https://github.com/rpatrik96/lti-ica

  7. arXiv:2310.18832  [pdf, other]

    cs.AI

    Responsible AI (RAI) Games and Ensembles

    Authors: Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar

    Abstract: Several recent works have studied the societal effects of AI; these include issues such as fairness, robustness, and safety. In many of these objectives, a learner seeks to minimize its worst-case loss over a set of predefined distributions (known as uncertainty sets), with usual examples being perturbed versions of the empirical distribution. In other words, aforementioned problems can be written…

    Submitted 28 October, 2023; originally announced October 2023.

  8. arXiv:2310.18526  [pdf, other]

    cs.LG cs.AI

    Sample based Explanations via Generalized Representers

    Authors: Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar

    Abstract: We propose a general class of sample based explanations of machine learning models, which we term generalized representers. To measure the effect of a training sample on a model's test prediction, generalized representers use two components: a global sample importance that quantifies the importance of the training point to the model and is invariant to test samples, and a local sample importance t…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  9. arXiv:2310.04295  [pdf, other]

    cs.LG cs.AI stat.ML

    Identifying Representations for Intervention Extrapolation

    Authors: Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters

    Abstract: The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: p…

    Submitted 5 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted at the International Conference on Learning Representations (ICLR) 2024

  10. arXiv:2306.17378  [pdf, other]

    cs.LG stat.ML

    Global Optimality in Bivariate Gradient-based DAG Learning

    Authors: Chang Deng, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

    Abstract: Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other no…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 39 pages, 13 figures

  11. arXiv:2306.17361  [pdf, other]

    cs.LG cs.AI stat.AP stat.ME stat.ML

    iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models

    Authors: Tianyu Chen, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

    Abstract: Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the underlying causal structure is often unknown, and estimating it from data remains a challenging task. In many situations, however, the end goal is to localize the changes (shifts) in the causal mechanisms between related datasets instead of…

    Submitted 12 January, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 36 pages, 18 figures. Published at NeurIPS 2023

  12. arXiv:2306.02235  [pdf, other]

    cs.LG cs.AI math.ST stat.ME stat.ML

    Learning Linear Causal Representations from Interventions under General Nonlinear Mixing

    Authors: Simon Buchholz, Goutham Rajendran, Elan Rosenfeld, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

    Abstract: We study the problem of learning causal representations from unknown, latent interventions in a general setting, where the latent distribution is Gaussian but the mixing function is completely general. We prove strong identifiability results given unknown single-node interventions, i.e., without having access to the intervention targets. This generalizes prior works which have focused on weaker cl…

    Submitted 18 December, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted as Oral paper at NeurIPS 2023

  13. arXiv:2306.00788  [pdf, other]

    cs.LG stat.ML

    Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

    Authors: Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar

    Abstract: Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, su…

    Submitted 18 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 spotlight. 34 pages

  14. arXiv:2305.20002  [pdf, other]

    cs.LG

    Representer Point Selection for Explaining Regularized High-dimensional Models

    Authors: Che-Ping Tsai, Jiong Zhang, Eli Chien, Hsiang-Fu Yu, Cho-Jui Hsieh, Pradeep Ravikumar

    Abstract: We introduce a novel class of sample-based explanations we term high-dimensional representers, that can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples. Our workhorse is a novel representer theorem for general regularized high-dimensional models, which decomposes the model prediction in terms of contribution…

    Submitted 30 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  15. arXiv:2305.17277  [pdf, other]

    stat.ML cs.LG

    Optimizing NOTEARS Objectives via Topological Swaps

    Authors: Chang Deng, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

    Abstract: Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimization challenges associated with this class of non-convex prog…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 39 pages, 12 figures, ICML 2023

  16. arXiv:2303.14496  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning with Explanation Constraints

    Authors: Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar

    Abstract: As larger deep learning models are hard to interpret, there has been a recent focus on generating explanations of these black-box models. In contrast, we may have a priori explanations of how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning theoretic framework to analyze how such explanations can improve the learning of ou…

    Submitted 22 December, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2023

  17. arXiv:2302.08015  [pdf, other]

    cs.LG cs.AI cs.CY

    Individual Fairness under Uncertainty

    Authors: Wenbin Zhang, Zichong Wang, Juyong Kim, Cheng Cheng, Thomas Oommen, Pradeep Ravikumar, Jeremy Weiss

    Abstract: Algorithmic fairness, the research field of making machine learning (ML) algorithms fair, is an established area in ML. As ML technologies expand their application domains, including ones with high societal impact, it becomes essential to take fairness into consideration during the building of ML systems. Yet, despite its wide range of socially sensitive applications, most work treats the issue of…

    Submitted 11 December, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

  18. arXiv:2210.12606  [pdf, other]

    cs.LG cs.GT

    Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games

    Authors: Maria-Florina Balcan, Rattana Pukdee, Pradeep Ravikumar, Hongyang Zhang

    Abstract: Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a 2-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best response strategy of such game may not conv…

    Submitted 27 February, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: AISTATS 2023

  19. arXiv:2210.03594  [pdf, other]

    cs.LG stat.ML

    Label Propagation with Weak Supervision

    Authors: Rattana Pukdee, Dylan Sam, Maria-Florina Balcan, Pradeep Ravikumar

    Abstract: Semi-supervised learning and weakly supervised learning are important paradigms that aim to reduce the growing demand for labeled data in current machine learning applications. In this paper, we introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002) that moreover takes advantage of useful prior information, specifically probabilistic hypothesized lab…

    Submitted 9 April, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: ICLR 2023, 26 pages, 2 figures

  20. arXiv:2209.08037  [pdf, other]

    cs.LG stat.ME stat.ML

    DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization

    Authors: Kevin Bello, Bryon Aragam, Pradeep Ravikumar

    Abstract: The combinatorial problem of learning directed acyclic graphs (DAGs) from data was recently framed as a purely continuous optimization problem by leveraging a differentiable acyclicity characterization of DAGs based on the trace of a matrix exponential function. Existing acyclicity characterizations are based on the idea that powers of an adjacency matrix contain information about walks and cycles…

    Submitted 15 January, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

    Comments: 28 pages, 13 figures, published at NeurIPS 2022

  21. arXiv:2208.14966  [pdf, other]

    cs.LG

    Concept Gradient: Concept-based Interpretation Without Linear Assumption

    Authors: Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh

    Abstract: Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. The linear separability is usually implicitly assumed but does not hold true in general. I…

    Submitted 5 February, 2024; v1 submitted 31 August, 2022; originally announced August 2022.

    Comments: 21 pages, 7 figures, published in ICLR 2023

  22. arXiv:2206.10044  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Identifiability of deep generative models without auxiliary information

    Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

    Abstract: We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders that are commonly used in practice. Unlike existing work, our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space. Specifically, we show that for a broad class of generativ…

    Submitted 18 October, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: 34 pages, 9 figures, to appear in NeurIPS 2022

  23. arXiv:2206.03362  [pdf, other]

    cs.LG cs.AI cs.CR stat.ME stat.ML

    Building Robust Ensembles via Margin Boosting

    Authors: Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala

    Abstract: In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles.…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted by ICML 2022

  24. arXiv:2203.00870  [pdf, other]

    cs.LG cs.GT

    Faith-Shap: The Faithful Shapley Interaction Index

    Authors: Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar

    Abstract: Shapley values, which were originally designed to assign attributions to individual players in coalition games, have become a commonly used approach in explainable machine learning to provide attributions to input features for black-box machine learning models. A key attraction of Shapley values is that they uniquely satisfy a very natural set of axiomatic properties. However, extending the Shaple…

    Submitted 22 March, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

  25. arXiv:2202.12451  [pdf, other]

    cs.LG cs.AI

    Human-Centered Concept Explanations for Neural Networks

    Authors: Chih-Kuan Yeh, Been Kim, Pradeep Ravikumar

    Abstract: Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights to individual pixels in an image can only express which…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: book chapter of Neuro-Symbolic Artificial Intelligence: The State of the Art, volume: 342, p.337 - 352, 2022

  26. arXiv:2202.11919  [pdf, other]

    cs.LG cs.AI cs.GT

    Threading the Needle of On and Off-Manifold Value Functions for Shapley Explanations

    Authors: Chih-Kuan Yeh, Kuan-Yun Lee, Frederick Liu, Pradeep Ravikumar

    Abstract: A popular explainable AI (XAI) approach to quantify feature importance of a given model is via Shapley values. These Shapley values arose in cooperative games, and hence a critical ingredient to compute these in an XAI context is a so-called value function, that computes the "value" of a subset of features, and which connects machine learning models to cooperative games. There are many possible ch…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: AISTATS 2022

  27. arXiv:2202.11844  [pdf, other]

    cs.LG cs.CL cs.CY

    First is Better Than Last for Language Data Influence

    Authors: Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

    Abstract: The ability to identify influential training examples enables us to debug training data and explain model behavior. Existing techniques to do so are based on the flow of training data influence through the model parameters. For large models in NLP applications, it is often computationally infeasible to study this flow through all model parameters, therefore techniques usually pick the last layer o…

    Submitted 27 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

  28. arXiv:2202.09305  [pdf, other]

    cs.LG stat.ML

    Masked prediction tasks: a parameter identifiability view

    Authors: Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski

    Abstract: The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has largely focused on recovering good features for downstream tasks, with the definition of "good" often being intricately tied to the downstream task itself. This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream task…

    Submitted 18 February, 2022; originally announced February 2022.

  29. arXiv:2202.06856  [pdf, other]

    cs.LG cs.AI

    Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization

    Authors: Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

    Abstract: A common explanation for the failure of deep networks to generalize out-of-distribution is that they fail to recover the "correct" features. We challenge this notion with a simple experiment which suggests that ERM already learns sufficient features and that the current bottleneck is not feature learning, but robust regression. Our findings also imply that given a small amount of data from the tar…

    Submitted 27 October, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

  30. arXiv:2201.12293  [pdf, other]

    cs.LG stat.ML

    Understanding Why Generalized Reweighting Does Not Improve Over ERM

    Authors: Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

    Abstract: Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift where the training and the test distributions are different. A suite of approaches, such as importance weighting, and variants of distributionally robust optimization (DRO), have been proposed to solve this problem. But a line of recent work has empirically shown that these approaches do not significant…

    Submitted 7 February, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: ICLR 2023. 40 pages, 3 figures

  31. arXiv:2110.13948  [pdf, other]

    cs.LG stat.ML

    Boosted CVaR Classification

    Authors: Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar

    Abstract: Many modern machine learning tasks require models with high tail performance, i.e. high performance over the worst-off samples in the dataset. This problem has been widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A popular approach to maximize the model's tail performance is to minimize the CVaR (Conditional Value at Risk) loss, which com…

    Submitted 10 November, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021. 16 pages, 4 figures

  32. arXiv:2110.11271  [pdf, other]

    cs.LG stat.ML

    Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation

    Authors: Bingbin Liu, Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

    Abstract: Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, such observations have never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribut…

    Submitted 21 October, 2021; originally announced October 2021.

  33. arXiv:2110.07342  [pdf, other]

    cs.CL cs.LG

    FILM: Following Instructions in Language with Modular Methods

    Authors: So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov

    Abstract: Recent methods for embodied instruction following are typically trained end-to-end using imitation learning. This often requires the use of expert trajectories and low-level language instructions. Such approaches assume that neural states will integrate multimodal semantics to perform state tracking, building spatial memory, exploration, and long-term planning. In contrast, we propose a modular me…

    Submitted 16 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2022

  34. arXiv:2108.11483  [pdf, other]

    cs.LG math.OC stat.ML

    Heavy-tailed Streaming Statistical Estimation

    Authors: Che-Ping Tsai, Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar

    Abstract: We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional $O(p)$ space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gra…

    Submitted 25 February, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

  35. arXiv:2106.15563  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning latent causal graphs via mixture oracles

    Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

    Abstract: We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant…

    Submitted 21 November, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: To appear at NeurIPS 2021. 41 pages

  36. arXiv:2106.10434  [pdf, other]

    cs.LG cs.CL

    Improving Compositional Generalization in Classification Tasks via Structure Annotations

    Authors: Juyong Kim, Pradeep Ravikumar, Joshua Ainslie, Santiago Ontañón

    Abstract: Compositional generalization is the ability to generalize systematically to a new data distribution by combining known components. Although humans seem to have a great ability to generalize compositionally, state-of-the-art neural models struggle to do so. In this work, we study compositional generalization in classification tasks and present two main contributions. First, we study ways to convert…

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: Accepted as a short paper at ACL 2021

  37. arXiv:2106.06142  [pdf, ps, other]

    cs.LG stat.ML

    DORO: Distributional and Outlier Robust Optimization

    Authors: Runtian Zhai, Chen Dan, J. Zico Kolter, Pradeep Ravikumar

    Abstract: Many machine learning tasks involve subpopulation shift where the testing data distribution is a subpopulation of the training distribution. For such settings, a line of recent work has proposed the use of a variant of empirical risk minimization (ERM) known as distributionally robust optimization (DRO). In this work, we apply DRO to real, large-scale tasks with subpopulation shift, and observe tha…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: ICML 2021. Code: https://github.com/RuntianZ/doro

  38. arXiv:2104.07232  [pdf, other]

    cs.LG stat.ML

    Iterative Alignment Flows

    Authors: Zeyu Zhou, Ziyu Gong, Pradeep Ravikumar, David I. Inouye

    Abstract: The unsupervised task of aligning two or more distributions in a shared latent space has many applications including fair representations, batch effect mitigation, and unsupervised domain adaptation. Existing flow-based approaches estimate multiple flows independently, which is equivalent to learning multiple full generative models. Other approaches require adversarial learning, which can be compu…

    Submitted 15 March, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

  39. arXiv:2103.02740  [pdf, ps, other]

    stat.ML cs.LG

    Contrastive learning of strong-mixing continuous-time stochastic processes

    Authors: Bingbin Liu, Pradeep Ravikumar, Andrej Risteski

    Abstract: Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data. It has recently emerged as one of the leading learning paradigms in the absence of labels across many different domains (e.g. brain imaging, text, images). However, theoretical understanding of many aspects of training, both statistical and algorithmi…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Appearing in AISTATS 2021

  40. arXiv:2102.13128  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization

    Authors: Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

    Abstract: A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them -- this objective is broadly known as domain generalization. A common belief is that ERM can interpolate but not extrapolate and that the latter is considerably more…

    Submitted 18 November, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

  41. arXiv:2102.10264  [pdf, other]

    cs.LG cs.RO stat.ML

    On Proximal Policy Optimization's Heavy-tailed Gradients

    Authors: Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, J. Zico Kolter, Zachary C. Lipton, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Pradeep Ravikumar

    Abstract: Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich ("heavy-tailed") regimes. In this paper, we present a detailed empirical study to characteriz…

    Submitted 12 July, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  42. arXiv:2101.00300  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    When Is Generalizable Reinforcement Learning Tractable?

    Authors: Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar

    Abstract: Agents trained by reinforcement learning (RL) often fail to generalize beyond the environment they were trained in, even when presented with new scenarios that seem similar to the training environment. We study the query complexity required to train RL agents that generalize to multiple environments. Intuitively, tractable generalization is only possible when the environments are similar or close…

    Submitted 25 October, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

    Comments: NeurIPS 2021, v3 fixes minor typos

  43. arXiv:2012.10713  [pdf, other]

    cs.LG cs.AI stat.ML

    Fundamental Limits and Tradeoffs in Invariant Representation Learning

    Authors: Han Zhao, Chen Dan, Bryon Aragam, Tommi S. Jaakkola, Geoffrey J. Gordon, Pradeep Ravikumar

    Abstract: A wide range of machine learning applications such as privacy-preserving learning, algorithmic fairness, and domain adaptation/generalization among others, involve learning invariant representations of the data that aim to achieve two competing goals: (a) maximize information or accuracy with respect to a target response, and (b) maximize invariance or independence with respect to a set of protect…

    Submitted 23 November, 2022; v1 submitted 19 December, 2020; originally announced December 2020.

    Comments: JMLR camera-ready version

  44. arXiv:2010.05761  [pdf, other]

    cs.LG cs.AI stat.ML

    The Risks of Invariant Risk Minimization

    Authors: Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

    Abstract: Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of d…

    Submitted 27 March, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: ICLR 2021 Camera-Ready

  45. arXiv:2006.16384  [pdf, other]

    stat.ML cs.LG

    Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification

    Authors: Chen Dan, Yuting Wei, Pradeep Ravikumar

    Abstract: Adversarial robustness has become a fundamental requirement in modern machine learning applications. Yet, there has been surprisingly little statistical understanding so far. In this paper, we provide the first optimal minimax guarantees for the excess risk of adversarially robust classification under the Gaussian mixture model proposed by Schmidt et al. (2018). The results a…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: 25 pages, 1 figure. Accepted by ICML 2020

  46. arXiv:2006.11430  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Learning Minimax Estimators via Online Learning

    Authors: Kartik Gupta, Arun Sai Suggala, Adarsh Prasad, Praneeth Netrapalli, Pradeep Ravikumar

    Abstract: We consider the problem of designing minimax estimators for estimating the parameters of a probability distribution. Unlike classical approaches such as the MLE and minimum distance estimators, we consider an algorithmic approach for constructing such estimators. We view the problem of designing minimax estimators as finding a mixed strategy Nash equilibrium of a zero-sum game. By leveraging recen…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 60 pages. Under review

  47. arXiv:2006.07972  [pdf, other]

    cs.LG stat.ML

    Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances

    Authors: Sijie He, Xinyan Li, Timothy DelSole, Pradeep Ravikumar, Arindam Banerjee

    Abstract: Sub-seasonal climate forecasting (SSF) focuses on predicting key climate variables such as temperature and precipitation in the 2-week to 2-month time scales. Skillful SSF would have immense societal value, in areas such as agricultural productivity, water resource management, transportation and aviation systems, and emergency planning for extreme weather events. However, SSF is considered more ch…

    Submitted 24 June, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

  48. arXiv:2006.00442  [pdf, other]

    cs.LG stat.ML

    Evaluations and Methods for Explanation through Robustness Analysis

    Authors: Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

    Abstract: Feature-based explanations, which provide the importance of each feature towards the model prediction, are arguably one of the most intuitive ways to explain a model. In this paper, we establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. In contrast to existing evaluations which require us to specify some way to "remove" features that could inevitably…

    Submitted 8 April, 2021; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: To appear in ICLR 2021

  49. arXiv:2005.12914  [pdf, other]

    stat.ML cs.LG

    Class-Weighted Classification: Trade-offs and Robust Approaches

    Authors: Ziyu Xu, Chen Dan, Justin Khim, Pradeep Ravikumar

    Abstract: We address imbalanced classification, the problem in which a label may have low marginal probability relative to other labels, by weighting losses according to the correct class. First, we examine the convergence rates of the expected excess weighted risk of plug-in classifiers where the weighting for the plug-in classifier and the risk may be different. This leads to irreducible errors that do no…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: 28 pages, 4 figures

  50. arXiv:2004.05665  [pdf, other]

    cs.LG stat.ML

    Minimizing FLOPs to Learn Efficient Sparse Representations

    Authors: Biswajit Paria, Chih-Kuan Yeh, Ian E. H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

    Abstract: Deep representation learning has become one of the most widely adopted approaches for visual search, recommendation, and identification. Retrieval of such representations from a large database is, however, computationally challenging. Approximate methods based on learning compact representations have been widely explored for this problem, such as locality sensitive hashing, product quantization, an…

    Submitted 12 April, 2020; originally announced April 2020.

    Comments: Published at ICLR 2020