Search | arXiv e-print repository

Evaluating Large Language Models with fmeval

Authors: Pola Schwöbel, Luca Franceschi, Muhammad Bilal Zafar, Keerthan Vasist, Aman Malhotra, Tomer Shenhar, Pinal Tailor, Pinar Yilmaz, Michael Diamond, Michele Donini

Abstract: fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific an… ▽ More fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific and engineering choices taken when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2402.09947 [pdf, other]

Explaining Probabilistic Models with Distributional Values

Authors: Luca Franceschi, Michele Donini, Cédric Archambeau, Matthias Seeger

Abstract: A large branch of explainable machine learning is grounded in cooperative game theory. However, research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that often there is a critical mismatch between what one wishes to explain (e.g. the output of a classifier) and what current methods such as SHAP explain (e.g. the scalar probability of a class). This pape… ▽ More A large branch of explainable machine learning is grounded in cooperative game theory. However, research indicates that game-theoretic explanations may mislead or be hard to interpret. We argue that often there is a critical mismatch between what one wishes to explain (e.g. the output of a classifier) and what current methods such as SHAP explain (e.g. the scalar probability of a class). This paper addresses such gap for probabilistic models by generalising cooperative games and value operators. We introduce the distributional values, random variables that track changes in the model output (e.g. flipping of the predicted class) and derive their analytic expressions for games with Gaussian, Bernoulli and Categorical payoffs. We further establish several characterising properties, and show that our framework provides fine-grained and insightful explanations with case studies on vision and language models. △ Less

Submitted 14 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: ICML 2024 (spotlight paper). Code: https://github.com/amazon-science/explaining-probabilistic-models-with-distributinal-values

arXiv:2301.11898 [pdf, other]

DAG Learning on the Permutahedron

Authors: Valentina Zantedeschi, Luca Franceschi, Jean Kaddour, Matt J. Kusner, Vlad Niculae

Abstract: We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimiz… ▽ More We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimization approaches our formulation has a number of advantages including: 1. validity: optimizes over exact DAGs as opposed to other relaxations optimizing approximate DAGs; 2. modularity: accommodates any edge-optimization procedure, edge structural parameterization, and optimization loss; 3. end-to-end: either alternately iterates between node-ordering and edge-optimization, or optimizes them jointly. We demonstrate, on real-world data problems in protein-signaling and transcriptional network discovery, that our approach lies on the Pareto frontier of two key metrics, the SID and SHD. △ Less

Submitted 10 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: The Eleventh International Conference on Learning Representations

arXiv:2210.15353 [pdf, other]

Learning Discrete Directed Acyclic Graphs via Backpropagation

Authors: Andrew J. Wren, Pasquale Minervini, Luca Franceschi, Valentina Zantedeschi

Abstract: Recently continuous relaxations have been proposed in order to learn Directed Acyclic Graphs (DAGs) from data by backpropagation, instead of using combinatorial optimization. However, a number of techniques for fully discrete backpropagation could instead be applied. In this paper, we explore that direction and propose DAG-DB, a framework for learning DAGs by Discrete Backpropagation. Based on the… ▽ More Recently continuous relaxations have been proposed in order to learn Directed Acyclic Graphs (DAGs) from data by backpropagation, instead of using combinatorial optimization. However, a number of techniques for fully discrete backpropagation could instead be applied. In this paper, we explore that direction and propose DAG-DB, a framework for learning DAGs by Discrete Backpropagation. Based on the architecture of Implicit Maximum Likelihood Estimation [I-MLE, arXiv:2106.01798], DAG-DB adopts a probabilistic approach to the problem, sampling binary adjacency matrices from an implicit probability distribution. DAG-DB learns a parameter for the distribution from the loss incurred by each sample, performing competitively using either of two fully discrete backpropagation techniques, namely I-MLE and Straight-Through Estimation. △ Less

Submitted 27 October, 2022; originally announced October 2022.

Comments: 15 pages, 2 figures, 7 tables. Accepted for NeurIPS 2022 workshops on: Causal Machine Learning for Real-World Impact; and Neuro Causal and Symbolic AI

arXiv:2209.04862 [pdf, other]

Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models

Authors: Pasquale Minervini, Luca Franceschi, Mathias Niepert

Abstract: The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. Howe… ▽ More The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to the finite difference approximation of the gradients, it is especially sensitive to the choice of the finite difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information with the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators. △ Less

Submitted 5 February, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

arXiv:2207.09980 [pdf, other]

ReFactor GNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective

Authors: Yihong Chen, Pushkar Mishra, Luca Franceschi, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

Abstract: Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node features and generalise to unseen nodes in inductive settings. Our work bridges the gap between FMs and GNNs by proposing ReFactor GNNs. This new architecture draws upon… ▽ More Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node features and generalise to unseen nodes in inductive settings. Our work bridges the gap between FMs and GNNs by proposing ReFactor GNNs. This new architecture draws upon both modelling paradigms, which previously were largely thought of as disjoint. Concretely, using a message-passing formalism, we show how FMs can be cast as GNNs by reformulating the gradient descent procedure as message-passing operations, which forms the basis of our ReFactor GNNs. Across a multitude of well-established KGC benchmarks, our ReFactor GNNs achieve comparable transductive performance to FMs, and state-of-the-art inductive performance while using an order of magnitude fewer parameters. △ Less

Submitted 27 October, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

MSC Class: 68T05; 68T07; 68T50 ACM Class: I.2.7; I.2.6

arXiv:2106.01798 [pdf, other]

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Authors: Mathias Niepert, Pasquale Minervini, Luca Franceschi

Abstract: Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it… ▽ More Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches such as perturbation-based implicit differentiation and recent methods to differentiate through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations. △ Less

Submitted 27 October, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 camera-ready; repo: https://github.com/nec-research/tf-imle

arXiv:2006.16218 [pdf, other]

On the Iteration Complexity of Hypergradient Computation

Authors: Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

Abstract: We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or… ▽ More We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirm the theoretical findings. △ Less

Submitted 10 July, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: accepted at ICML 2020; 19 pages, 4 figures; code at https://github.com/prolearner/hypertorch (corrected typos and one reference)

arXiv:1910.08525 [pdf, other]

MARTHE: Scheduling the Learning Rate Via Online Hypergradients

Authors: Michele Donini, Luca Franceschi, Massimiliano Pontil, Orchid Majumder, Paolo Frasconi

Abstract: We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule -- the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses pas… ▽ More We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule -- the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO (Franceschi et al., 2017) and HD (Baydin et al. 2018), and is able to produce learning rate schedules that are more stable leading to models that generalize better. △ Less

Submitted 17 May, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: IJCAI 2020. Larger images. Code available at https://github.com/awslabs/adatune

arXiv:1903.11960 [pdf, other]

Learning Discrete Structures for Graph Neural Networks

Authors: Luca Franceschi, Mathias Niepert, Massimiliano Pontil, Xiao He

Abstract: Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we pro… ▽ More Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin. △ Less

Submitted 19 June, 2020; v1 submitted 28 March, 2019; originally announced March 2019.

Comments: ICML 2019, code at https://github.com/lucfra/LDS - Revision of Sec. 3

arXiv:1809.09759 [pdf, other]

doi 10.1109/LRA.2019.2899434

Fast and Continuous Foothold Adaptation for Dynamic Locomotion through CNNs

Authors: Octavio Villarreal, Victor Barasuol, Marco Camurri, Luca Franceschi, Michele Focchi, Massimiliano Pontil, Darwin G. Caldwell, Claudio Semini

Abstract: Legged robots can outperform wheeled machines for most navigation tasks across unknown and rough terrains. For such tasks, visual feedback is a fundamental asset to provide robots with terrain-awareness. However, robust dynamic locomotion on difficult terrains with real-time performance guarantees remains a challenge. We present here a real-time, dynamic foothold adaptation strategy based on visua… ▽ More Legged robots can outperform wheeled machines for most navigation tasks across unknown and rough terrains. For such tasks, visual feedback is a fundamental asset to provide robots with terrain-awareness. However, robust dynamic locomotion on difficult terrains with real-time performance guarantees remains a challenge. We present here a real-time, dynamic foothold adaptation strategy based on visual feedback. Our method adjusts the landing position of the feet in a fully reactive manner, using only on-board computers and sensors. The correction is computed and executed continuously along the swing phase trajectory of each leg. To efficiently adapt the landing position, we implement a self-supervised foothold classifier based on a Convolutional Neural Network (CNN). Our method results in an up to 200 times faster computation with respect to the full-blown heuristics. Our goal is to react to visual stimuli from the environment, bridging the gap between blind reactive locomotion and purely vision-based planning strategies. We assess the performance of our method on the dynamic quadruped robot HyQ, executing static and dynamic gaits (at speeds up to 0.5 m/s) in both simulated and real scenarios; the benefit of safe foothold adaptation is clearly demonstrated by the overall robot behavior. △ Less

Submitted 15 February, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

Comments: 9 pages, 11 figures. Accepted to RA-L + ICRA 2019, January 2019

arXiv:1806.04941 [pdf, other]

Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

Authors: Luca Franceschi, Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, Paolo Frasconi

Abstract: In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize l… ▽ More In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize learning rates, automatically weight the loss of single examples and learn hyper-representations with Far-HO, a software package based on the popular deep learning framework TensorFlow that allows to seamlessly tackle both HO and ML problems. △ Less

Submitted 13 June, 2018; originally announced June 2018.

Comments: This submission is a reduced version of (Franceschi et al., arXiv:1806.04910) which has been accepted at the main ICML 2018 conference. In this paper we illustrate the software framework, material that could not be included in the conference paper

arXiv:1806.04910 [pdf, other]

Bilevel Programming for Hyperparameter Optimization and Meta-Learning

Authors: Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimilano Pontil

Abstract: We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised l… ▽ More We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn. △ Less

Submitted 3 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: ICML 2018; code for replicating experiments at https://github.com/prolearner/hyper-representation, main package (Far-HO) at https://github.com/lucfra/FAR-HO

arXiv:1712.06283 [pdf, other]

A Bridge Between Hyperparameter Optimization and Learning-to-learn

Authors: Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

Abstract: We consider a class of a nested optimization problems involving inner and outer objectives. We observe that by taking into explicit account the optimization dynamics for the inner objective it is possible to derive a general framework that unifies gradient-based hyperparameter optimization and meta-learning (or learning-to-learn). Depending on the specific setting, the variables of the outer objec… ▽ More We consider a class of a nested optimization problems involving inner and outer objectives. We observe that by taking into explicit account the optimization dynamics for the inner objective it is possible to derive a general framework that unifies gradient-based hyperparameter optimization and meta-learning (or learning-to-learn). Depending on the specific setting, the variables of the outer objective take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We show that some recently proposed methods in the latter setting can be instantiated in our framework and tackled with the same gradient-based algorithms. Finally, we discuss possible design patterns for learning-to-learn and present encouraging preliminary experiments for few-shot learning. △ Less

Submitted 21 August, 2019; v1 submitted 18 December, 2017; originally announced December 2017.

Comments: NIPS 2017 workshop on Meta-learning (http://metalearning.ml/)

Showing 1–14 of 14 results for author: Franceschi, L