-
From Conformal Predictions to Confidence Regions
Authors:
Charles Guille-Escuret,
Eugene Ndiaye
Abstract:
Conformal prediction methodologies have significantly advanced the quantification of uncertainties in predictive models. Yet, the construction of confidence regions for model parameters presents a notable challenge, often necessitating stringent assumptions regarding data distribution or merely providing asymptotic guarantees. We introduce a novel approach termed CCR, which employs a combination o…
▽ More
Conformal prediction methodologies have significantly advanced the quantification of uncertainties in predictive models. Yet, the construction of confidence regions for model parameters presents a notable challenge, often necessitating stringent assumptions regarding data distribution or merely providing asymptotic guarantees. We introduce a novel approach termed CCR, which employs a combination of conformal prediction intervals for the model outputs to establish confidence regions for model parameters. We present coverage guarantees under minimal assumptions on noise and that is valid in finite sample regime. Our approach is applicable to both split conformal predictions and black-box methodologies including full or cross-conformal approaches. In the specific case of linear models, the derived confidence region manifests as the feasible set of a Mixed-Integer Linear Program (MILP), facilitating the deduction of confidence intervals for individual parameters and enabling robust optimization. We empirically compare CCR to recent advancements in challenging settings such as with heteroskedastic and non-Gaussian noise.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Conformal Prediction via Regression-as-Classification
Authors:
Etash Guha,
Shlok Natarajan,
Thomas Möllenhoff,
Mohammad Emtiyaz Khan,
Eugene Ndiaye
Abstract:
Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals.~Here, we circumvent the challenges by converting regression to a classifica…
▽ More
Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals.~Here, we circumvent the challenges by converting regression to a classification problem and then use CP for classification to obtain CP sets for regression.~To preserve the ordering of the continuous-output space, we design a new loss function and make necessary modifications to the CP classification techniques.~Empirical results on many benchmarks shows that this simple approach gives surprisingly good results on many practical problems.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
Careful with that Scalpel: Improving Gradient Surgery with an EMA
Authors:
Yu-Guan Hsieh,
James Thornton,
Eugene Ndiaye,
Michal Klein,
Marco Cuturi,
Pierre Ablin
Abstract:
Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). Although the simplest approach to incorporating an auxiliary loss is to sum it with the training loss as a regularizer, recent works have shown that one…
▽ More
Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). Although the simplest approach to incorporating an auxiliary loss is to sum it with the training loss as a regularizer, recent works have shown that one can improve performance by blending the gradients beyond a simple sum; this is known as gradient surgery. We cast the problem as a constrained minimization problem where the auxiliary objective is minimized among the set of minimizers of the training loss. To solve this bilevel problem, we follow a parameter update direction that combines the training loss gradient and the orthogonal projection of the auxiliary gradient to the training gradient. In a setting where gradients come from mini-batches, we explain how, using a moving average of the training loss gradients, we can carefully maintain this critical orthogonality property. We demonstrate that our method, Bloop, can lead to much better performances on NLP and vision experiments than other gradient surgery methods without EMA.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Finite Sample Confidence Regions for Linear Regression Parameters Using Arbitrary Predictors
Authors:
Charles Guille-Escuret,
Eugene Ndiaye
Abstract:
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor. Our framework requires minimal assumptions on the noise and can be extended to functions deviating from strict linearity up to some adjustable threshold, thereby accommodating a comprehensive and pragmatically relevant set of functions. The derived con…
▽ More
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor. Our framework requires minimal assumptions on the noise and can be extended to functions deviating from strict linearity up to some adjustable threshold, thereby accommodating a comprehensive and pragmatically relevant set of functions. The derived confidence regions can be cast as constraints within a Mixed Integer Linear Programming framework, enabling optimisation of linear objectives. This representation enables robust optimization and the extraction of confidence intervals for specific parameter coordinates. Unlike previous methods, the confidence region can be empty, which can be used for hypothesis testing. Finally, we validate the empirical applicability of our method on synthetic data.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
Revisiting Non-separable Binary Classification and its Applications in Anomaly Detection
Authors:
Matthew Lau,
Ismaila Seck,
Athanasios P Meliopoulos,
Wenke Lee,
Eugene Ndiaye
Abstract:
The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated i…
▽ More
The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines with a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing numbers, a quantitative measure on the capacity for classifiers to form closed decision regions for anomaly detection. Springboarding from this theoretical connection between binary classification and anomaly detection, we test our hypothesis on supervised anomaly detection experiments, showing that equality separation can detect both seen and unseen anomalies.
△ Less
Submitted 18 June, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
Conformalization of Sparse Generalized Linear Models
Authors:
Etash Kumar Guha,
Eugene Ndiaye,
Xiaoming Huo
Abstract:
Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indee…
▽ More
Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
Learning Elastic Costs to Shape Monge Displacements
Authors:
Michal Klein,
Aram-Alexandre Pooladian,
Pierre Ablin,
Eugène Ndiaye,
Jonathan Niles-Weed,
Marco Cuturi
Abstract:
Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other. This efficiency is quantified by defining a \textit{cost} function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance,…
▽ More
Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other. This efficiency is quantified by defining a \textit{cost} function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance, $\ell^2_2(\mathbf{x},\mathbf{y})=\tfrac12|\mathbf{x}-\mathbf{y}|_2^2$. Recently, Cuturi et. al '23 highlighted the benefits of using elastic costs, defined through a regularizer $τ$ as $c(\mathbf{x},\mathbf{y})=\ell^2_2(\mathbf{x},\mathbf{y})+τ(\mathbf{x}-\mathbf{y})$. Such costs shape the \textit{displacements} of Monge maps $T$, i.e., the difference between a source point and its image $T(\mathbf{x})-\mathbf{x})$, by giving them a structure that matches that of the proximal operator of $τ$. In this work, we make two important contributions to the study of elastic costs: (i) For any elastic cost, we propose a numerical method to compute Monge maps that are provably optimal. This provides a much-needed routine to create synthetic problems where the ground truth OT map is known, by analogy to the Brenier theorem, which states that the gradient of any convex potential is always a valid Monge map for the $\ell_2^2$ cost; (ii) We propose a loss to \textit{learn} the parameter $θ$ of a parameterized regularizer $τ_θ$, and apply it in the case where $τ_{A}(\mathbf{z})=|A^\perp \mathbf{z}|^2_2$. This regularizer promotes displacements that lie on a low dimensional subspace of $\mathbb{R}^d$, spanned by the $p$ rows of $A\in\mathbb{R}^{p\times d}$.
△ Less
Submitted 23 May, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Exact and Approximate Conformal Inference for Multi-Output Regression
Authors:
Chancellor Johnstone,
Eugene Ndiaye
Abstract:
It is common in machine learning to estimate a response $y$ given covariate information $x$. However, these predictions alone do not quantify any uncertainty associated with said predictions. One way to overcome this deficiency is with conformal inference methods, which construct a set containing the unobserved response $y$ with a prescribed probability. Unfortunately, even with a one-dimensional…
▽ More
It is common in machine learning to estimate a response $y$ given covariate information $x$. However, these predictions alone do not quantify any uncertainty associated with said predictions. One way to overcome this deficiency is with conformal inference methods, which construct a set containing the unobserved response $y$ with a prescribed probability. Unfortunately, even with a one-dimensional response, conformal inference is computationally expensive despite recent encouraging advances. In this paper, we explore multi-output regression, delivering exact derivations of conformal inference $p$-values when the predictive model can be described as a linear function of $y$. Additionally, we propose \texttt{unionCP} and a multivariate extension of \texttt{rootCP} as efficient ways of approximating the conformal prediction region for a wide array of multi-output predictors, both linear and nonlinear, while preserving computational advantages. We also provide both theoretical and empirical evidence of the effectiveness of these methods using both real-world and simulated data.
△ Less
Submitted 22 June, 2024; v1 submitted 31 October, 2022;
originally announced October 2022.
-
A Confidence Machine for Sparse High-Order Interaction Model
Authors:
Diptesh Das,
Eugene Ndiaye,
Ichiro Takeuchi
Abstract:
In predictive modeling for high-stake decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the confidence of prediction results with fewer theoretical assumptions. To obtain the confidence set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible…
▽ More
In predictive modeling for high-stake decision-making, predictors must be not only accurate but also reliable. Conformal prediction (CP) is a promising approach for obtaining the confidence of prediction results with fewer theoretical assumptions. To obtain the confidence set by so-called full-CP, we need to refit the predictor for all possible values of prediction results, which is only possible for simple predictors. For complex predictors such as random forests (RFs) or neural networks (NNs), split-CP is often employed where the data is split into two parts: one part for fitting and another to compute the confidence set. Unfortunately, because of the reduced sample size, split-CP is inferior to full-CP both in fitting as well as confidence set computation. In this paper, we develop a full-CP of sparse high-order interaction model (SHIM), which is sufficiently flexible as it can take into account high-order interactions among variables. We resolve the computational challenge for full-CP of SHIM by introducing a novel approach called homotopy mining. Through numerical experiments, we demonstrate that SHIM is as accurate as complex predictors such as RF and NN and enjoys the superior statistical power of full-CP.
△ Less
Submitted 1 November, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Stable Conformal Prediction Sets
Authors:
Eugene Ndiaye
Abstract:
When one observes a sequence of variables $(x_1, y_1), \ldots, (x_n, y_n)$, Conformal Prediction (CP) is a methodology that allows to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. CP sets have guaranteed coverage for any finite population size $n$. While appealing, the computation of such a set turns out to be infeasib…
▽ More
When one observes a sequence of variables $(x_1, y_1), \ldots, (x_n, y_n)$, Conformal Prediction (CP) is a methodology that allows to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. CP sets have guaranteed coverage for any finite population size $n$. While appealing, the computation of such a set turns out to be infeasible in general, e.g. when the unknown variable $y_{n+1}$ is continuous. The bottleneck is that it is based on a procedure that readjusts a prediction model on data where we replace the unknown target by all its possible values in order to select the most probable one. This requires computing an infinite number of models, which often makes it intractable. In this paper, we combine CP techniques with classical algorithmic stability bounds to derive a prediction set computable with a single model fit. We demonstrate that our proposed confidence set does not lose any coverage guarantees while avoiding the need for data splitting as currently done in the literature. We provide some numerical experiments to illustrate the tightness of our estimation when the sample size is sufficiently large, on both synthetic and real datasets.
△ Less
Submitted 6 December, 2022; v1 submitted 19 December, 2021;
originally announced December 2021.
-
Continuation Path with Linear Convergence Rate
Authors:
Eugene Ndiaye,
Ichiro Takeuchi
Abstract:
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems, with varying regularization hyperparameters, are solved sequentially. By reusing the previous solutions as initialization, better convergence speeds have been observed numerically. This makes it a rather useful heuristic to speed up the execution of optimization algorithms in machine lea…
▽ More
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems, with varying regularization hyperparameters, are solved sequentially. By reusing the previous solutions as initialization, better convergence speeds have been observed numerically. This makes it a rather useful heuristic to speed up the execution of optimization algorithms in machine learning. We present a primal dual analysis of the path-following algorithm and explore how to design its hyperparameters as well as determining how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem. Furthermore, considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter. The latter can then be adaptively calibrated to finely determine the number of features that will be selected along the solution path. This leads to simple heuristics for calibrating hyperparameters of active set approaches to reduce their complexity and improve their execution time.
△ Less
Submitted 9 December, 2021;
originally announced December 2021.
-
Root-finding Approaches for Computing Conformal Prediction Set
Authors:
Eugene Ndiaye,
Ichiro Takeuchi
Abstract:
Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations of responses and features. It has a coverage guarantee at any nominal level without additional assumptions on their distribution. Its computation deplorably requires a refitting procedure for all replacement candidates of the target…
▽ More
Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations of responses and features. It has a coverage guarantee at any nominal level without additional assumptions on their distribution. Its computation deplorably requires a refitting procedure for all replacement candidates of the target response. In regression settings, this corresponds to an infinite number of model fits. Apart from relatively simple estimators that can be written as pieces of linear function of the response, efficiently computing such sets is difficult, and is still considered as an open problem. We exploit the fact that, \emph{often}, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding algorithms. We investigate how this approach can overcome many limitations of formerly used strategies; we discuss its complexity and drawbacks.
△ Less
Submitted 6 December, 2022; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Screening Rules and its Complexity for Active Set Identification
Authors:
Eugene Ndiaye,
Olivier Fercoq,
Joseph Salmon
Abstract:
Screening rules were recently introduced as a technique for explicitly identifying active structures such as sparsity, in optimization problem arising in machine learning. This has led to new methods of acceleration based on a substantial dimension reduction. We show that screening rules stem from a combination of natural properties of subdifferential sets and optimality conditions, and can hence…
▽ More
Screening rules were recently introduced as a technique for explicitly identifying active structures such as sparsity, in optimization problem arising in machine learning. This has led to new methods of acceleration based on a substantial dimension reduction. We show that screening rules stem from a combination of natural properties of subdifferential sets and optimality conditions, and can hence be understood in a unified way. Under mild assumptions, we analyze the number of iterations needed to identify the optimal active set for any converging algorithm. We show that it only depends on its convergence rate.
△ Less
Submitted 6 September, 2020;
originally announced September 2020.
-
Computing Full Conformal Prediction Set with Approximate Homotopy
Authors:
Eugene Ndiaye,
Ichiro Takeuchi
Abstract:
If you are predicting the label $y$ of a new object with $\hat y$, how confident are you that $y = \hat y$? Conformal prediction methods provide an elegant framework for answering such question by building a $100 (1 - α)\%$ confidence region without assumptions on the distribution of the data. It is based on a refitting procedure that parses all the possibilities for $y$ to select the most likely…
▽ More
If you are predicting the label $y$ of a new object with $\hat y$, how confident are you that $y = \hat y$? Conformal prediction methods provide an elegant framework for answering such question by building a $100 (1 - α)\%$ confidence region without assumptions on the distribution of the data. It is based on a refitting procedure that parses all the possibilities for $y$ to select the most likely ones. Although providing strong coverage guarantees, conformal set is impractical to compute exactly for many regression problems. We propose efficient algorithms to compute conformal prediction set using approximated solution of (convex) regularized empirical risk minimization. Our approaches rely on a new homotopy continuation technique for tracking the solution path with respect to sequential changes of the observations. We also provide a detailed analysis quantifying its complexity.
△ Less
Submitted 7 November, 2019; v1 submitted 20 September, 2019;
originally announced September 2019.
-
Safe Grid Search with Optimal Complexity
Authors:
Eugene Ndiaye,
Tam Le,
Olivier Fercoq,
Joseph Salmon,
Ichiro Takeuchi
Abstract:
Popular machine learning estimators involve regularization parameters that can be challenging to tune, and standard strategies rely on grid search for this task. In this paper, we revisit the techniques of approximating the regularization path up to predefined tolerance $ε$ in a unified framework and show that its complexity is $O(1/\sqrt[d]ε)$ for uniformly convex loss of order $d \geq 2$ and…
▽ More
Popular machine learning estimators involve regularization parameters that can be challenging to tune, and standard strategies rely on grid search for this task. In this paper, we revisit the techniques of approximating the regularization path up to predefined tolerance $ε$ in a unified framework and show that its complexity is $O(1/\sqrt[d]ε)$ for uniformly convex loss of order $d \geq 2$ and $O(1/\sqrtε)$ for Generalized Self-Concordant functions. This framework encompasses least-squares but also logistic regression, a case that as far as we know was not handled as precisely in previous works. We leverage our technique to provide refined bounds on the validation error as well as a practical algorithm for hyperparameter tuning. The latter has global convergence guarantee when targeting a prescribed accuracy on the validation set. Last but not least, our approach helps relieving the practitioner from the (often neglected) task of selecting a stopping criterion when optimizing over the training set: our method automatically calibrates this criterion based on the targeted accuracy on the validation set.
△ Less
Submitted 27 May, 2019; v1 submitted 12 October, 2018;
originally announced October 2018.
-
Gap Safe screening rules for sparsity enforcing penalties
Authors:
Eugene Ndiaye,
Olivier Fercoq,
Alexandre Gramfort,
Joseph Salmon
Abstract:
In high dimensional regression settings, sparsity enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules propose to ignore some variables in the optimization leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to discard variables wrongly the ru…
▽ More
In high dimensional regression settings, sparsity enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules propose to ignore some variables in the optimization leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to discard variables wrongly the rules are said to be safe. In this work, we propose a unifying framework for generalized linear models regularized with standard sparsity enforcing penalties such as $\ell_1$ or $\ell_1/\ell_2$ norms. Our technique allows to discard safely more variables than previously considered safe rules, particularly for low regularization parameters. Our proposed Gap Safe rules (so called because they rely on duality gap computation) can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods. Applied to many standard learning tasks, Lasso, Sparse-Group Lasso, multi-task Lasso, binary and multinomial logistic regression, etc., we report significant speed-ups compared to previously proposed safe rules on all tested data sets.
△ Less
Submitted 27 December, 2017; v1 submitted 17 November, 2016;
originally announced November 2016.
-
Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression
Authors:
Eugene Ndiaye,
Olivier Fercoq,
Alexandre Gramfort,
Vincent Leclère,
Joseph Salmon
Abstract:
In high dimensional settings, sparse structures are crucial for efficiency, both in term of memory, computation and performance. It is customary to consider $\ell_1$ penalty to enforce sparsity in such scenarios. Sparsity enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter trading data fitting ver…
▽ More
In high dimensional settings, sparse structures are crucial for efficiency, both in term of memory, computation and performance. It is customary to consider $\ell_1$ penalty to enforce sparsity in such scenarios. Sparsity enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimension. For efficiency, they rely on tuning a parameter trading data fitting versus sparsity. For the Lasso theory to hold this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to jointly optimize over the regression parameter as well as over the noise level. This has been considered under several names in the literature: Scaled-Lasso, Square-root Lasso, Concomitant Lasso estimation for instance, and could be of interest for confidence sets or uncertainty quantification. In this work, after illustrating numerical difficulties for the Smoothed Concomitant Lasso formulation, we propose a modification we coined Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver leading to a computational cost no more expansive than the one for the Lasso. We leverage on standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules to achieve speed efficiency, by eliminating early irrelevant features.
△ Less
Submitted 8 June, 2016;
originally announced June 2016.
-
GAP Safe Screening Rules for Sparse-Group-Lasso
Authors:
Eugene Ndiaye,
Olivier Fercoq,
Alexandre Gramfort,
Joseph Salmon
Abstract:
In high dimensional settings, sparse structures are crucial for efficiency, either in term of memory, computation or performance. In some contexts, it is natural to handle more refined structures than pure sparsity, such as for instance group sparsity. Sparse-Group Lasso has recently been introduced in the context of linear regression to enforce sparsity both at the feature level and at the group…
▽ More
In high dimensional settings, sparse structures are crucial for efficiency, either in term of memory, computation or performance. In some contexts, it is natural to handle more refined structures than pure sparsity, such as for instance group sparsity. Sparse-Group Lasso has recently been introduced in the context of linear regression to enforce sparsity both at the feature level and at the group level. We adapt to the case of Sparse-Group Lasso recent safe screening rules that discard early in the solver irrelevant features/groups. Such rules have led to important speed-ups for a wide range of iterative methods. Thanks to dual gap computations, we provide new safe screening rules for Sparse-Group Lasso and show significant gains in term of computing time for a coordinate descent implementation.
△ Less
Submitted 19 February, 2016;
originally announced February 2016.
-
GAP Safe screening rules for sparse multi-task and multi-class models
Authors:
Eugene Ndiaye,
Olivier Fercoq,
Alexandre Gramfort,
Joseph Salmon
Abstract:
High dimensional regression benefits from sparsity promoting regularizations. Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven not to discard features wrongly the rules are said to be \emph{safe}. In this paper we derive new safe rules for generalized linear models regularized with…
▽ More
High dimensional regression benefits from sparsity promoting regularizations. Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven not to discard features wrongly the rules are said to be \emph{safe}. In this paper we derive new safe rules for generalized linear models regularized with $\ell_1$ and $\ell_1/\ell_2$ norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This allows to discard safely more variables, in particular for low regularization parameters. The GAP Safe rule can cope with any iterative solver and we illustrate its performance on coordinate descent for multi-task Lasso, binary and multinomial logistic regression, demonstrating significant speed ups on all tested datasets with respect to previous safe rules.
△ Less
Submitted 18 November, 2015; v1 submitted 11 June, 2015;
originally announced June 2015.