Showing 1–48 of 48 results for author: Cutkosky, A

  1. arXiv:2407.01825  [pdf, other]

    cs.LG math.OC

    Empirical Tests of Optimization Assumptions in Deep Learning

    Authors: Hoang Tran, Qinzi Zhang, Ashok Cutkosky

    Abstract: There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/prac…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.19579  [pdf, ps, other]

    math.OC cs.CR cs.LG

    Private Zeroth-Order Nonsmooth Nonconvex Optimization

    Authors: Qinzi Zhang, Hoang Tran, Ashok Cutkosky

    Abstract: We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives. Given a dataset of size $M$, our algorithm ensures $(α,αρ^2/2)$-Rényi differential privacy and finds a $(δ,ε)$-stationary point so long as $M=\tilde{Ω}\left(\frac{d}{δε^3} + \frac{d^{3/2}}{ρδε^2}\right)$. This matches the optimal complexity of its non-private zeroth-order analog. Nota…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2405.20540  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Fully Unconstrained Online Learning

    Authors: Ashok Cutkosky, Zakaria Mhammedi

    Abstract: We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$. Importantly, this matches the optimal bound $G\|w_\star\|\sqrt{T}$ available with such knowledge (up to logarithmic factors), unless either $\|w_\star\|$ or…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.19175  [pdf, ps, other]

    cs.LG stat.ML

    Online Linear Regression in Dynamic Environments via Discounting

    Authors: Andrew Jacobsen, Ashok Cutkosky

    Abstract: We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees even in the complete absence of prior knowledge. We present a novel analysis showing that a discounted variant of the Vovk-Azoury-Warmuth forecaster achieves dynamic regret of the form $R_{T}(\vec{u})\le O\left(d\log(T)\vee \sqrt{dP_{T}^γ(\vec{u})T}\right)$, where…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 38 pages

  5. arXiv:2405.18199  [pdf, ps, other]

    cs.LG math.OC

    Adam with model exponential moving average is effective for nonconvex optimization

    Authors: Kwangjun Ahn, Ashok Cutkosky

    Abstract: In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). Specifically, we demonstrate that a clipped version of Adam with model EMA achieves the optimal convergence rates in various nonconvex optimization settings, both smooth an…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Comments would be appreciated!

  6. arXiv:2405.15682  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Road Less Scheduled

    Authors: Aaron Defazio, Xingyu Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

    Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from c…

    Submitted 7 August, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  7. arXiv:2405.09742  [pdf, other]

    cs.LG math.OC

    Random Scaling and Momentum for Non-smooth Non-convex Optimization

    Authors: Qinzi Zhang, Ashok Cutkosky

    Abstract: Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth. Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which classical analysis applies only if the loss is either convex or smooth. We show that a very small modification to SGDM closes this gap: simply scale the update at…

    Submitted 15 May, 2024; originally announced May 2024.

  8. arXiv:2310.07831  [pdf, other]

    cs.LG cs.AI stat.ML

    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

    Authors: Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

    Abstract: Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works t…

    Submitted 11 October, 2023; originally announced October 2023.

  9. arXiv:2309.16044  [pdf, ps, other]

    cs.LG stat.ML

    Improving Adaptive Online Learning Using Refined Discretization

    Authors: Zhiyu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis

    Abstract: We study unconstrained Online Linear Optimization with Lipschitz losses. Motivated by the pursuit of instance optimality, we propose a new algorithm that simultaneously achieves (i) the AdaGrad-style second-order gradient adaptivity; and (ii) the comparator norm adaptivity, also known as "parameter freeness" in the literature. In particular, our algorithm does not employ the impractical dou…

    Submitted 22 February, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: ALT 2024

  10. arXiv:2306.04923  [pdf, other]

    cs.LG stat.ML

    Unconstrained Online Learning with Unbounded Losses

    Authors: Andrew Jacobsen, Ashok Cutkosky

    Abstract: Algorithms for online learning typically require one or more boundedness assumptions: that the domain is bounded, that the losses are Lipschitz, or both. In this paper, we develop a new setting for online learning with unbounded domains and non-Lipschitz losses. For this setting we provide an algorithm which guarantees $R_{T}(u)\le \tilde O(G\|u\|\sqrt{T}+L\|u\|^{2}\sqrt{T})$ regret on any problem…

    Submitted 14 July, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 41 pages; ICML 2023; v2: fixed some details in the exposition introducing saddle-point problems

  11. arXiv:2306.00144  [pdf, other]

    cs.LG

    Mechanic: A Learning Rate Tuner

    Authors: Ashok Cutkosky, Aaron Defazio, Harsh Mehta

    Abstract: We introduce a technique for tuning the learning rate scale factor of any base optimization algorithm and schedule automatically, which we call Mechanic. Our method provides a practical realization of recent theoretical reductions for accomplishing a similar goal in online convex optimization. We rigorously evaluate Mechanic on a range of large-scale deep learning tasks with vary…

    Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  12. arXiv:2302.03775  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

    Authors: Ashok Cutkosky, Harsh Mehta, Francesco Orabona

    Abstract: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a $(δ,ε)$-stationary point from $O(ε^{-4}δ^{-1})$ stochastic gradient queries to $O(ε^{-3}δ^{-1})$, which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to onl…

    Submitted 11 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  13. arXiv:2301.13349  [pdf, other]

    cs.LG math.OC stat.ML

    Unconstrained Dynamic Regret via Sparse Coding

    Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

    Abstract: Motivated by the challenge of nonstationarity in sequential decision making, we study Online Convex Optimization (OCO) under the coupling of two problem structures: the domain is unbounded, and the comparator sequence $u_1,\ldots,u_T$ is arbitrarily time-varying. As no algorithm can guarantee low regret simultaneously against all comparator sequences, handling this setting requires moving from min…

    Submitted 25 October, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: NeurIPS 2023

  14. arXiv:2211.13403  [pdf, other]

    cs.LG cs.CR cs.CV

    Differentially Private Image Classification from Features

    Authors: Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific c…

    Submitted 23 November, 2022; originally announced November 2022.

  15. arXiv:2210.14355  [pdf, other]

    stat.ML cs.LG

    Parameter-free Regret in High Probability with Heavy Tails

    Authors: Jiujia Zhang, Ashok Cutkosky

    Abstract: We present new algorithms for online convex optimization over unbounded domains that obtain parameter-free regret in high probability given access only to potentially heavy-tailed subgradient estimates. Previous work in unbounded domains considers only in-expectation results for sub-exponential subgradients. Unlike in the bounded domain case, we cannot rely on straightforward martingale concentra…

    Submitted 25 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

  16. arXiv:2210.06593  [pdf, ps, other]

    cs.LG cs.CR

    Differentially Private Online-to-Batch for Smooth Losses

    Authors: Qinzi Zhang, Hoang Tran, Ashok Cutkosky

    Abstract: We develop a new reduction that converts any online convex optimization algorithm suffering $O(\sqrt{T})$ regret into an $ε$-differentially private stochastic convex optimization algorithm with the optimal convergence rate $\tilde O(1/\sqrt{T} + \sqrt{d}/εT)$ on smooth losses in linear time, forming a direct analogy to the classical non-private "online-to-batch" conversion. By applying our techniq…

    Submitted 12 October, 2022; originally announced October 2022.

  17. arXiv:2210.06328  [pdf, ps, other]

    cs.LG

    Momentum Aggregation for Private Non-convex ERM

    Authors: Hoang Tran, Ashok Cutkosky

    Abstract: We introduce new algorithms and convergence guarantees for privacy-preserving non-convex Empirical Risk Minimization (ERM) on smooth $d$-dimensional objectives. We develop an improved sensitivity analysis of stochastic gradient descent on smooth objectives that exploits the recurrence of examples in different epochs. By combining this new approach with recent analysis of momentum with private aggr…

    Submitted 12 October, 2022; originally announced October 2022.

  18. arXiv:2206.13947  [pdf, other]

    cs.LG cs.CL

    Long Range Language Modeling via Gated State Spaces

    Authors: Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur

    Abstract: State space models have been shown to be effective at modeling long range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code and ArXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and sh…

    Submitted 2 July, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

  19. arXiv:2205.06846  [pdf, other]

    cs.LG

    Optimal Comparator Adaptive Online Learning with Switching Cost

    Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

    Abstract: Practical online learning tasks are often naturally defined on unconstrained domains, where optimal algorithms for general convex losses are characterized by the notion of comparator adaptivity. In this paper, we design such algorithms in the presence of switching cost - the latter penalizes the typical optimism in adaptive algorithms, leading to a delicate design trade-off. Based on a novel dual…

    Submitted 11 October, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  20. arXiv:2205.02973  [pdf, other]

    cs.LG cs.CR cs.CV

    Large Scale Transfer Learning for Differentially Private Image Classification

    Authors: Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private trai…

    Submitted 20 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

  21. arXiv:2203.10327  [pdf, other]

    cs.LG

    Implicit Parameter-free Online Learning with Truncated Linear Models

    Authors: Keyi Chen, Ashok Cutkosky, Francesco Orabona

    Abstract: Parameter-free algorithms are online learning algorithms that do not require setting learning rates. They achieve optimal regret with respect to the distance between the initial point and any competitor. Yet, parameter-free algorithms do not take into account the geometry of the losses. Recently, in the stochastic optimization literature, it has been proposed to instead use truncated linear lower…

    Submitted 19 March, 2022; originally announced March 2022.

  22. arXiv:2203.04274  [pdf, ps, other]

    cs.LG cs.DS

    Leveraging Initial Hints for Free in Stochastic Linear Bandits

    Authors: Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang

    Abstract: We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action. We present a novel algorithm for stochastic linear bandits that uses this hint to improve its regret to $\tilde O(\sqrt{T})$ when the hint is accurate, while maintaining a minimax-optimal $\tilde O(d\sqrt{T})$ regret independent of th…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: ALT 2022

  23. arXiv:2203.00444  [pdf, other]

    cs.LG math.OC stat.ML

    Parameter-free Mirror Descent

    Authors: Andrew Jacobsen, Ashok Cutkosky

    Abstract: We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains. We leverage this technique to develop the first unconstrained online linear optimization algorithm achieving an optimal dynamic regret bound, and we further demonstrate that natural strategies based on Follow-the-Regularized-Leader are unable to achieve s…

    Submitted 8 February, 2024; v1 submitted 26 February, 2022; originally announced March 2022.

    Comments: 59 pages. v4: Added a new section (7. Trade-offs in the Horizon Dependence) discussing how to achieve an alternative type of parameter-free bound using our framework; v3: published at COLT 2022 + fixed typos; v2: improved the algorithms in sections 3, 5, and 6 (tighter regret, simpler updates and analysis), corrected minor technical details and fixed typos

  24. arXiv:2202.00089  [pdf, other]

    cs.LG math.OC

    Understanding AdamW through Proximal Methods and Scale-Freeness

    Authors: Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

    Abstract: Adam has been widely adopted for training deep neural networks due to less hyperparameter tuning and remarkable performance. To improve generalization, Adam is typically used in tandem with a squared $\ell_2$ regularizer (referred to as Adam-$\ell_2$). However, even better performance can be obtained with AdamW, which decouples the gradient of the regularizer from the update rule of Adam-$\ell_2$.…

    Submitted 31 January, 2022; originally announced February 2022.

  25. arXiv:2201.07877  [pdf, other]

    cs.LG

    PDE-Based Optimal Strategy for Unconstrained Online Learning

    Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Paschalidis

    Abstract: Unconstrained Online Linear Optimization (OLO) is a practical problem setting to study the training of machine learning models. Existing works proposed a number of potential-based algorithms, but in general the design of these potential functions relies heavily on guessing. To streamline this workflow, we present a framework that generates new potential functions by solving a Partial Differential…

    Submitted 15 June, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: ICML 2022

  26. arXiv:2111.05257  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Logarithmic Regret from Sublinear Hints

    Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

    Abstract: We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a hint $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarante…

    Submitted 9 November, 2021; originally announced November 2021.

  27. arXiv:2110.14243  [pdf, other]

    cs.LG stat.ML

    Online Selective Classification with Limited Feedback

    Authors: Aditya Gangrade, Anil Kag, Ashok Cutkosky, Venkatesh Saligrama

    Abstract: Motivated by applications to resource-limited and safety-critical domains, we study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance. For example, this may model an adaptive decision to invoke more resources on this instance. Two salient aspects of the setting we consider are that the data may be non-realisable, due to which absten…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: To appear at NeurIPS 2021

  28. arXiv:2106.14343  [pdf, other]

    cs.LG math.OC stat.ML

    High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

    Authors: Ashok Cutkosky, Harsh Mehta

    Abstract: We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known rates for smooth losses when the gradients only have bounded $\mathfrak{p}$th moments for some…

    Submitted 9 November, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

  29. arXiv:2103.03265  [pdf, other]

    cs.LG stat.ML

    Better SGD using Second-order Momentum

    Authors: Hoang Tran, Ashok Cutkosky

    Abstract: We develop a new algorithm for non-convex stochastic optimization that finds an $ε$-critical point in the optimal $O(ε^{-3})$ stochastic gradient and Hessian-vector product computations. Our algorithm uses Hessian-vector products to "correct" a bias term in the momentum of SGD with momentum. This leads to better gradient estimates in a manner analogous to variance reduction methods. In contrast to…

    Submitted 11 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  30. arXiv:2102.01623  [pdf, other]

    cs.LG eess.SY

    Adversarial Tracking Control via Strongly Adaptive Online Learning with Memory

    Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Ch. Paschalidis

    Abstract: We consider the problem of tracking an adversarial state sequence in a linear dynamical system subject to adversarial disturbances and loss functions, generalizing earlier settings in the literature. To this end, we develop three techniques, each of independent interest. First, we propose a comparator-adaptive algorithm for online linear optimization with movement cost. Without tuning, it nearly m…

    Submitted 21 February, 2022; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: AISTATS 2022

  31. arXiv:2012.13115  [pdf, other]

    cs.LG stat.ML

    Upper Confidence Bounds for Combining Stochastic Bandits

    Authors: Ashok Cutkosky, Abhimanyu Das, Manish Purohit

    Abstract: We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a variant of the classic UCB algorithm. Our final regret depends only on the regret of the base algorithm with the best regret in hindsight. This approach provid…

    Submitted 24 December, 2020; originally announced December 2020.

  32. arXiv:2010.03082  [pdf, ps, other]

    cs.LG

    Online Linear Optimization with Many Hints

    Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

    Abstract: We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision. In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors. This significantly extends prior work that considered only…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted at NeurIPS 2020

  33. arXiv:2008.13363  [pdf, other]

    cs.LG cs.CV stat.ML

    Extreme Memorization via Scale of Initialization

    Authors: Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

    Abstract: We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD, interpolating from good generalization performance to completely memorizing the training set while making little progress on the test set. Moreover, we find that the extent and manner in which generalization ability is affected depends on the activation and…

    Submitted 1 May, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

  34. arXiv:2007.08448  [pdf, ps, other]

    cs.LG stat.ML

    Comparator-adaptive Convex Bandits

    Authors: Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo

    Abstract: We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: 15 pages

  35. arXiv:2002.04726  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Online Learning with Imperfect Hints

    Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

    Abstract: We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint vector is guaranteed to have a positive correlation with the cost vector, then the online player can achieve a regret of $O(\log T)$, thus significantly improving ov…

    Submitted 2 October, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: appeared in ICML 2020

  36. arXiv:2002.03963  [pdf, ps, other]

    cs.LG stat.ML

    Adaptive Online Learning with Varying Norms

    Authors: Ashok Cutkosky

    Abstract: Given any increasing sequence of norms $\|\cdot\|_0,\dots,\|\cdot\|_{T-1}$, we provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$ in response to convex losses $\ell_t:W\to \mathbb{R}$ that guarantees regret $R_T(u)=\sum_{t=1}^T \ell_t(w_t)-\ell_t(u)\le \tilde O\left(\|u\|_{T-1}\sqrt{\sum_{t=1}^T \|g_t\|_{t-1,\star}^2}\right)$ where $g_t$ is a subgradient o…

    Submitted 10 February, 2020; originally announced February 2020.

  37. arXiv:2002.03305  [pdf, other]

    cs.LG math.OC stat.ML

    Momentum Improves Normalized SGD

    Authors: Ashok Cutkosky, Harsh Mehta

    Abstract: We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives. Then, we consider the case of objectives with bounded second derivative and show that in this case a small tweak to the momentum formula allows normalized SGD with momentum to find an $ε$-critical point in $O(1/ε^{3.5})$ iterations, matching the b…

    Submitted 16 May, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

  38. arXiv:1905.12721  [pdf, other]

    cs.LG math.OC stat.ML

    Matrix-Free Preconditioning in Online Learning

    Authors: Ashok Cutkosky, Tamas Sarlos

    Abstract: We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix. Our regret bound is never worse than that obtained by diagonal preconditioning, and in certain settings even surpasses that of algorithms with full-matrix preconditioning. Importantly, our algorit…

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  39. arXiv:1905.10680  [pdf, other]

    cs.LG stat.ML

    Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration

    Authors: Kwang-Sung Jun, Ashok Cutkosky, Francesco Orabona

    Abstract: In this paper, we consider nonparametric least squares regression in a Reproducing Kernel Hilbert Space (RKHS). We propose a new randomized algorithm that has optimal generalization error bounds with respect to the square loss, closing a long-standing gap between upper and lower bounds. Moreover, we show that our algorithm has faster finite-time and asymptotic rates on problems where the Bayes…

    Submitted 25 May, 2019; originally announced May 2019.

  40. arXiv:1905.10018  [pdf, other]

    cs.LG math.OC stat.ML

    Momentum-Based Variance Reduction in Non-Convex SGD

    Authors: Ashok Cutkosky, Francesco Orabona

    Abstract: Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and willingness to use excessively large "mega-bat…

    Submitted 21 April, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Added Ack

  41. arXiv:1903.00974  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Anytime Online-to-Batch Conversions, Optimism, and Acceleration

    Authors: Ashok Cutkosky

    Abstract: A standard way to obtain convergence guarantees in stochastic convex optimization is to run an online learning algorithm and then output the average of its iterates: the actual iterates of the online learning algorithm do not come with individual guarantees. We close this gap by introducing a black-box modification to any online learning algorithm whose iterates converge to the optimum in stochast…

    Submitted 3 March, 2019; originally announced March 2019.

  42. arXiv:1902.09013  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning

    Authors: Ashok Cutkosky

    Abstract: We provide algorithms that guarantee regret $R_T(u)\le \tilde O(G\|u\|^3 + G(\|u\|+1)\sqrt{T})$ or $R_T(u)\le \tilde O(G\|u\|^3T^{1/3} + GT^{1/3}+ G\|u\|\sqrt{T})$ for online convex optimization with $G$-Lipschitz losses for any comparison point $u$ without prior knowledge of either $G$ or $\|u\|$. Previous algorithms dispense with the $O(\|u\|^3)$ term at the expense of knowledge of one or both o…

    Submitted 24 February, 2019; originally announced February 2019.

  43. arXiv:1902.09003  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Combining Online Learning Guarantees

    Authors: Ashok Cutkosky

    Abstract: We show how to take any two parameter-free online learning algorithms with different regret guarantees and obtain a single algorithm whose regret is the minimum of the two base algorithms. Our method is embarrassingly simple: just add the iterates. This trick can generate efficient algorithms that adapt to many norms simultaneously, as well as providing diagonal-style algorithms that still maintai…

    Submitted 24 February, 2019; originally announced February 2019.

  44. arXiv:1901.09068  [pdf, other]

    cs.LG math.OC stat.ML

    Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization

    Authors: Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona

    Abstract: Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, it requires a carefully hand-picked stepsize for fast convergence, which is notoriously tedious and time-consuming to tune. Over the last several years, a plethora of adaptive gradient-based algorithms have emerged to ameliorate this problem. They have proved efficient in reducing the labor of tuning in pract…

    Submitted 7 June, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

  45. arXiv:1802.06293  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

    Authors: Ashok Cutkosky, Francesco Orabona

    Abstract: We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce o…

    Submitted 25 June, 2018; v1 submitted 17 February, 2018; originally announced February 2018.

    Comments: Appears in Conference on Learning Theory 2018

  46. arXiv:1802.05811  [pdf, other]

    stat.ML cs.LG

    Distributed Stochastic Optimization via Adaptive SGD

    Authors: Ashok Cutkosky, Robert Busa-Fekete

    Abstract: Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial method that is surprisingly hard to parallelize. In this paper, we propose an efficient distributed stochastic optimization method by combin…

    Submitted 28 October, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: NIPS 2018, 21 Pages

  47. arXiv:1703.02629  [pdf, ps, other]

    cs.LG stat.ML

    Online Learning Without Prior Information

    Authors: Ashok Cutkosky, Kwabena Boahen

    Abstract: The vast majority of optimization and online learning algorithms today require some prior information about the data (often in the form of bounds on gradients or on the optimal parameter value). When this information is not available, these algorithms require laborious manual tuning of various hyperparameters, motivating the search for algorithms that can adapt to the data with no prior informatio…

    Submitted 5 June, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

    Comments: 12 pages main text; 35 pages total; COLT 2017

  48. arXiv:1703.02622  [pdf, other]

    cs.LG stat.ML

    Online Convex Optimization with Unconstrained Domains and Losses

    Authors: Ashok Cutkosky, Kwabena Boahen

    Abstract: We propose an online convex optimization algorithm (RescaledExp) that achieves optimal regret in the unconstrained setting without prior knowledge of any bounds on the loss functions. We prove a lower bound showing an exponential separation between the regret of existing algorithms that require a known bound on the loss functions and any algorithm that does not require such knowledge. RescaledExp…

    Submitted 7 March, 2017; originally announced March 2017.

    Journal ref: Advances in Neural Information Processing Systems 29 (2016) 748-756