Showing 51–100 of 136 results for author: Srebro, N

  1. arXiv:2002.07839  [pdf, other]

    cs.LG math.OC stat.ML

    Is Local SGD Better than Minibatch SGD?

    Authors: Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

    Abstract: We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibat…

    Submitted 20 July, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: 29 pages
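
    A minimal toy sketch (not taken from the paper) of the two methods being compared, assuming a hypothetical 1-D quadratic f(x) = 0.5 * (x - x_star)**2 with additive Gaussian gradient noise. M machines each compute K stochastic gradients per communication round, for R rounds. Minibatch SGD takes one step per round using the average of all M*K gradients at the shared iterate; local SGD lets each machine take K sequential steps from the shared iterate and then averages the results. All function and parameter names are illustrative assumptions.

```python
import random


def noisy_grad(x, x_star, sigma, rng):
    # Stochastic gradient of f(x) = 0.5 * (x - x_star)**2 plus Gaussian noise
    return (x - x_star) + rng.gauss(0.0, sigma)


def minibatch_sgd(x0, x_star, M, K, R, lr, sigma, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        # One step per round, averaging all M*K gradients taken at the same point
        g = sum(noisy_grad(x, x_star, sigma, rng) for _ in range(M * K)) / (M * K)
        x -= lr * g
    return x


def local_sgd(x0, x_star, M, K, R, lr, sigma, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            xm = x
            for _ in range(K):
                # Each machine takes K sequential SGD steps from the shared iterate
                xm -= lr * noisy_grad(xm, x_star, sigma, rng)
            local_iterates.append(xm)
        # Communication round: average the M local iterates
        x = sum(local_iterates) / M
    return x
```

    With sigma=0.0 the comparison can be checked by hand: minibatch_sgd(10.0, 0.0, M=2, K=4, R=4, lr=0.5, sigma=0.0) returns 0.625 (R = 4 gradient steps), while local_sgd with the same arguments returns 10 * 0.5**16 (effectively K*R = 16 steps). The toy does not reproduce the paper's bounds, which concern the noisy convex setting.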

  2. arXiv:1912.02365  [pdf, other]

    math.OC cs.IT cs.LG stat.ML

    Lower Bounds for Non-Convex Stochastic Optimization

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

    Abstract: We lower bound the complexity of finding $ε$-stationary points (with gradient norm at most $ε$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ε^{-4}$ queries to find an…

    Submitted 27 February, 2022; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Correction to hard instance dimensions in Theorem 3

  3. arXiv:1910.01635  [pdf, other]

    cs.LG stat.ML

    A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

    Authors: Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro

    Abstract: A key element of understanding the efficacy of overparameterized neural networks is characterizing how they represent functions as the number of weights in the network approaches infinity. In this paper, we characterize the norm required to realize a function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where th…

    Submitted 3 October, 2019; originally announced October 2019.

  4. arXiv:1907.00762  [pdf, other]

    cs.LG math.OC stat.ML

    Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We note that known methods achieving the optimal oracle complexity for first order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first order queries required to optimize a convex Lipschitz function subject to a memory constraint.

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: 9 pages

  5. arXiv:1906.09231  [pdf, other]

    cs.LG math.ST stat.ML

    Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

    Authors: Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth

    Abstract: We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many setti…

    Submitted 9 March, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: Accepted to appear in the proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  6. arXiv:1906.05827   

    cs.LG stat.ML

    Kernel and Rich Regimes in Overparametrized Models

    Authors: Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

    Abstract: A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich…

    Submitted 25 February, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: This paper has been substantially modified, updated, and expanded with additional content (arXiv:2002.09277). To avoid confusion with already existing citations, we are withdrawing the old version of this article

  7. arXiv:1905.07325  [pdf, ps, other]

    stat.ML cs.LG

    Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

    Authors: Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

    Abstract: With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models. To this end we study the limit of loss minimization with a diverging norm…

    Submitted 17 May, 2019; originally announced May 2019.

    Comments: ICML Camera ready version

  8. arXiv:1904.10120  [pdf, other]

    cs.LG stat.ML

    Semi-Cyclic Stochastic Gradient Descent

    Authors: Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar

    Abstract: We consider convex SGD updates with a block-cyclic structure, i.e. where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution. This situation arises, e.g., in Federated Learning where the mobile devices available for updates at different times during the day have different characteristics. We show that such block-cyclic str…

    Submitted 22 April, 2019; originally announced April 2019.

  9. arXiv:1902.05040  [pdf, other]

    cs.LG stat.ML

    How do infinite width bounded norm networks look in function space?

    Authors: Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro

    Abstract: We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function. For functions…

    Submitted 13 February, 2019; originally announced February 2019.

  10. arXiv:1902.04686  [pdf, ps, other]

    cs.LG math.OC stat.ML

    The Complexity of Making the Gradient Small in Stochastic Convex Optimization

    Authors: Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

    Abstract: We give nearly matching upper and lower bounds on the oracle complexity of finding $ε$-stationary points ($\| \nabla F(x) \| \leq ε$) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimizatio…

    Submitted 14 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

  11. arXiv:1902.04217  [pdf, ps, other]

    cs.LG stat.ML

    VC Classes are Adversarially Robustly Learnable, but Only Improperly

    Authors: Omar Montasser, Steve Hanneke, Nathan Srebro

    Abstract: We study the question of learning an adversarially robust predictor. We show that any hypothesis class $\mathcal{H}$ with finite VC dimension is robustly PAC learnable with an improper learning rule. The requirement of being improper is necessary as we exhibit examples of hypothesis classes $\mathcal{H}$ with finite VC dimension that are not robustly PAC learnable with any proper learning rule.

    Submitted 3 July, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: COLT 2019 Camera Ready

  12. arXiv:1901.09659  [pdf]

    cs.SI cs.CY stat.AP

    Simple Surveys: Response Retrieval Inspired by Recommendation Systems

    Authors: Nandana Sengupta, Nati Srebro, James Evans

    Abstract: In the last decade, the use of simple rating and comparison surveys has proliferated on social and digital media platforms to fuel recommendations. These simple surveys and their extrapolation with machine learning algorithms shed light on user preferences over large and growing pools of items, such as movies, songs and ads. Social scientists have a long history of measuring perceptions, preferenc…

    Submitted 11 December, 2018; originally announced January 2019.

  13. arXiv:1812.02952  [pdf, other]

    cs.LG cs.CY stat.ML

    From Fair Decision Making to Social Equality

    Authors: Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro

    Abstract: The study of fairness in intelligent decision systems has mostly ignored long-term influence on the underlying population. Yet fairness considerations (e.g. affirmative action) often have the implicit goal of achieving balance among groups within the population. The most basic notion of balance is eventual equality between the qualifications of the groups. How can we incorporate influence dynamics…

    Submitted 27 February, 2020; v1 submitted 7 December, 2018; originally announced December 2018.

    Comments: Short version appears in the proceedings of ACM FAT* 2019

    ACM Class: K.4

  14. arXiv:1810.11829  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    On preserving non-discrimination when combining expert advice

    Authors: Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nathan Srebro

    Abstract: We study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions. We consider the most basic extension of classical online learning: "Given a class of predictors that are individually non-discriminatory with respect to a particular metric, how can we combine them to perform as we…

    Submitted 29 March, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

    Comments: Appeared in NIPS 2018

  15. arXiv:1807.00028  [pdf, other]

    cs.LG stat.ML

    Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

    Authors: Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

    Abstract: Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals. We study the generalization performance for such constrained optimization problems, in terms of how well the constraints are satisfied at evaluation time, given that they are satisfied at training time. To improve generalization performa…

    Submitted 28 September, 2018; v1 submitted 29 June, 2018; originally announced July 2018.

  16. arXiv:1806.10188  [pdf, ps, other]

    math.OC cs.LG stat.ML

    A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

    Authors: Yossi Arjevani, Ohad Shamir, Nathan Srebro

    Abstract: We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $τ$ rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: In fact, the error can be as bad as non-delayed gradient descent run on only $1/τ$ of the gradient…

    Submitted 26 June, 2018; originally announced June 2018.
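
    A minimal sketch of the delayed-update model described above, assuming a hypothetical 1-D quadratic f(x) = 0.5 * a * x**2 and no stochastic noise: the update at round t uses the gradient evaluated at the iterate from tau rounds earlier. How the first tau rounds are handled (here, by reusing the initial point) is an assumption of this sketch, not a detail taken from the paper; the function name is illustrative.

```python
def delayed_gd(x0, a, lr, tau, steps):
    """Gradient descent on f(x) = 0.5 * a * x**2 where the step at round t
    uses the gradient at the iterate from tau rounds earlier."""
    xs = [x0]
    for t in range(steps):
        # Until tau rounds of history exist, fall back to x0 (an assumption)
        stale_x = xs[max(0, t - tau)]
        xs.append(xs[-1] - lr * a * stale_x)
    return xs[-1]
```

    Setting tau=0 recovers plain gradient descent: delayed_gd(8.0, 1.0, 0.5, 0, 3) halves the iterate each round and returns 1.0. With tau=1 the same call returns -2.0, because stale gradients keep pushing in an outdated direction and the iterate overshoots; this is why the step size generally has to shrink as the delay grows.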

  17. arXiv:1806.01796  [pdf, other]

    stat.ML cs.LG

    Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

    Authors: Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry

    Abstract: Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do no…

    Submitted 18 April, 2022; v1 submitted 5 June, 2018; originally announced June 2018.

    Comments: Fixed a typo (Eq. (4): missing σ_{max}^2 term in the denominator)

  18. arXiv:1806.00468  [pdf, other]

    cs.LG stat.ML

    Implicit Bias of Gradient Descent on Linear Convolutional Networks

    Authors: Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

    Abstract: We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard margin linear support vector machine solution, regardless of depth.

    Submitted 10 January, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

  19. arXiv:1805.12076  [pdf, other]

    cs.LG stat.ML

    Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks

    Authors: Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

    Abstract: Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization. In this work we suggest a novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: 19 pages, 8 figures

  20. arXiv:1805.10222  [pdf, other]

    math.OC cs.LG stat.ML

    Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

    Authors: Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro

    Abstract: We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight ga…

    Submitted 11 February, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

  21. arXiv:1803.04307  [pdf, ps, other]

    cs.LG

    The Everlasting Database: Statistical Validity at a Fair Price

    Authors: Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro

    Abstract: The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional s…

    Submitted 2 April, 2019; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: 22 pages, accepted to NeurIPS 2018

  22. arXiv:1803.01905  [pdf, other]

    stat.ML cs.LG

    Convergence of Gradient Descent on Separable Data

    Authors: Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

    Abstract: We provide a detailed study on the implicit bias of gradient descent when optimizing loss functions with strictly monotone tails, such as the logistic loss, over separable datasets. We look at two basic questions: (a) what are the conditions on the tail of the loss function under which gradient descent converges in the direction of the $L_2$ maximum-margin separator? (b) how does the rate of margi…

    Submitted 24 March, 2019; v1 submitted 5 March, 2018; originally announced March 2018.

    Comments: AISTATS Camera ready version

  23. arXiv:1802.08246  [pdf, other]

    stat.ML cs.LG

    Characterizing Implicit Bias in Terms of Optimization Geometry

    Authors: Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

    Abstract: We study the implicit bias of generic optimization methods, such as mirror descent, natural gradient descent, and steepest descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems. We explore the question of whether the specific global minimum (among the many possible global minima) reached by an algorithm…

    Submitted 22 June, 2020; v1 submitted 22 February, 2018; originally announced February 2018.

    Comments: (1) A bug in the proof of implicit bias for matrix factorization was fixed. v2 gives a characterization of the asymptotic bias of the factor matrices, while v1 made a stronger claim on the limit direction of the unfactored matrix. (2) v2 also includes new results on implicit bias of mirror descent with realizable affine constraints

  24. arXiv:1802.03830  [pdf, other]

    stat.ML cs.LG

    Distributed Stochastic Multi-Task Learning with Graph Regularization

    Authors: Weiran Wang, Jialei Wang, Mladen Kolar, Nathan Srebro

    Abstract: We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines. Uniform averaging or diminishing stepsize in these methods would yield consensus (single task) learning. We show how simply skewing the averaging weights or controlling the stepsize allows learning different, but related, tasks on the different machines.

    Submitted 11 February, 2018; originally announced February 2018.
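
    A toy sketch, under stated assumptions, of the averaging effect described above: each of two hypothetical machines has a local objective 0.5 * (w_m - c_m)**2, takes one gradient step with a fixed step size, and then replaces its model with a weighted average of all models given by a mixing matrix W. This is not the paper's algorithm; it only illustrates how uniform weights force a single shared model while skewed weights leave related but distinct models. The function name and all values are illustrative.

```python
def mix_and_step(c, W, lr, iters):
    """Each machine m has a hypothetical local objective 0.5 * (w_m - c[m])**2.
    Per round: a fixed-step local gradient step, then weighted averaging by W."""
    M = len(c)
    w = [0.0] * M
    for _ in range(iters):
        # Local gradient step on every machine
        z = [w[m] - lr * (w[m] - c[m]) for m in range(M)]
        # Message passing: row m of W holds machine m's averaging weights
        w = [sum(W[m][k] * z[k] for k in range(M)) for m in range(M)]
    return w


uniform = [[0.5, 0.5], [0.5, 0.5]]
skewed = [[0.9, 0.1], [0.1, 0.9]]
print(mix_and_step([0.0, 4.0], uniform, 0.5, 200))
print(mix_and_step([0.0, 4.0], skewed, 0.5, 200))
```

    With c = [0.0, 4.0] and lr = 0.5, uniform weights drive both machines to 2.0, the single consensus model. With self-weight 0.9 the fixed point solves (I - 0.5 W) w = 0.5 W c, which works out by hand to [2/3, 10/3]: each machine stays near its own target while being pulled toward the other.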

  25. arXiv:1711.05305  [pdf, ps, other]

    math.OC cs.LG

    An Accelerated Communication-Efficient Primal-Dual Optimization Framework for Structured Machine Learning

    Authors: Chenxin Ma, Martin Jaggi, Frank E. Curtis, Nathan Srebro, Martin Takáč

    Abstract: Distributed optimization algorithms are essential for training machine learning models on very large-scale datasets. However, they often suffer from communication bottlenecks. Confronting this issue, a communication-efficient primal-dual coordinate ascent framework (CoCoA) and its improved variant CoCoA+ have been proposed, achieving a convergence rate of $\mathcal{O}(1/t)$ for solving empirical r…

    Submitted 14 November, 2017; originally announced November 2017.

  26. arXiv:1710.10345  [pdf, ps, other]

    stat.ML cs.LG

    The Implicit Bias of Gradient Descent on Separable Data

    Authors: Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

    Abstract: We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a d…

    Submitted 16 April, 2024; v1 submitted 27 October, 2017; originally announced October 2017.

    Comments: Change from v5: clarified the derivation between eqs. (41) and (42)
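
    A small numerical sketch of the claim above, assuming a hypothetical two-point separable dataset in 2-D (both labels +1) and a homogeneous linear predictor with no bias. For these points the hard-margin SVM direction is (1, 0): with w = (cos t, sin t) the margins are cos t and 2 cos t + 3 sin t, so the smaller one never exceeds cos t and the minimum is maximized at t = 0. The step size 0.5 is chosen below 2/β, where β ≈ 3.33 bounds the smoothness of this particular loss; the dataset, step size and function names are illustrative choices, not from the paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gd_logistic(points, labels, lr, steps):
    """Full-batch GD on sum_i log(1 + exp(-y_i * <w, x_i>)) for a
    homogeneous linear predictor w (no bias term), started at w = 0."""
    d = len(points[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(points, labels):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient of log(1 + exp(-margin)) w.r.t. w is -sigmoid(-margin) * y * x
            coeff = -sigmoid(-margin) * y
            for j in range(d):
                grad[j] += coeff * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical separable data; the max-margin direction is (1, 0)
points = [(1.0, 0.0), (2.0, 3.0)]
labels = [1, 1]
w_max_margin = (1.0, 0.0)

for steps in (500, 20000):
    w = gd_logistic(points, labels, lr=0.5, steps=steps)
    print(steps, cosine(w, w_max_margin))
```

    The norm of w keeps growing while its direction drifts toward (1, 0) only slowly: the paper shows the directional error decays at a rate of order 1/log t, so each tenfold increase in steps buys only a small gain in the cosine.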

  27. arXiv:1709.08728  [pdf, ps, other]

    cs.LG

    Stochastic Nonconvex Optimization with Large Minibatches

    Authors: Weiran Wang, Nathan Srebro

    Abstract: We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large minibatches of samples, using only first-order gradient information. Our algorithms provably converge to an approximate critical point of the expected objective w…

    Submitted 8 March, 2019; v1 submitted 25 September, 2017; originally announced September 2017.

    Comments: Accepted to ALT 2019

  28. arXiv:1709.03594  [pdf, ps, other]

    math.OC

    Lower Bound for Randomized First Order Convex Optimization

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We provide an explicit construction and direct proof for the lower bound on the number of first order oracle accesses required for a randomized algorithm to minimize a convex Lipschitz function.

    Submitted 3 November, 2017; v1 submitted 11 September, 2017; originally announced September 2017.

    Comments: 8 pages

  29. arXiv:1707.09564  [pdf, ps, other]

    cs.LG

    A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

    Authors: Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro

    Abstract: We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis.

    Submitted 23 February, 2018; v1 submitted 29 July, 2017; originally announced July 2017.

    Comments: Accepted to ICLR 2018

  30. arXiv:1706.08947  [pdf, other]

    cs.LG

    Exploring Generalization in Deep Learning

    Authors: Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro

    Abstract: With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures expl…

    Submitted 6 July, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

    Comments: 19 pages, 8 figures

  31. arXiv:1705.09280  [pdf, other]

    stat.ML cs.LG

    Implicit Regularization in Matrix Factorization

    Authors: Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

    Abstract: We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.

    Submitted 25 May, 2017; originally announced May 2017.
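
    A minimal sketch of the setting above in its simplest special case, assuming a hypothetical single measurement <A, X> = b with A = diag(1, 2), b = 2, and X restricted to diagonal PSD matrices X = diag(u)**2 (a commuting case, where the nuclear norm of X is just x1 + x2). Feasible X satisfy x1 + 2 x2 = 2, so x1 + x2 = 1 + x1/2 and the minimum nuclear norm solution is diag(0, 1). Gradient descent on u from a small initialization scale alpha should end close to it; this illustrates the conjecture on a toy instance and is not the paper's experiment. Names and values are illustrative.

```python
def factorized_gd(alpha, lr, steps):
    """GD on u for 0.5 * (<A, X> - b)**2 with X = diag(u)**2, i.e. x_i = u_i**2.
    Hypothetical measurement: A = diag(1, 2), b = 2, so feasible X satisfy
    x1 + 2 * x2 = 2; the minimum nuclear norm (x1 + x2) point is (0, 1)."""
    a, b = (1.0, 2.0), 2.0
    u = [alpha, alpha]  # initialization scale; small alpha means near the origin
    for _ in range(steps):
        r = sum(ai * ui * ui for ai, ui in zip(a, u)) - b  # measurement residual
        # d/du_i of 0.5 * r**2 is r * 2 * a_i * u_i
        u = [ui - lr * r * 2.0 * ai * ui for ai, ui in zip(a, u)]
    return [ui * ui for ui in u]  # diagonal entries (x1, x2) of X


x_small = factorized_gd(alpha=1e-3, lr=0.01, steps=5000)
x_large = factorized_gd(alpha=1.0, lr=0.01, steps=5000)
print(x_small, sum(x_small))  # nuclear norm of the diagonal PSD X
print(x_large, sum(x_large))
```

    Both runs fit the measurement, but only the small initialization lands near the minimum nuclear norm point: in this toy x1 ends up on the order of alpha, whereas starting from alpha = 1 leaves x1 far from 0.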

  32. arXiv:1705.08292  [pdf, other]

    stat.ML cs.LG

    The Marginal Value of Adaptive Gradient Methods in Machine Learning

    Authors: Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

    Abstract: Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterized problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient desc…

    Submitted 21 May, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

  33. arXiv:1705.03071  [pdf, other]

    cs.LG

    Geometry of Optimization and Implicit Regularization in Deep Learning

    Authors: Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

    Abstract: We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not…

    Submitted 8 May, 2017; originally announced May 2017.

    Comments: This survey chapter was done as a part of Intel Collaborative Research institute for Computational Intelligence (ICRI-CI) "Why & When Deep Learning works -- looking inside Deep Learning" compendium with the generous support of ICRI-CI. arXiv admin note: substantial text overlap with arXiv:1506.02617

  34. arXiv:1702.08169  [pdf, ps, other]

    cs.LG

    Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

    Authors: Dan Garber, Ohad Shamir, Nathan Srebro

    Abstract: We study the fundamental problem of Principal Component Analysis in a statistical distributed setting in which each machine out of $m$ stores a sample of $n$ points sampled i.i.d. from a single unknown distribution. We study algorithms for estimating the leading principal component of the population covariance matrix that are both communication-efficient and achieve estimation error of the order o…

    Submitted 27 February, 2017; originally announced February 2017.

  35. arXiv:1702.07834  [pdf, ps, other]

    math.NA cs.LG stat.ML

    Efficient coordinate-wise leading eigenvector computation

    Authors: Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro

    Abstract: We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product. We establish global convergence with overall runtime guarantees that are at least as good as Lanczos's method and dominate it for slowly decaying spectrum. Our methods are based on combining a shift-and-invert approach with coordinate-wise algorithm…

    Submitted 25 February, 2017; originally announced February 2017.

  36. arXiv:1702.06818  [pdf, other]

    cs.LG stat.ML

    Stochastic Approximation for Canonical Correlation Analysis

    Authors: Raman Arora, Teodor V. Marinov, Poorya Mianjy, Nathan Srebro

    Abstract: We propose novel first-order stochastic approximation algorithms for canonical correlation analysis (CCA). Algorithms presented are instances of inexact matrix stochastic gradient (MSG) and inexact matrix exponentiated gradient (MEG), and achieve $ε$-suboptimality in the population objective in $\operatorname{poly}(\frac{1}{ε})$ iterations. We also consider practical variants of the proposed algorit…

    Submitted 26 February, 2018; v1 submitted 22 February, 2017; originally announced February 2017.

  37. arXiv:1702.06533  [pdf, ps, other]

    cs.LG stat.ML

    Stochastic Canonical Correlation Analysis

    Authors: Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang

    Abstract: We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve $ε$-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the popula…

    Submitted 21 October, 2019; v1 submitted 20 February, 2017; originally announced February 2017.

    Comments: Accepted by JMLR

  38. arXiv:1702.06269  [pdf, ps, other]

    cs.LG

    Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch-Prox

    Authors: Jialei Wang, Weiran Wang, Nathan Srebro

    Abstract: We present and analyze an approach for distributed stochastic optimization which is statistically optimal and achieves near-linear speedups (up to logarithmic factors). Our approach allows a communication-memory tradeoff, with either logarithmic communication but linear memory, or polynomial communication and a corresponding polynomial reduction in required memory. This communication-memory tradeo…

    Submitted 9 June, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

  39. arXiv:1702.06081  [pdf, other]

    cs.LG

    Learning Non-Discriminatory Predictors

    Authors: Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro

    Abstract: We consider learning a predictor which is non-discriminatory with respect to a "protected attribute" according to the notion of "equalized odds" proposed by Hardt et al. [2016]. We study the problem of learning such a non-discriminatory predictor from a finite training set, both statistically and computationally. We show that a post-hoc correction approach, as suggested by Hardt et al., can be high…

    Submitted 1 November, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

    Comments: 28 pages

  40. arXiv:1610.03045  [pdf, other]

    cs.LG math.OC stat.ML

    Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

    Authors: Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro

    Abstract: Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data. In this paper, we study sketching from an optimization point of view: we first show that the iterative Hessian sketch is an optimization process with preconditioning, and develop accelerate…

    Submitted 10 October, 2016; originally announced October 2016.

  41. arXiv:1610.02413  [pdf, other]

    cs.LG

    Equality of Opportunity in Supervised Learning

    Authors: Moritz Hardt, Eric Price, Nathan Srebro

    Abstract: We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework…

    Submitted 7 October, 2016; originally announced October 2016.
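
    A minimal sketch of how the equality-of-opportunity criterion can be checked on data, assuming binary labels and predictions: it asks that the true positive rate P(Y_hat = 1 | Y = 1, A = a) be the same for every group a. The function names are hypothetical, and the code only measures the gap; it is not the paper's post-processing procedure for removing it.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, group):
    """P(Y_hat = 1 | Y = 1, A = a) for each group a; labels are 0/1.
    Groups with no positive examples are omitted."""
    positives = defaultdict(int)
    hits = defaultdict(int)
    for y, y_hat, a in zip(y_true, y_pred, group):
        if y == 1:
            positives[a] += 1
            hits[a] += int(y_hat == 1)
    return {a: hits[a] / positives[a] for a in positives}


def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true positive rate between any two groups."""
    rates = true_positive_rates(y_true, y_pred, group)
    return max(rates.values()) - min(rates.values())


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0]
group = ["a", "a", "b", "b", "a", "b"]
print(true_positive_rates(y_true, y_pred, group))
print(equal_opportunity_gap(y_true, y_pred, group))
```

    In this made-up example group "a" has a true positive rate of 0.5 and group "b" of 1.0, so the gap is 0.5; a predictor satisfying the criterion exactly would have a gap of 0. Only examples with Y = 1 enter the computation, which is what distinguishes this criterion from equalized odds.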

  42. arXiv:1605.08003  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Tight Complexity Bounds for Optimizing Composite Objectives

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We provide tight upper and lower bounds on the complexity of minimizing the average of $m$ convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic…

    Submitted 4 April, 2019; v1 submitted 25 May, 2016; originally announced May 2016.

  43. arXiv:1605.07991  [pdf, other]

    stat.ML cs.LG

    Efficient Distributed Learning with Sparsity

    Authors: Jialei Wang, Mladen Kolar, Nathan Srebro, Tong Zhang

    Abstract: We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a shifted $\ell_1$-regularized M-estimation problem, and other workers to compute the gradient. In respect of communication, the proposed approach provably matches the…

    Submitted 25 May, 2016; originally announced May 2016.

  44. arXiv:1605.07221  [pdf, other]

    stat.ML cs.LG math.OC

    Global Optimality of Local Search for Low Rank Matrix Recovery

    Authors: Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

    Abstract: We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent from random…

    Submitted 26 May, 2016; v1 submitted 23 May, 2016; originally announced May 2016.

    Comments: 21 pages, 3 figures

  45. arXiv:1605.07154  [pdf, other]

    cs.LG cs.NE

    Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

    Authors: Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

    Abstract: We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, e…

    Submitted 23 May, 2016; originally announced May 2016.

    Comments: 15 pages

  46. arXiv:1604.01870  [pdf, ps, other]

    cs.LG

    Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis

    Authors: Weiran Wang, Jialei Wang, Dan Garber, Nathan Srebro

    Abstract: We study the stochastic optimization of canonical correlation analysis (CCA), whose objective is nonconvex and does not decouple over training samples. Although several stochastic gradient based optimization algorithms have been recently proposed to solve this problem, no global convergence guarantee was provided by any of them. Inspired by the alternating least squares/power iterations formulatio…

    Submitted 14 November, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

    Comments: Accepted by NIPS 2016

  47. arXiv:1603.04379  [pdf, other]

    math.OC

    On Data Dependence in Distributed Stochastic Optimization

    Authors: Avleen S. Bijral, Anand D. Sarwate, Nathan Srebro

    Abstract: We study a distributed consensus-based stochastic gradient descent (SGD) algorithm and show that the rate of convergence involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network topology and a new term depending on the spectral norm of the sample covariance matrix of the data. This data-dependent convergence rate shows that distributed SGD alg…

    Submitted 31 August, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

  48. arXiv:1603.02185  [pdf, ps, other]

    cs.LG stat.ML

    Distributed Multi-Task Learning with Shared Representation

    Authors: Jialei Wang, Mladen Kolar, Nathan Srebro

    Abstract: We study the problem of distributed multi-task learning with shared representation, where each machine aims to learn a separate, but related, task in an unknown shared low-dimensional subspace, i.e., when the predictor matrix has low rank. We consider a setting where each task is handled by a different machine, with samples for the task available locally on the machine, and study communication-eff…

    Submitted 7 March, 2016; originally announced March 2016.

  49. arXiv:1602.02136  [pdf, ps, other]

    cs.LG stat.ML

    Reducing Runtime by Recycling Samples

    Authors: Jialei Wang, Hai Wang, Nathan Srebro

    Abstract: Contrary to the situation with stochastic gradient descent, we argue that when using stochastic methods with variance reduction, such as SDCA, SAG or SVRG, as well as their variants, it could be beneficial to reuse previously used samples instead of fresh samples, even when fresh samples are available. We demonstrate this empirically for SDCA, SAG and SVRG, studying the optimal sample size one sho…

    Submitted 5 February, 2016; originally announced February 2016.

  50. arXiv:1511.06747  [pdf, other]

    cs.LG

    Data-Dependent Path Normalization in Neural Networks

    Authors: Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

    Abstract: We propose a unified framework for neural net normalization, regularization and optimization, which includes Path-SGD and Batch-Normalization and interpolates between them across two different dimensions. Through this framework we investigate the issue of invariance of the optimization, data dependence and the connection with natural gradients.

    Submitted 19 January, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: 17 pages, 3 figures