Showing 101–136 of 136 results for author: Srebro, N

  1. arXiv:1510.06002  [pdf, other]

    cs.LG

    Fast and Scalable Structural SVM with Slack Rescaling

    Authors: Heejin Choi, Ofer Meshi, Nathan Srebro

    Abstract: We present an efficient method for training slack-rescaled structural SVM. Although finding the most violating label in a margin-rescaled formulation is often easy since the target function decomposes with respect to the structure, this is not the case for a slack-rescaled formulation, and finding the most violated label might be very difficult. Our core contribution is an efficient method for fin…

    Submitted 27 October, 2015; v1 submitted 20 October, 2015; originally announced October 2015.

  2. arXiv:1510.02054  [pdf, ps, other]

    cs.LG

    Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

    Authors: Weiran Wang, Raman Arora, Karen Livescu, Nathan Srebro

    Abstract: Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains. However, stochastic optimization of the deep CCA objective is not straightforward, because it does not decouple over training examples. Previous optimizers for deep CCA are either batch-based algori…

    Submitted 7 October, 2015; originally announced October 2015.

    Comments: in 2015 Annual Allerton Conference on Communication, Control and Computing

  3. arXiv:1510.00633  [pdf, other]

    stat.ML cs.LG

    Distributed Multitask Learning

    Authors: Jialei Wang, Mladen Kolar, Nathan Srebro

    Abstract: We consider the problem of distributed multi-task learning, where each machine learns a separate, but related, task. Specifically, each machine learns a linear predictor in high-dimensional space, where all tasks share the same small support. We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.

    Submitted 2 October, 2015; originally announced October 2015.

  4. arXiv:1508.02479  [pdf, other]

    cs.LG

    Normalized Hierarchical SVM

    Authors: Heejin Choi, Yutaka Sasaki, Nathan Srebro

    Abstract: We present improved methods of using structured SVMs in a large-scale hierarchical classification problem, that is, when labels are leaves, or sets of leaves, in a tree or a DAG. We examine the need to normalize both the regularization and the margin and show how doing so significantly improves performance, including making it possible to achieve state-of-the-art results where unnormalized structured SVMs do…

    Submitted 4 March, 2016; v1 submitted 10 August, 2015; originally announced August 2015.

  5. arXiv:1507.08322  [pdf, ps, other]

    cs.LG math.OC

    Distributed Mini-Batch SDCA

    Authors: Martin Takáč, Peter Richtárik, Nathan Srebro

    Abstract: We present an improved analysis of mini-batched stochastic dual coordinate ascent for regularized empirical loss minimization (i.e. SVM and SVM-type objectives). Our analysis allows for flexible sampling schemes, including where data is distributed across machines, and combines a dependence on the smoothness of the loss and/or the data spread (measured through the spectral norm).

    Submitted 29 July, 2015; originally announced July 2015.

  6. arXiv:1506.02617  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Path-SGD: Path-Normalized Optimization in Deep Neural Networks

    Authors: Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro

    Abstract: We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD…

    Submitted 8 June, 2015; originally announced June 2015.

    Comments: 12 pages, 5 figures

  7. arXiv:1503.00036  [pdf, ps, other]

    cs.LG cs.AI cs.NE stat.ML

    Norm-Based Capacity Control in Neural Networks

    Authors: Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

    Abstract: We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

    Submitted 14 April, 2015; v1 submitted 27 February, 2015; originally announced March 2015.

    Comments: 29 pages

  8. arXiv:1412.6614  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

    Authors: Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

    Abstract: We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.

    Submitted 16 April, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: 9 pages, 2 figures

  9. arXiv:1410.5518  [pdf, ps, other]

    stat.ML cs.DS cs.IR cs.LG

    On Symmetric and Asymmetric LSHs for Inner Product Search

    Authors: Behnam Neyshabur, Nathan Srebro

    Abstract: We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and of the power of asymmetric hashes in this context. Shrivastava and Li argue that there is no symmetric LSH for the problem and propose an asymmetric LSH based on different mappings for query and database points. However, we show there does exist a simple symmetric LSH that enjoys stronger guarant…

    Submitted 8 June, 2015; v1 submitted 20 October, 2014; originally announced October 2014.

    Comments: 11 pages, 3 figures, In Proceedings of The 32nd International Conference on Machine Learning (ICML)

  10. arXiv:1405.3167  [pdf, ps, other]

    cs.LG

    Clustering, Hamming Embedding, Generalized LSH and the Max Norm

    Authors: Behnam Neyshabur, Yury Makarychev, Nathan Srebro

    Abstract: We study the convex relaxation of clustering and hamming embedding, focusing on the asymmetric case (co-clustering and asymmetric hamming embedding), understanding their relationship to LSH as studied by (Charikar 2002) and to the max-norm ball, and the differences between their symmetric and asymmetric versions.

    Submitted 13 May, 2014; originally announced May 2014.

    Comments: 17 pages

  11. arXiv:1312.7853  [pdf, other]

    cs.LG math.OC stat.ML

    Communication Efficient Distributed Optimization using an Approximate Newton-type Method

    Authors: Ohad Shamir, Nathan Srebro, Tong Zhang

    Abstract: We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence which provably improves with the data size, requiring an essentially constant number of iterations under reasonable assumptions. We provide theoretical and empirical e…

    Submitted 13 May, 2014; v1 submitted 30 December, 2013; originally announced December 2013.

  12. arXiv:1311.7662  [pdf, other]

    cs.LG cs.CV cs.IR

    The Power of Asymmetry in Binary Hashing

    Authors: Behnam Neyshabur, Payman Yadollahpour, Yury Makarychev, Ruslan Salakhutdinov, Nathan Srebro

    Abstract: When approximating binary similarity using the hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. I.e. by approximating the similarity between $x$ and $x'$ as the hamming distance between $f(x)$ and $g(x')$, for two distinct binary codes $f,g$, rather than as the hamming dista…

    Submitted 29 November, 2013; originally announced November 2013.

    Comments: Accepted to NIPS 2013, 9 pages, 5 figures

  13. arXiv:1310.5715  [pdf, ps, other]

    math.NA cs.CV cs.LG math.OC stat.ML

    Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

    Authors: Deanna Needell, Nathan Srebro, Rachel Ward

    Abstract: We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L/μ)^2$ (where $L$ is a bound on the smoothness and $μ$ on the strong convexity) to a linear dependence on $L/μ$. Furthermore, we show how reweighting the sampling distribution (i.e. importance…

    Submitted 16 January, 2015; v1 submitted 21 October, 2013; originally announced October 2013.

    Comments: 22 pages, 6 figures

    MSC Class: 65B99; 52A99; 60G99; 62L20

  14. arXiv:1307.1674  [pdf, other]

    stat.ML cs.LG

    Stochastic Optimization of PCA with Capped MSG

    Authors: Raman Arora, Andrew Cotter, Nathan Srebro

    Abstract: We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as "Matrix Stochastic Gradient" (MSG), as well as a practical variant, Capped MSG. We study the method both theoretically and empirically.

    Submitted 5 July, 2013; originally announced July 2013.

  15. arXiv:1306.2347  [pdf, other]

    cs.LG

    Auditing: Active Learning with Outcome-Dependent Query Costs

    Authors: Sivan Sabato, Anand D. Sarwate, Nathan Srebro

    Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. We are motivated by applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting audit…

    Submitted 12 July, 2015; v1 submitted 10 June, 2013; originally announced June 2013.

    Comments: Corrections in section 5

    Journal ref: Neural Information Processing Systems 26 (NIPS), 512-520, 2013

  16. arXiv:1303.2314  [pdf, ps, other]

    cs.LG math.OC

    Mini-Batch Primal and Dual Methods for SVMs

    Authors: Martin Takáč, Avleen Bijral, Peter Richtárik, Nathan Srebro

    Abstract: We address the issue of using mini-batches in stochastic optimization of SVMs. We show that the same quantity, the spectral norm of the data, controls the parallelization speedup obtained for both primal stochastic subgradient descent (SGD) and stochastic dual coordinate ascent (SDCA) methods and use it to derive novel variants of mini-batched SDCA. Our guarantees for both methods are expressed in…

    Submitted 10 March, 2013; originally announced March 2013.

  17. arXiv:1301.2311  [pdf]

    cs.LG cs.AI stat.ML

    Maximum Likelihood Bounded Tree-Width Markov Networks

    Authors: Nathan Srebro

    Abstract: Chow and Liu (1968) studied the problem of learning a maximum likelihood Markov tree. We generalize their work to more complex Markov networks by considering the problem of learning a maximum likelihood Markov network of bounded complexity. We discuss how tree-width is in many ways the appropriate measure of complexity and thus analyze the problem of learning a maximum likelihood Markov network of bound…

    Submitted 10 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

    Report number: UAI-P-2001-PG-504-511

  18. arXiv:1212.3276  [pdf, ps, other]

    stat.ML cs.LG

    Learning Sparse Low-Threshold Linear Classifiers

    Authors: Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang

    Abstract: We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss. This problem generalizes the problem of learning a $k$-monotone disjunction. We prove that we can learn efficiently in this setting, at a rate which is linear in both $k$ and the size of the threshold, and that this is the best possible rate. We provide…

    Submitted 18 April, 2016; v1 submitted 13 December, 2012; originally announced December 2012.

    Journal ref: Journal of Machine Learning Research, 16(Jul):1275-1304, 2015

  19. arXiv:1210.5196  [pdf, other]

    stat.ML cs.LG

    Matrix reconstruction with the local max norm

    Authors: Rina Foygel, Nathan Srebro, Ruslan Salakhutdinov

    Abstract: We introduce a new family of matrix norms, the "local max" norms, generalizing existing methods such as the max norm, the trace norm (nuclear norm), and the weighted or smoothed weighted trace norms, which have been extensively used in the literature as regularizers for matrix reconstruction problems. We show that this new family can be used to interpolate between the (weighted or unweighted) trac…

    Submitted 18 October, 2012; originally announced October 2012.

  20. arXiv:1206.6442  [pdf]

    cs.LG stat.ML

    Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss

    Authors: Shai Ben-David, David Loker, Nathan Srebro, Karthik Sridharan

    Abstract: We carefully study how well minimizing convex surrogate loss functions corresponds to minimizing the misclassification error rate for the problem of binary classification with linear predictors. In particular, we show that amongst all convex surrogate losses, the hinge loss gives essentially the best possible bound, of all convex loss functions, for the misclassification error rate of the resulti…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  21. arXiv:1206.3240  [pdf]

    cs.DS cs.AI

    Complexity of Inference in Graphical Models

    Authors: Venkat Chandrasekaran, Nathan Srebro, Prahladh Harsha

    Abstract: It is well-known that inference in graphical models is hard in the worst case, but tractable for models with bounded treewidth. We ask whether treewidth is the only structural criterion of the underlying graph that enables tractable inference. In other words, is there some class of structures with unbounded treewidth in which inference is tractable? Subject to a combinatorial hypothesis due to Rob…

    Submitted 13 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

    Report number: UAI-P-2008-PG-70-78

  22. arXiv:1206.2372  [pdf, other]

    math.OC cs.LG

    PRISMA: PRoximal Iterative SMoothing Algorithm

    Authors: Francesco Orabona, Andreas Argyriou, Nathan Srebro

    Abstract: Motivated by learning problems including max-norm regularized matrix completion and clustering, robust PCA and sparse inverse covariance selection, we propose a novel optimization algorithm for minimizing a convex objective which decomposes into three parts: a smooth part, a simple non-smooth Lipschitz part, and a simple non-smooth non-Lipschitz part. We use a time variant smoothing strategy that…

    Submitted 18 November, 2012; v1 submitted 11 June, 2012; originally announced June 2012.

  23. arXiv:1204.5043  [pdf, ps, other]

    stat.ML cs.LG

    Sparse Prediction with the $k$-Support Norm

    Authors: Andreas Argyriou, Rina Foygel, Nathan Srebro

    Abstract: We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an $\ell_2$ penalty. We show that this new $k$-support norm provides a tighter relaxation than the elastic net and is thus a good replacement for the Lasso or the elastic net in sparse prediction problems. Through the study of the $k$-support norm, we also bound the looseness of the elastic ne…

    Submitted 12 June, 2012; v1 submitted 23 April, 2012; originally announced April 2012.

  24. arXiv:1204.1276  [pdf, ps, other]

    stat.ML cs.LG

    Distribution-Dependent Sample Complexity of Large Margin Learning

    Authors: Sivan Sabato, Nathan Srebro, Naftali Tishby

    Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the margin-adapted dimension, which is a simple function of the second order statistics of the data distribution, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the margin-adapted dimension of the dat…

    Submitted 18 September, 2013; v1 submitted 5 April, 2012; originally announced April 2012.

    Comments: arXiv admin note: text overlap with arXiv:1011.5053

    Journal ref: S. Sabato, N. Srebro and N. Tishby, "Distribution-Dependent Sample Complexity of Large Margin Learning", Journal of Machine Learning Research, 14(Jul):2119-2149, 2013

  25. arXiv:1204.0566  [pdf, ps, other]

    cs.LG

    The Kernelized Stochastic Batch Perceptron

    Authors: Andrew Cotter, Shai Shalev-Shwartz, Nathan Srebro

    Abstract: We present a novel approach for training kernel Support Vector Machines, establish learning runtime guarantees for our method that are better than those of any other known kernelized SVM optimization approach, and show that our method works well in practice compared to existing alternatives.

    Submitted 21 June, 2012; v1 submitted 2 April, 2012; originally announced April 2012.

  26. arXiv:1202.5598  [pdf, other]

    cs.LG stat.ML

    Clustering using Max-norm Constrained Optimization

    Authors: Ali Jalali, Nathan Srebro

    Abstract: We suggest using the max-norm as a convex surrogate constraint for clustering. We show how this yields a better exact cluster recovery guarantee than previously suggested nuclear-norm relaxation, and study the effectiveness of our method, and other related convex relaxations, compared to other clustering approaches.

    Submitted 13 April, 2012; v1 submitted 24 February, 2012; originally announced February 2012.

  27. arXiv:1202.3702  [pdf]

    cs.LG stat.ML

    Semi-supervised Learning with Density Based Distances

    Authors: Avleen S. Bijral, Nathan Ratliff, Nathan Srebro

    Abstract: We present a simple, yet effective, approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distance-based supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also…

    Submitted 14 February, 2012; originally announced February 2012.

    Report number: UAI-P-2011-PG-43-50

  28. arXiv:1109.4603  [pdf, other]

    cs.AI

    Explicit Approximations of the Gaussian Kernel

    Authors: Andrew Cotter, Joseph Keshet, Nathan Srebro

    Abstract: We investigate training and using Gaussian kernel SVMs by approximating the kernel with an explicit finite-dimensional polynomial feature representation based on the Taylor expansion of the exponential. Although not as efficient as the recently-proposed random Fourier features [Rahimi and Recht, 2007] in terms of the number of features, we show how this polynomial representation can provide a bet…

    Submitted 21 September, 2011; originally announced September 2011.

    Comments: 11 pages, 2 tables, 2 figures

  29. arXiv:1108.0373  [pdf, ps, other]

    math.ST

    Fast-rate and optimistic-rate error bounds for L1-regularized regression

    Authors: Rina Foygel, Nathan Srebro

    Abstract: We consider the prediction error of linear regression with L1 regularization when the number of covariates p is large relative to the sample size n. When the model is k-sparse and well-specified, and restricted isometry or similar conditions hold, the excess squared-error in prediction can be bounded on the order of sigma^2*(k*log(p)/n), where sigma^2 is the noise variance. Although these conditio…

    Submitted 1 August, 2011; originally announced August 2011.

  30. arXiv:1107.4080  [pdf, other]

    cs.LG

    On the Universality of Online Mirror Descent

    Authors: Nathan Srebro, Karthik Sridharan, Ambuj Tewari

    Abstract: We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee.

    Submitted 20 July, 2011; originally announced July 2011.

  31. arXiv:1106.4574  [pdf, other]

    cs.LG

    Better Mini-Batch Algorithms via Accelerated Gradient Methods

    Authors: Andrew Cotter, Ohad Shamir, Nathan Srebro, Karthik Sridharan

    Abstract: Mini-batch algorithms have been proposed as a way to speed-up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up and propose a novel accelerated gradient algorithm, which deals with this deficien…

    Submitted 22 June, 2011; originally announced June 2011.

  32. arXiv:1106.4251  [pdf, other]

    cs.LG stat.ML

    Learning with the Weighted Trace-norm under Arbitrary Sampling Distributions

    Authors: Rina Foygel, Ruslan Salakhutdinov, Ohad Shamir, Nathan Srebro

    Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it…

    Submitted 21 June, 2011; originally announced June 2011.

  33. arXiv:1102.3923  [pdf, ps, other]

    cs.LG stat.ML

    Concentration-Based Guarantees for Low-Rank Matrix Reconstruction

    Authors: Rina Foygel, Nathan Srebro

    Abstract: We consider the problem of approximately reconstructing a partially-observed, approximately low-rank matrix. This problem has received much attention lately, mostly using the trace-norm as a surrogate to the rank. Here we study low-rank matrix reconstruction using both the trace-norm, as well as the less-studied max-norm, and present reconstruction guarantees based on existing analysis on the Rade…

    Submitted 26 May, 2011; v1 submitted 18 February, 2011; originally announced February 2011.

  34. arXiv:1011.5053  [pdf, ps, other]

    cs.LG math.PR math.ST stat.ML

    Tight Sample Complexity of Large-Margin Learning

    Authors: Sivan Sabato, Nathan Srebro, Naftali Tishby

    Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L_2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution's covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distrib…

    Submitted 5 April, 2012; v1 submitted 23 November, 2010; originally announced November 2010.

    Comments: Appearing in Neural Information Processing Systems (NIPS) 2010; This is the full version, including appendix with proofs; Also with some corrections

    Journal ref: Advances in Neural Information Processing Systems 23 (NIPS), 2038-2046, 2010

  35. arXiv:1009.3896  [pdf, ps, other]

    cs.LG

    Optimistic Rates for Learning with a Smooth Loss

    Authors: Nathan Srebro, Karthik Sridharan, Ambuj Tewari

    Abstract: We establish an excess risk bound of O(H R_n^2 + R_n \sqrt{H L*}) for empirical risk minimization with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = \sqrt{R/n}, this translates to a learning rate of O(RH/n) in the separable (L*=0) case and O(RH/n + \sqrt{L^* R…

    Submitted 26 November, 2012; v1 submitted 20 September, 2010; originally announced September 2010.

  36. arXiv:1002.2780  [pdf, ps, other]

    cs.LG

    Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm

    Authors: Ruslan Salakhutdinov, Nathan Srebro

    Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly. We introduce a weighted version of the trace-norm regularizer that works well also with non-uniform sampling. Our experimental results demonstrate that the weighted trace-norm regularization indeed yields significant gains on the (highly non-uniformly samp…

    Submitted 14 February, 2010; originally announced February 2010.

    Comments: 9 pages