Showing 1–48 of 48 results for author: Daniely, A

  1. arXiv:2401.07606  [pdf, ps, other]

    cs.LG math.OC stat.ML

    RedEx: Beyond Fixed Representation Methods via Convex Optimization

    Authors: Amit Daniely, Mariano Schain, Gilad Yehudai

    Abstract: Optimizing Neural networks is a difficult task which is still not well understood. On the other hand, fixed representation methods such as kernels and random features have provable optimization guarantees but inferior performance due to their inherent inability to learn the representations. In this paper, we aim at bridging this gap by presenting a novel architecture called RedEx (Reduced Expander…

    Submitted 15 January, 2024; originally announced January 2024.

  2. arXiv:2311.13877  [pdf, other]

    cs.LG math.OC stat.ML

    Locally Optimal Descent for Dynamic Stepsize Scheduling

    Authors: Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

    Abstract: We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method wit…

    Submitted 23 November, 2023; originally announced November 2023.

  3. arXiv:2307.00642  [pdf, ps, other]

    cs.LG

    Multiclass Boosting: Simple and Intuitive Weak Learning Criteria

    Authors: Nataly Brukhim, Amit Daniely, Yishay Mansour, Shay Moran

    Abstract: We study a generalization of boosting to the multiclass setting. We introduce a weak learning condition for multiclass classification that captures the original notion of weak learnability as being "slightly better than random guessing". We give a simple and efficient boosting algorithm, that does not require realizability assumptions and its sample and oracle complexity bounds are independent of…

    Submitted 2 July, 2023; originally announced July 2023.

  4. arXiv:2305.16508  [pdf, other]

    cs.LG stat.ML

    Most Neural Networks Are Almost Learnable

    Authors: Amit Daniely, Nathan Srebro, Gal Vardi

    Abstract: We present a PTAS for learning random constant-depth networks. We show that for any fixed $ε>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of depth $i$, up to an additive error of $ε$. The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(ε^{-1})}$, where $\bar d$ is the size of…

    Submitted 24 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Small fixes after review

  5. arXiv:2302.07426  [pdf, ps, other]

    cs.LG stat.ML

    Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy

    Authors: Amit Daniely, Nathan Srebro, Gal Vardi

    Abstract: Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-$2$ networks can be efficiently learned under the assumptions that the input distribution…

    Submitted 4 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Changed the title, and made some other minor modifications. arXiv admin note: text overlap with arXiv:2101.08303

  6. arXiv:2211.09634  [pdf, ps, other]

    cs.LG

    On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

    Authors: Amit Daniely, Elad Granot

    Abstract: We investigate the sample complexity of bounded two-layer neural networks using different activation functions. In particular, we consider the class $$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, σ\circ W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$ where the spectral norm…

    Submitted 20 January, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 13 pages

    Journal ref: 35th International Conference on Algorithmic Learning Theory, 2024

  7. arXiv:2209.12882  [pdf, other]

    cs.LG cs.DS stat.ML

    Approximate Description Length, Covering Numbers, and VC Dimension

    Authors: Amit Daniely, Gal Katzhendler

    Abstract: Recently, Daniely and Granot [arXiv:1910.05697] introduced a new notion of complexity called Approximate Description Length (ADL). They used it to derive novel generalization bounds for neural networks, that despite substantial work, were out of reach for more classical techniques such as discretization, Covering Numbers and Rademacher Complexity. In this paper we explore how ADL relates to classi…

    Submitted 26 September, 2022; originally announced September 2022.

  8. arXiv:2202.05246  [pdf, other]

    cs.LG cs.AI cs.IT math.ST

    Monotone Learning

    Authors: Olivier Bousquet, Amit Daniely, Haim Kaplan, Yishay Mansour, Shay Moran, Uri Stemmer

    Abstract: The amount of training-data is one of the key factors which determines the generalization capacity of learning algorithms. Intuitively, one expects the error rate to decrease as the amount of training-data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern rec…

    Submitted 6 November, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: Fixed a calculation error in Lemma 2.5

  9. arXiv:2106.11879  [pdf, other]

    math.OC cs.LG

    Asynchronous Stochastic Optimization Robust to Arbitrary Delays

    Authors: Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

    Abstract: We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts asynchronous distributed optimization where a central server receives gradient updates computed by worker machines. These machines can experience computation and communicat…

    Submitted 15 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

  10. arXiv:2105.09673  [pdf, other]

    cs.LG

    An Exact Poly-Time Membership-Queries Algorithm for Extraction a three-Layer ReLU Network

    Authors: Amit Daniely, Elad Granot

    Abstract: We consider the natural problem of learning a ReLU network from queries, which was recently remotivated by model extraction attacks. In this work, we present a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general position assumptions. We also present a polynomial-time algorithm that, under mild general position assumptions, can learn a rich class of dep…

    Submitted 14 February, 2023; v1 submitted 20 May, 2021; originally announced May 2021.

  11. arXiv:2101.08303  [pdf, ps, other]

    cs.LG stat.ML

    From Local Pseudorandom Generators to Hardness of Learning

    Authors: Amit Daniely, Gal Vardi

    Abstract: We prove hardness-of-learning results under a well-studied assumption on the existence of local pseudorandom generators. As we show, this assumption allows us to surpass the current state of the art, and prove hardness of various basic problems, with no hardness results to date. Our results include: hardness of learning shallow ReLU neural networks under the Gaussian distribution and other distr…

    Submitted 8 June, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

  12. arXiv:2010.14927  [pdf, other]

    cs.LG cs.CR stat.ML

    Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations

    Authors: Amit Daniely, Hadas Schacham

    Abstract: We consider ReLU networks with random weights, in which the dimension decreases at each layer. We show that for most such networks, most examples $x$ admit an adversarial perturbation at a Euclidean distance of $O\left(\frac{\|x\|}{\sqrt{d}}\right)$, where $d$ is the input dimension. Moreover, this perturbation can be found via gradient flow, as well as gradient descent with sufficiently small st…

    Submitted 28 October, 2020; originally announced October 2020.

  13. arXiv:2006.03177  [pdf, ps, other]

    cs.LG cs.CC stat.ML

    Hardness of Learning Neural Networks with Natural Weights

    Authors: Amit Daniely, Gal Vardi

    Abstract: Neural networks are nowadays highly successful despite strong hardness results. The existing hardness results focus on the network architecture, and assume that the network's weights are arbitrary. A natural approach to settle the discrepancy is to assume that the network's weights are "well-behaved" and possess some generic properties that may allow efficient learning. This approach is supported b…

    Submitted 13 October, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

  14. arXiv:2003.12895  [pdf, other]

    cs.LG stat.ML

    Memorizing Gaussians with no over-parameterizaion via gradient decent on neural networks

    Authors: Amit Daniely

    Abstract: We prove that a single step of gradient descent over a depth-two network, with $q$ hidden neurons, starting from orthogonal initialization, can memorize $Ω\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.

    Submitted 28 March, 2020; originally announced March 2020.

  15. arXiv:2002.07400  [pdf, other]

    cs.LG stat.ML

    Learning Parities with Neural Networks

    Authors: Amit Daniely, Eran Malach

    Abstract: In recent years we see a rapidly growing line of research which shows learnability of various models via common neural network algorithms. Yet, besides a very few outliers, these results show learnability of models that can be learned using linear methods. Namely, such results show that learning neural-networks with gradient-descent is competitive with learning a linear classifier on top of a data…

    Submitted 3 July, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  16. arXiv:2002.03273  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions

    Authors: Yossi Arjevani, Amit Daniely, Stefanie Jegelka, Hongzhou Lin

    Abstract: Recent advances in randomized incremental methods for minimizing $L$-smooth $μ$-strongly convex finite sums have culminated in tight complexity of $\tilde{O}((n+\sqrt{n L/μ})\log(1/ε))$ and $O(n+\sqrt{nL/ε})$, where $μ>0$ and $μ=0$, respectively, and $n$ denotes the number of individual functions. Unlike incremental methods, stochastic methods for finite sums do not rely on an explicit knowledge o…

    Submitted 8 February, 2020; originally announced February 2020.

  17. arXiv:1911.09873  [pdf, other]

    cs.LG stat.ML

    Neural Networks Learning and Memorization with (almost) no Over-Parameterization

    Authors: Amit Daniely

    Abstract: Many results in recent years established polynomial time learnability of various models via neural networks algorithms. However, unless the model is linearly separable, or the activation is a polynomial, these results require very large networks -- much more than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth two neural networks can memorize sam…

    Submitted 22 November, 2019; originally announced November 2019.

  18. arXiv:1910.05697  [pdf, other]

    cs.LG stat.ML

    Generalization Bounds for Neural Networks via Approximate Description Length

    Authors: Amit Daniely, Elad Granot

    Abstract: We investigate the sample complexity of networks with bounds on the magnitude of its weights. In particular, we consider the class \[ H=\left\{W_t\circρ\circ \ldots\circρ\circ W_{1} :W_1,\ldots,W_{t-1}\in M_{d, d}, W_t\in M_{1,d}\right\} \] where the spectral norm of each $W_i$ is bounded by $O(1)$, the Frobenius norm is bounded by $R$, and $ρ$ is the sigmoid function $\frac{e^x}{1+e^x}$ or the sm…

    Submitted 13 October, 2019; originally announced October 2019.

    Comments: To appear in NeurIPS

  19. arXiv:1909.12051  [pdf, other]

    cs.LG stat.ML

    The Implicit Bias of Depth: How Incremental Learning Drives Generalization

    Authors: Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely

    Abstract: A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear…

    Submitted 28 December, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: 25 pages, 7 figures, published at the International Conference on Learning Representations (ICLR) 2020

  20. arXiv:1907.05444  [pdf, other]

    cs.LG stat.ML

    On the Optimality of Trees Generated by ID3

    Authors: Alon Brutzkus, Amit Daniely, Eran Malach

    Abstract: Since its inception in the 1980s, ID3 has become one of the most successful and widely used algorithms for learning decision trees. However, its theoretical properties remain poorly understood. In this work, we introduce a novel metric of a decision tree algorithm's performance, called mean iteration statistical consistency (MIC), which measures optimality of trees generated by ID3. As opposed to…

    Submitted 23 February, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

  21. arXiv:1906.08654  [pdf, ps, other]

    cs.LG stat.ML

    ID3 Learns Juntas for Smoothed Product Distributions

    Authors: Alon Brutzkus, Amit Daniely, Eran Malach

    Abstract: In recent years, there have been many attempts to understand popular heuristics. An example of such a heuristic algorithm is the ID3 algorithm for learning decision trees. This algorithm is commonly used in practice, but there are very few theoretical works studying its behavior. In this paper, we analyze the ID3 algorithm, when the target function is a $k$-Junta, a function that depends on $k$ out of…

    Submitted 20 June, 2019; originally announced June 2019.

  22. arXiv:1904.03602  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Competitive ratio versus regret minimization: achieving the best of both worlds

    Authors: Amit Daniely, Yishay Mansour

    Abstract: We consider online algorithms under both the competitive ratio criterion and the regret minimization one. Our main goal is to build a unified methodology that would be able to guarantee both criteria simultaneously. For a general class of online algorithms, namely any Metrical Task System (MTS), we show that one can simultaneously guarantee the best known competitive ratio and a natural regret bo…

    Submitted 7 April, 2019; originally announced April 2019.

  23. arXiv:1809.09165  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Locally Private Learning without Interaction Requires Separation

    Authors: Amit Daniely, Vitaly Feldman

    Abstract: We consider learning under the constraint of local differential privacy (LDP). For many learning problems known efficient algorithms in this model require many rounds of communication between the server and the clients holding the data points. Yet multi-round protocols are prohibitively slow in practice due to network latency and, as a result, currently deployed large-scale systems are limited to…

    Submitted 28 October, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

  24. arXiv:1805.02363  [pdf, other]

    cs.AI

    Planning and Learning with Stochastic Action Sets

    Authors: Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans

    Abstract: In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundation…

    Submitted 12 February, 2021; v1 submitted 7 May, 2018; originally announced May 2018.

  25. arXiv:1803.03155  [pdf, other]

    cs.LG

    Learning Rules-First Classifiers

    Authors: Deborah Cohen, Amit Daniely, Amir Globerson, Gal Elidan

    Abstract: Complex classifiers may exhibit "embarrassing" failures in cases where humans can easily provide a justified classification. Avoiding such failures is obviously of key importance. In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier. We define a hypothesis class tha…

    Submitted 13 June, 2019; v1 submitted 8 March, 2018; originally announced March 2018.

  26. arXiv:1703.07872  [pdf, other]

    cs.LG

    Random Features for Compositional Kernels

    Authors: Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer

    Abstract: We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels. The compositional kernels we use are inspired by the structure of convolutional neural networks and kernels. The resulting scheme yields sparse and efficiently computable features. Each random feature can be represented as an algebraic expression over a small number of (random) paths in a compositio…

    Submitted 22 March, 2017; originally announced March 2017.

  27. arXiv:1702.08503  [pdf, other]

    cs.LG cs.DS stat.ML

    SGD Learns the Conjugate Kernel Class of the Network

    Authors: Amit Daniely

    Abstract: We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guara…

    Submitted 19 May, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

  28. arXiv:1702.08489  [pdf, other]

    cs.LG cs.CC stat.ML

    Depth Separation for Neural Networks

    Authors: Amit Daniely

    Abstract: Let $f:\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}\to\mathbb{R}$ be a function of the form $f(\mathbf{x},\mathbf{x}') = g(\langle\mathbf{x},\mathbf{x}'\rangle)$ for $g:[-1,1]\to \mathbb{R}$. We give a simple proof that shows that poly-size depth two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low degree polynomial. Moreover, for…

    Submitted 27 February, 2017; originally announced February 2017.

  29. arXiv:1611.10228  [pdf, ps, other]

    cs.LG cs.GT

    Behavior-Based Machine-Learning: A Hybrid Approach for Predicting Human Decision Making

    Authors: Gali Noti, Effi Levi, Yoav Kolumbus, Amit Daniely

    Abstract: A large body of work in behavioral fields attempts to develop models that describe the way people, as opposed to rational agents, make decisions. A recent Choice Prediction Competition (2015) challenged researchers to suggest a model that captures 14 classic choice biases and can predict human decisions under risk and ambiguity. The competition focused on simple decision problems, in which human s…

    Submitted 30 November, 2016; originally announced November 2016.

    ACM Class: J.4; I.2.6; H.1.2

  30. arXiv:1604.05753  [pdf, other]

    cs.LG cs.AI

    Sketching and Neural Networks

    Authors: Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar

    Abstract: High-dimensional sparse data present computational and statistical challenges for supervised learning. We propose compact linear sketches for reducing the dimensionality of the input, followed by a single layer neural network. We show that any sparse polynomial function can be computed, on nearly all sparse binary vectors, by a single layer neural network that takes a compact sketch of the vector…

    Submitted 19 April, 2016; originally announced April 2016.

  31. arXiv:1603.03714  [pdf, other]

    cs.LG

    Distribution Free Learning with Local Queries

    Authors: Galit Bary-Weisberg, Amit Daniely, Shai Shalev-Shwartz

    Abstract: The model of learning with \emph{local membership queries} interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set. This model, recently proposed and studied by Awasthi, Feldman and Kanade, aims to facilitate practical use of membership queries. We continue this line of work, p…

    Submitted 11 March, 2016; originally announced March 2016.

  32. arXiv:1602.05897  [pdf, other]

    cs.LG cs.AI cs.CC cs.DS stat.ML

    Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

    Authors: Amit Daniely, Roy Frostig, Yoram Singer

    Abstract: We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good s…

    Submitted 19 May, 2017; v1 submitted 18 February, 2016; originally announced February 2016.

  33. arXiv:1505.05800  [pdf, other]

    cs.CC cs.LG

    Complexity Theoretic Limitations on Learning Halfspaces

    Authors: Amit Daniely

    Abstract: We study the problem of agnostically learning halfspaces which is defined by a fixed but unknown distribution $\mathcal{D}$ on $\mathbb{Q}^n\times \{\pm 1\}$. We define $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D})$ as the least error of a halfspace classifier for $\mathcal{D}$. A learner who can access $\mathcal{D}$ has to return a hypothesis whose error is small compared to…

    Submitted 13 March, 2016; v1 submitted 21 May, 2015; originally announced May 2015.

  34. arXiv:1502.07073  [pdf, ps, other]

    cs.LG

    Strongly Adaptive Online Learning

    Authors: Amit Daniely, Alon Gonen, Shai Shalev-Shwartz

    Abstract: Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal. We present a reduction that can transform standard low-regret algorithms to strongly adaptive. As a consequence, we derive simple, yet efficient, strongly adaptive algorithms for a handful of problems.

    Submitted 19 June, 2015; v1 submitted 25 February, 2015; originally announced February 2015.

  35. arXiv:1412.6265  [pdf, other]

    cs.GT cs.CC cs.DM math.CO

    Inapproximability of Truthful Mechanisms via Generalizations of the VC Dimension

    Authors: Amit Daniely, Michael Schapira, Gal Shahaf

    Abstract: Algorithmic mechanism design (AMD) studies the delicate interplay between computational efficiency, truthfulness, and optimality. We focus on AMD's paradigmatic problem: combinatorial auctions. We present a new generalization of the VC dimension to multivalued collections of functions, which encompasses the classical VC dimension, Natarajan dimension, and Steele dimension. We present a correspondi…

    Submitted 4 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

  36. arXiv:1410.7050  [pdf, ps, other]

    cs.DS cs.LG

    A PTAS for Agnostically Learning Halfspaces

    Authors: Amit Daniely

    Abstract: We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the $d$ dimensional sphere. Namely, we show that for every $μ>0$ there is an algorithm that runs in time $\mathrm{poly}(d,\frac{1}ε)$, and is guaranteed to return a classifier with error at most $(1+μ)\mathrm{opt}+ε$, where $\mathrm{opt}$ is the error of the best halfspace classifier. This improves on Awasthi…

    Submitted 25 June, 2015; v1 submitted 26 October, 2014; originally announced October 2014.

  37. arXiv:1407.7937  [pdf, ps, other]

    cs.GT cs.LG

    Learning Economic Parameters from Revealed Preferences

    Authors: Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, Vijay V. Vazirani

    Abstract: A recent line of work, starting with Beigman and Vohra (2006) and Zadimoghaddam and Roth (2012), has addressed the problem of {\em learning} a utility function from revealed preference data. The goal here is to make use of past data describing the purchases of a utility maximizing agent when faced with certain prices and budget constraints in order to produce a hypothesis function that can accurat…

    Submitted 30 July, 2014; originally announced July 2014.

  38. arXiv:1405.2420  [pdf, ps, other]

    cs.LG

    Optimal Learners for Multiclass Problems

    Authors: Amit Daniely, Shai Shalev-Shwartz

    Abstract: The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must have the ability to o…

    Submitted 10 May, 2014; originally announced May 2014.

  39. arXiv:1404.3378  [pdf, other]

    cs.LG cs.CC

    Complexity theoretic limitations on learning DNF's

    Authors: Amit Daniely, Shai Shalev-Shwartz

    Abstract: Using the recently developed framework of [Daniely et al, 2014], we show that under a natural assumption on the complexity of refuting random K-SAT formulas, learning DNF formulas is hard. Furthermore, the same assumption implies the hardness of learning intersections of $ω(\log(n))$ halfspaces, agnostically learning conjunctions, as well as virtually all (distribution free) learning problems that…

    Submitted 4 November, 2014; v1 submitted 13 April, 2014; originally announced April 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1311.2272

  40. arXiv:1311.2272  [pdf, other]

    cs.LG cs.CC

    From average case complexity to improper learning complexity

    Authors: Amit Daniely, Nati Linial, Shai Shalev-Shwartz

    Abstract: The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.…

    Submitted 9 March, 2014; v1 submitted 10 November, 2013; originally announced November 2013.

    Comments: 34 pages

  41. arXiv:1311.2271  [pdf, ps, other]

    cs.LG

    More data speeds up training time in learning halfspaces over sparse vectors

    Authors: Amit Daniely, Nati Linial, Shai Shalev Shwartz

    Abstract: The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a {\em computational} resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a {\em…

    Submitted 10 November, 2013; originally announced November 2013.

    Comments: 13 pages

  42. arXiv:1308.2893  [pdf, other]

    cs.LG

    Multiclass learnability and the ERM principle

    Authors: Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz

    Abstract: We study the sample complexity of multiclass prediction in several learning settings. For the PAC setting our analysis reveals a surprising phenomenon: In sharp contrast to binary classification, we show that there exist multiclass hypothesis classes for which some Empirical Risk Minimizers (ERM learners) have lower sample complexity than others. Furthermore, there are classes that are learnable b…

    Submitted 24 November, 2014; v1 submitted 13 August, 2013; originally announced August 2013.

    Journal ref: Journal of Machine Learning Research, 16(Jul):1275-1304, 2015

  43. arXiv:1302.1043  [pdf, other]

    cs.LG

    The price of bandit information in multiclass online classification

    Authors: Amit Daniely, Tom Helbertal

    Abstract: We consider two scenarios of multiclass online learning of a hypothesis class $H\subseteq Y^X$. In the {\em full information} scenario, the learner is exposed to instances together with their labels. In the {\em bandit} scenario, the true label is not exposed, but rather an indication whether the learner's prediction is correct or not. We show that the ratio between the error rates in the two scen…

    Submitted 9 July, 2013; v1 submitted 5 February, 2013; originally announced February 2013.

  44. arXiv:1211.0616  [pdf, other]

    cs.LG cs.DS

    The complexity of learning halfspaces using generalized linear methods

    Authors: Amit Daniely, Nati Linial, Shai Shalev-Shwartz

    Abstract: Many popular learning algorithms (e.g. regression, Fourier-transform based algorithms, kernel SVM and kernel ridge regression) operate by reducing the problem to a convex optimization problem over a vector space of functions. These methods offer the currently best approach to several central problems such as learning half spaces and learning DNF's. In addition they are widely used in numerous appl…

    Submitted 10 May, 2014; v1 submitted 3 November, 2012; originally announced November 2012.

  45. arXiv:1205.6432  [pdf, other]

    cs.LG

    Multiclass Learning Approaches: A Theoretical Comparison with Implications

    Authors: Amit Daniely, Sivan Sabato, Shai Shalev Shwartz

    Abstract: We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a…

    Submitted 1 June, 2012; v1 submitted 29 May, 2012; originally announced May 2012.

    Journal ref: Advances in Neural Information Processing Systems, 2012, pages 494-502

  46. arXiv:1205.4893  [pdf, other]

    cs.CC cs.LG

    On the practically interesting instances of MAXCUT

    Authors: Yonatan Bilu, Amit Daniely, Nati Linial, Michael Saks

    Abstract: The complexity of a computational problem is traditionally quantified based on the hardness of its worst case. This approach has many advantages and has led to a deep and beautiful theory. However, from the practical perspective, this leaves much to be desired. In application areas, practically interesting instances very often occupy just a tiny part of an algorithm's space of instances, and the v…

    Submitted 22 May, 2012; originally announced May 2012.

  47. arXiv:1205.4891  [pdf, other]

    cs.LG cs.DS

    Clustering is difficult only when it does not matter

    Authors: Amit Daniely, Nati Linial, Michael Saks

    Abstract: Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets {\em that can be clustered well}. More generally, despite the ubiquity and the great importance of clustering, we still do not have a satisfactory mathematical theory of clustering. In order to properly understand clustering, it is clearly…

    Submitted 22 May, 2012; originally announced May 2012.

  48. arXiv:1001.3661  [pdf, ps, other]

    cs.DM

    Tight products and Expansion

    Authors: Amit Daniely, Nathan Linial

    Abstract: In this paper we study a new product of graphs called {\em tight product}. A graph $H$ is said to be a tight product of two (undirected multi) graphs $G_1$ and $G_2$, if $V(H)=V(G_1)\times V(G_2)$ and both projection maps $V(H)\to V(G_1)$ and $V(H)\to V(G_2)$ are covering maps. It is not a priori clear when two given graphs have a tight product (in fact, it is $NP$-hard to decide). We investigate…

    Submitted 3 November, 2012; v1 submitted 20 January, 2010; originally announced January 2010.

    Journal ref: J. Graph Theory, (2012) 69, 426 - 440