Showing 201–250 of 287 results for author: Anandkumar, A

  1. arXiv:1901.11261

    stat.ML cs.LG

    Higher-order Count Sketch: Dimensionality Reduction That Retains Efficient Tensor Operations

    Authors: Yang Shi, Animashree Anandkumar

    Abstract: Sketching is a randomized dimensionality-reduction method that aims to preserve relevant information in large-scale datasets. Count sketch is a simple popular sketch which uses a randomized hash function to achieve compression. In this paper, we propose a novel extension known as Higher-order Count Sketch (HCS). While count sketch uses a single hash function, HCS uses multiple (smaller) hash funct…

    Submitted 4 November, 2019; v1 submitted 31 January, 2019; originally announced January 2019.
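The count sketch described in the abstract above can be illustrated with a small NumPy sketch. `make_hash`, `count_sketch`, `count_sketch_estimate`, `sketch_matrix` and `higher_order_sketch` are hypothetical helper names. The first four implement the standard count sketch (a random bucket hash plus a random sign per coordinate). `higher_order_sketch` is only our assumption of what hashing each mode of a matrix with its own smaller hash function could look like, based on the one-line description in the abstract; it is not taken from the paper.

```python
import numpy as np


def make_hash(n, m, rng):
    """Random bucket h: [n] -> [m] and random sign s: [n] -> {-1, +1}."""
    h = rng.integers(0, m, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    return h, s


def count_sketch(x, h, s, m):
    """Count sketch of x: y[h[i]] += s[i] * x[i] (colliding entries add up)."""
    y = np.zeros(m)
    np.add.at(y, h, s * x)
    return y


def count_sketch_estimate(y, h, s):
    """Read each x[i] back from its bucket: s[i] * y[h[i]] (unbiased estimate)."""
    return s * y[h]


def sketch_matrix(h, s, m):
    """m x n matrix P with P[h[i], i] = s[i], so count_sketch(x) == P @ x."""
    P = np.zeros((m, len(h)))
    P[h, np.arange(len(h))] = s
    return P


def higher_order_sketch(X, hash1, hash2, m1, m2):
    """Assumed two-mode variant: hash rows and columns separately, Y = P1 @ X @ P2.T."""
    P1 = sketch_matrix(*hash1, m1)
    P2 = sketch_matrix(*hash2, m2)
    return P1 @ X @ P2.T
```

Because the sketch is linear, `count_sketch(x)` equals `sketch_matrix(h, s, m) @ x`, which is why the two-mode version can be written as `P1 @ X @ P2.T`.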

  2. arXiv:1901.09490

    cs.LG stat.ML

    Stochastic Linear Bandits with Hidden Low Rank Structure

    Authors: Sahin Lale, Kamyar Azizzadenesheli, Anima Anandkumar, Babak Hassibi

    Abstract: High-dimensional representations often have a lower dimensional underlying structure. This is particularly the case in many decision making settings. For example, when the representation of actions is generated from a deep neural network, it is reasonable to expect a low-rank structure whereas conventional structures like sparsity are not valid anymore. Subspace recovery methods, such as Principal…

    Submitted 27 January, 2019; originally announced January 2019.

  3. Neural Lander: Stable Drone Landing Control using Learned Dynamics

    Authors: Guanya Shi, Xichen Shi, Michael O'Connell, Rose Yu, Kamyar Azizzadenesheli, Animashree Anandkumar, Yisong Yue, Soon-Jo Chung

    Abstract: Precise near-ground trajectory control is difficult for multi-rotor drones, due to the complex aerodynamic effects caused by interactions between multi-rotor airflow and the environment. Conventional control methods often fail to properly account for these complex effects and fall short in accomplishing smooth landing. In this paper, we present a novel deep-learning-based robust nonlinear controll…

    Submitted 4 March, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

    Comments: 7 pages, 5 figures, https://youtu.be/FLLsG0S78ik

    Journal ref: International Conference on Robotics and Automation (ICRA), 2019, pp. 9784-9790

  4. arXiv:1811.02657

    cs.CV cs.AI cs.LG cs.NE stat.ML

    A Bayesian Perspective of Convolutional Neural Networks through a Deconvolutional Generative Model

    Authors: Tan Nguyen, Nhat Ho, Ankit Patel, Anima Anandkumar, Michael I. Jordan, Richard G. Baraniuk

    Abstract: Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction in images, we design the Deconvolutional Generative Model (DGM), a new probabilistic generative model whose inference calculations correspond to those in a given CNN architecture. The DGM uses a CNN to design the prior distribution in the probabilistic model. Furthermore, the DGM generates images from coarse…

    Submitted 9 December, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: Keywords: neural nets, generative models, semi-supervised learning, cross-entropy, statistical guarantees. 80 pages, 7 figures, 8 tables

  5. arXiv:1810.08305

    cs.LG stat.ML

    Open Vocabulary Learning on Source Code with a Graph-Structured Cache

    Authors: Milan Cvitkovic, Badal Singh, Anima Anandkumar

    Abstract: Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-St…

    Submitted 19 May, 2019; v1 submitted 18 October, 2018; originally announced October 2018.

    Comments: Published in the International Conference on Machine Learning (ICML 2019), 13 pages

  6. arXiv:1810.07900

    cs.LG stat.ML

    Policy Gradient in Partially Observable Environments: Approximation and Convergence

    Authors: Kamyar Azizzadenesheli, Yisong Yue, Animashree Anandkumar

    Abstract: Policy gradient is a generic and flexible reinforcement learning approach that generally enjoys simplicity in analysis, implementation, and deployment. In the last few decades, this approach has been extensively advanced for fully observable environments. In this paper, we generalize a variety of these advances to partially observable settings, and similar to the fully observable case, we keep our…

    Submitted 24 May, 2020; v1 submitted 18 October, 2018; originally announced October 2018.

  7. arXiv:1810.05291

    cs.DC cs.AI cs.LG

    signSGD with Majority Vote is Communication Efficient And Fault Tolerant

    Authors: Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar

    Abstract: Training neural networks on large datasets can be accelerated by distributing the workload over a network of machines. As datasets grow ever larger, networks of hundreds or thousands of machines become economically viable. The time cost of communicating gradients limits the effectiveness of using such large machine counts, as may the increased chance of network faults. We explore a particularly si…

    Submitted 22 February, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

  8. arXiv:1806.05780

    cs.LG cs.AI stat.ML

    Surprising Negative Results for Generative Adversarial Tree Search

    Authors: Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar

    Abstract: While many recent advances in deep reinforcement learning (RL) rely on model-free methods, model-based approaches remain an alluring prospect for their potential to exploit unsupervised data to learn an environment model. In this work, we provide an extensive study on the design of deep generative models for RL environments and propose a sample efficient and robust method to learn the model of Atari…

    Submitted 4 September, 2019; v1 submitted 14 June, 2018; originally announced June 2018.

  9. arXiv:1806.02901

    cs.CL cs.AI cs.LG stat.ML

    Probabilistic FastText for Multi-Sense Word Embeddings

    Authors: Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar

    Abstract: We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-grams. This representation allows the model to share statistical strength across sub-word structures (e.g. La…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: Published at ACL 2018
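The abstract states that the mean of a mixture component is given by the sum of n-grams. A minimal sketch of that sum follows, assuming FastText-style conventions (boundary markers `<` and `>`, character n-grams of length 3 to 6 plus the whole word, and n-grams hashed into a fixed number of buckets). `char_ngrams` and `NGramMean` are hypothetical names, the embedding table is random rather than trained, and the other mixture components and the covariances are omitted.

```python
import zlib

import numpy as np


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' plus the whole bracketed word, sorted."""
    token = f"<{word}>"
    grams = {token}
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return sorted(grams)


class NGramMean:
    """Hypothetical helper: a word's component mean as the sum of its hashed n-gram vectors."""

    def __init__(self, num_buckets=2**14, dim=50, seed=0):
        rng = np.random.default_rng(seed)
        self.num_buckets = num_buckets
        self.table = rng.normal(scale=0.1, size=(num_buckets, dim))  # untrained vectors

    def bucket(self, gram):
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        return zlib.crc32(gram.encode("utf-8")) % self.num_buckets

    def mean(self, word):
        rows = [self.bucket(g) for g in char_ngrams(word)]
        return self.table[rows].sum(axis=0)
```

Words that share sub-word units (for example "cat" and "cats" share "<ca", "<cat" and "cat") share table rows, which is one way a model can share statistical strength across sub-word structures.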

  10. arXiv:1805.04770

    stat.ML cs.AI cs.LG

    Born Again Neural Networks

    Authors: Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar

    Abstract: Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student’s compactness, without sacrificing too much performance. We study KD from a new p…

    Submitted 29 June, 2018; v1 submitted 12 May, 2018; originally announced May 2018.

    Comments: Published @ICML 2018
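The abstract describes transferring knowledge from a teacher to a student. One common way to do this, assumed here for illustration because the abstract is truncated before any training objective is given, is to train the student against the teacher's temperature-softened output distribution. `log_softmax`, `softmax` and `distillation_loss` are hypothetical names, and the temperature `T` is an arbitrary choice.

```python
import numpy as np


def log_softmax(z, T=1.0):
    """Numerically stable log-softmax of logits z / T along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def softmax(z, T=1.0):
    return np.exp(log_softmax(z, T))


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between teacher soft targets and student predictions, batch-averaged."""
    p_teacher = softmax(teacher_logits, T)
    return float(-(p_teacher * log_softmax(student_logits, T)).sum(axis=-1).mean())
```

By Gibbs' inequality the loss is smallest when the student reproduces the teacher's soft distribution exactly, where it equals the entropy of the teacher's soft targets.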

  11. arXiv:1804.02088

    cs.CV

    Question Type Guided Attention in Visual Question Answering

    Authors: Yang Shi, Tommaso Furlanello, Sheng Zha, Animashree Anandkumar

    Abstract: Visual Question Answering (VQA) requires integration of feature maps with drastically different structures and a focus on the correct regions. Image descriptors have structures at multiple spatial scales, while lexical inputs inherently follow a temporal sequence and naturally cluster into semantically different question types. Many previous works use complex models to extract feature representa…

    Submitted 18 July, 2018; v1 submitted 5 April, 2018; originally announced April 2018.

  12. arXiv:1803.01442

    cs.LG stat.ML

    Stochastic Activation Pruning for Robust Adversarial Defense

    Authors: Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, Anima Anandkumar

    Abstract: Neural networks are known to be vulnerable to adversarial examples. Carefully chosen perturbations to real images, while imperceptible to humans, induce misclassification and threaten the reliability of deep learning systems in the wild. To guard against adversarial examples, we take inspiration from game theory and cast the problem as a minimax zero-sum game between the adversary and the model. I…

    Submitted 4 March, 2018; originally announced March 2018.

    Comments: ICLR 2018

  13. arXiv:1802.07427

    cs.LG

    Active Learning with Partial Feedback

    Authors: Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan

    Abstract: While many active learning papers assume that the learner can simply ask for a label and receive it, real annotation often presents a mismatch between the form of a label (say, one among many classes), and the form of an annotation (typically yes/no binary feedback). To annotate corpora for multiclass classification, we might need to ask multiple yes/no questions, exploiting a label hiera…

    Submitted 8 July, 2019; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: ICLR 2019

  14. arXiv:1802.04434

    cs.LG cs.DC math.OC

    signSGD: Compressed Optimisation for Non-Convex Problems

    Authors: Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar

    Abstract: Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative $\ell_1/\ell_2$ geome…

    Submitted 7 August, 2018; v1 submitted 12 February, 2018; originally announced February 2018.
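The compression step in the abstract (transmitting just the sign of each stochastic gradient), combined with the majority-vote aggregation named in the title of arXiv:1810.05291 above, can be sketched in NumPy. Function names and the learning rate are hypothetical, and a real implementation would pack the signs into bits before sending them, which is not shown. A coordinate whose votes tie, or whose gradient is exactly zero, gets a zero update here because `np.sign(0) == 0`; that tie handling is our assumption.

```python
import numpy as np


def sign_compress(grad):
    """Worker side: keep only the sign of each gradient coordinate."""
    return np.sign(grad)


def majority_vote(signs):
    """Server side: elementwise sign of the summed worker signs (a majority vote)."""
    return np.sign(np.sum(signs, axis=0))


def signsgd_step(params, worker_grads, lr=0.01):
    """One distributed update: compress on each worker, vote, step by lr per coordinate."""
    vote = majority_vote([sign_compress(g) for g in worker_grads])
    return params - lr * vote
```

For example, three workers whose gradients have signs (+, -, +), (+, -, -) and (-, -, +) produce the vote (+, -, +), so every parameter moves by exactly `lr` in the voted direction regardless of gradient magnitudes.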

  15. arXiv:1802.04412

    cs.AI cs.LG stat.ML

    Efficient Exploration through Bayesian Deep Q-Networks

    Authors: Kamyar Azizzadenesheli, Animashree Anandkumar

    Abstract: We study reinforcement learning (RL) in high dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism,…

    Submitted 6 September, 2019; v1 submitted 12 February, 2018; originally announced February 2018.
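The abstract assumes the optimal Q-value is a linear function of a d-dimensional feature representation. One way to be Bayesian about such a linear last layer is Bayesian linear regression, sketched here under assumptions that are ours rather than the paper's: a Gaussian prior on one weight vector per action, Gaussian observation noise, and Thompson sampling for action selection. `blr_posterior` and `thompson_action` are hypothetical names, and the regression targets `y` are left abstract.

```python
import numpy as np


def blr_posterior(Phi, y, noise_var=1.0, prior_var=1.0):
    """Gaussian posterior N(mean, cov) over w for y = Phi @ w + noise, prior w ~ N(0, prior_var I)."""
    d = Phi.shape[1]
    precision = Phi.T @ Phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (Phi.T @ y) / noise_var
    return mean, cov


def thompson_action(features_per_action, posteriors, rng):
    """Draw one weight vector per action from its posterior and act greedily on the sampled Q-values."""
    q = [phi @ rng.multivariate_normal(mu, cov)
         for phi, (mu, cov) in zip(features_per_action, posteriors)]
    return int(np.argmax(q))
```

With many observations the posterior mean approaches the least-squares weights and the covariance shrinks, so sampled Q-values concentrate and exploration naturally decreases.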

  16. arXiv:1801.04342

    cs.LG cs.AI stat.ML

    Combining Symbolic Expressions and Black-box Function Evaluations in Neural Programs

    Authors: Forough Arabshahi, Sameer Singh, Animashree Anandkumar

    Abstract: Neural programming involves training neural networks to learn programs, mathematics, or logic from data. Previous works have failed to achieve good generalization performance, especially on problems and programs with high complexity or on large domains. This is because they mostly rely either on black-box function evaluations that do not capture the structure of the program, or on detailed executi…

    Submitted 26 April, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

    Comments: Published as a conference paper at the sixth International Conference on Learning Representations (ICLR), 2018

  17. arXiv:1712.04577

    cs.LG

    Learning From Noisy Singly-labeled Data

    Authors: Ashish Khetan, Zachary C. Lipton, Anima Anandkumar

    Abstract: Supervised learning depends on annotated examples, which are taken to be the ground truth. But these labels often come from noisy crowdsourcing platforms, like Amazon Mechanical Turk. Practitioners typically collect multiple labels per example and aggregate the results to mitigate noise (the classic crowdsourcing problem). Given a fixed annotation budget and unlimited unlabeled data, redund…

    Submitted 20 May, 2018; v1 submitted 12 December, 2017; originally announced December 2017.

    Comments: 18 pages, 3 figures

  18. arXiv:1712.03942

    cs.LG cs.CV

    StrassenNets: Deep Learning with a Multiplication Budget

    Authors: Michael Tschannen, Aran Khanna, Anima Anandkumar

    Abstract: A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary)…

    Submitted 8 June, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

    Comments: ICML 2018. Code available at https://github.com/mitscha/strassennets
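The abstract casts a matrix product as a 2-layer sum-product network: linear maps of vec(A) and vec(B), an elementwise product, and a final linear map. Strassen's classical 2x2 algorithm is one hand-derived instance of exactly this form, with ternary matrices and 7 multiplications instead of 8. The sketch below hard-codes Strassen's coefficients to show the structure; StrassenNets learns such matrices instead, so these values are not from the paper, and `strassen_2x2` is a hypothetical name.

```python
import numpy as np

# Strassen's 2x2 algorithm in the form vec(C) = W_c @ ((W_a @ vec(A)) * (W_b @ vec(B))).
# vec() flattens row-major: [X11, X12, X21, X22]. All weights are in {-1, 0, +1}.
W_a = np.array([
    [1, 0, 0, 1],    # M1 uses A11 + A22
    [0, 0, 1, 1],    # M2 uses A21 + A22
    [1, 0, 0, 0],    # M3 uses A11
    [0, 0, 0, 1],    # M4 uses A22
    [1, 1, 0, 0],    # M5 uses A11 + A12
    [-1, 0, 1, 0],   # M6 uses A21 - A11
    [0, 1, 0, -1],   # M7 uses A12 - A22
])
W_b = np.array([
    [1, 0, 0, 1],    # M1 uses B11 + B22
    [1, 0, 0, 0],    # M2 uses B11
    [0, 1, 0, -1],   # M3 uses B12 - B22
    [-1, 0, 1, 0],   # M4 uses B21 - B11
    [0, 0, 0, 1],    # M5 uses B22
    [1, 1, 0, 0],    # M6 uses B11 + B12
    [0, 0, 1, 1],    # M7 uses B21 + B22
])
W_c = np.array([
    [1, 0, 0, 1, -1, 0, 1],   # C11 = M1 + M4 - M5 + M7
    [0, 0, 1, 0, 1, 0, 0],    # C12 = M3 + M5
    [0, 1, 0, 1, 0, 0, 0],    # C21 = M2 + M4
    [1, -1, 1, 0, 0, 1, 0],   # C22 = M1 - M2 + M3 + M6
])


def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications."""
    products = (W_a @ A.reshape(4)) * (W_b @ B.reshape(4))  # sum layer, then product layer
    return (W_c @ products).reshape(2, 2)                    # final sum layer
```

Each row of `W_a` and `W_b` forms one of Strassen's seven products, and `W_c` recombines them into the four entries of C.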

  19. arXiv:1711.00073

    cs.LG

    Long-term Forecasting using Higher Order Tensor RNNs

    Authors: Rose Yu, Stephan Zheng, Anima Anandkumar, Yisong Yue

    Abstract: We present Higher-Order Tensor RNN (HOT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed recurrent architecture addresses these issues b…

    Submitted 23 August, 2019; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: 24 pages including appendix, updated JMLR version

  20. arXiv:1707.08308

    cs.LG

    Tensor Regression Networks

    Authors: Jean Kossaifi, Zachary C. Lipton, Arinbjorn Kolbeinsson, Aran Khanna, Tommaso Furlanello, Anima Anandkumar

    Abstract: Convolutional neural networks typically consist of many convolutional layers followed by one or more fully connected layers. While convolutional layers map between high-order activation tensors, the fully connected layers operate on flattened activation vectors. Despite empirical success, this approach has notable drawbacks. Flattening followed by fully connected layers discards multilinear struct…

    Submitted 20 July, 2020; v1 submitted 26 July, 2017; originally announced July 2017.
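The abstract contrasts flattening followed by fully connected layers with keeping the multilinear structure. A minimal sketch of a tensor regression layer follows, assuming a single scalar output per example and a low-rank Tucker parameterization of the weight tensor; both are our assumptions, since the abstract is truncated before it describes the layer. `tucker_to_tensor` and `tensor_regression` are hypothetical names.

```python
import numpy as np


def tucker_to_tensor(core, factors):
    """Full weight tensor W = core x_1 U1 x_2 U2 x_3 U3 (only used for checking)."""
    U1, U2, U3 = factors
    return np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)


def tensor_regression(X, core, factors, bias=0.0):
    """y[n] = <X[n], W> + bias for a batch X of shape (n, I, J, K), without flattening X."""
    U1, U2, U3 = factors
    # Project every activation tensor onto the factor subspaces: (n, I, J, K) -> (n, R1, R2, R3).
    projected = np.einsum("nijk,ia,jb,kc->nabc", X, U1, U2, U3)
    # Contract with the small core tensor to get one scalar per example.
    return np.einsum("nabc,abc->n", projected, core) + bias
```

The result equals the flattened inner product `X.reshape(n, -1) @ W.ravel()`, but the full weight tensor `W` never has to be formed.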

  21. arXiv:1707.05928

    cs.CL

    Deep Active Learning for Named Entity Recognition

    Authors: Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree Anandkumar

    Abstract: Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it c…

    Submitted 3 February, 2018; v1 submitted 18 July, 2017; originally announced July 2017.

  22. arXiv:1706.06706

    cs.CV

    Compact Tensor Pooling for Visual Question Answering

    Authors: Yang Shi, Tommaso Furlanello, Anima Anandkumar

    Abstract: Performing high level cognitive tasks requires the integration of feature maps with drastically different structure. In Visual Question Answering (VQA) image descriptors have spatial structures, while lexical inputs inherently follow a temporal sequence. The recently proposed Multimodal Compact Bilinear pooling (MCB) forms the outer products, via count-sketch approximation, of the visual and textu…

    Submitted 20 June, 2017; originally announced June 2017.
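The abstract notes that MCB forms outer products via count-sketch approximation. The standard trick behind this is that the count sketch of an outer product, using bucket `(h1[i] + h2[j]) mod d` and sign `s1[i] * s2[j]`, equals the circular convolution of the two vectors' individual count sketches, which an FFT computes cheaply. The function names and the sketch size `d` are hypothetical, and `outer_sketch_direct` is included only as a slow reference.

```python
import numpy as np


def count_sketch(x, h, s, d):
    """y[h[i]] += s[i] * x[i]."""
    y = np.zeros(d)
    np.add.at(y, h, s * x)
    return y


def compact_bilinear(x, y, h1, s1, h2, s2, d):
    """Count sketch of the outer product x y^T, computed without forming x y^T."""
    fx = np.fft.rfft(count_sketch(x, h1, s1, d))
    fy = np.fft.rfft(count_sketch(y, h2, s2, d))
    # Elementwise product in the frequency domain = circular convolution of the sketches.
    return np.fft.irfft(fx * fy, n=d)


def outer_sketch_direct(x, y, h1, s1, h2, s2, d):
    """Slow reference: sketch every entry of x y^T with bucket (h1[i] + h2[j]) mod d."""
    out = np.zeros(d)
    for i in range(len(x)):
        for j in range(len(y)):
            out[(h1[i] + h2[j]) % d] += s1[i] * s2[j] * x[i] * y[j]
    return out
```

The FFT route costs O(n + d log d) per pair of vectors instead of the O(n²) needed to form the outer product explicitly.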

  23. arXiv:1706.00439

    cs.LG

    Tensor Contraction Layers for Parsimonious Deep Nets

    Authors: Jean Kossaifi, Aran Khanna, Zachary C. Lipton, Tommaso Furlanello, Anima Anandkumar

    Abstract: Tensors offer a natural representation for many kinds of data frequently encountered in machine learning. Images, for example, are naturally represented as third order tensors, where the modes correspond to height, width, and channels. Tensor methods are noted for their ability to discover multi-dimensional dependencies, and tensor decompositions, in particular, have been used to produce compact lo…

    Submitted 1 June, 2017; originally announced June 2017.
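The abstract describes images as third-order tensors whose modes are height, width and channels. A minimal sketch of a contraction layer that multiplies each of those modes by its own projection matrix, keeping the batch mode, is below. That the layer is defined this way is our reading of the title rather than something the truncated abstract states, and `tensor_contraction_layer` is a hypothetical name.

```python
import numpy as np


def tensor_contraction_layer(X, V1, V2, V3):
    """Mode-wise contraction of a batch X of shape (n, H, W, C).

    V1: (H', H), V2: (W', W), V3: (C', C). Returns shape (n, H', W', C').
    """
    # Sum over h, w, c: Y[n, a, b, g] = sum X[n, h, w, c] * V1[a, h] * V2[b, w] * V3[g, c]
    return np.einsum("nhwc,ah,bw,gc->nabg", X, V1, V2, V3)
```

With `V1` and `V2` of shape (4, 8) and `V3` of shape (2, 3), an (8, 8, 3) activation is mapped to (4, 4, 2) using 4*8 + 4*8 + 2*3 = 70 parameters, versus 192*32 = 6144 for a dense layer producing the same output size.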

  24. arXiv:1705.02553

    cs.AI cs.LG stat.ML

    Experimental results: Reinforcement Learning of POMDPs using Spectral Methods

    Authors: Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

    Abstract: We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the futur…

    Submitted 6 May, 2017; originally announced May 2017.

    Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain

    Journal ref: NIPS Deep RL Workshop 2016, Barcelona

  25. arXiv:1611.03907

    cs.AI cs.LG stat.ML

    Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

    Authors: Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

    Abstract: Reinforcement learning (RL) in Markov decision processes (MDPs) with large state spaces is a challenging problem. The performance of standard RL algorithms degrades drastically with the dimensionality of state space. However, in practice, these large MDPs typically incorporate a latent or hidden low-dimensional structure. In this paper, we study the setting of rich-observation Markov decision proc…

    Submitted 19 June, 2018; v1 submitted 11 November, 2016; originally announced November 2016.

  26. arXiv:1610.09555

    cs.LG

    TensorLy: Tensor Learning in Python

    Authors: Jean Kossaifi, Yannis Panagakis, Anima Anandkumar, Maja Pantic

    Abstract: Tensors are higher-order extensions of matrices. While matrix methods form the cornerstone of machine learning and data analysis, tensor methods have been gaining increasing traction. However, software support for tensor operations is not on the same footing. In order to bridge this gap, we have developed TensorLy, a high-level API for tensor methods and deep tensorized neural networks in P…

    Submitted 9 May, 2018; v1 submitted 29 October, 2016; originally announced October 2016.

  27. arXiv:1610.09322

    stat.ML cs.LG

    Homotopy Analysis for Tensor PCA

    Authors: Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

    Abstract: Developing efficient and guaranteed nonconvex algorithms has been an important challenge in modern machine learning. Algorithms with good empirical performance such as stochastic gradient descent often lack theoretical guarantees. In this paper, we analyze the class of homotopy or continuation methods for global optimization of nonconvex functions. These methods start from an objective function th…

    Submitted 13 June, 2017; v1 submitted 28 October, 2016; originally announced October 2016.

    Comments: Accepted to COLT 2017

  28. arXiv:1609.06335

    q-bio.MN cs.LG

    Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

    Authors: Anthony Gitter, Furong Huang, Ragupathyraj Valluvan, Ernest Fraenkel, Animashree Anandkumar

    Abstract: Gene expression is a readily-observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regul…

    Submitted 20 September, 2016; originally announced September 2016.

    Comments: 37 pages, 9 figures

  29. arXiv:1608.04996

    cs.AI

    Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies

    Authors: Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

    Abstract: Planning plays an important role in the broad class of decision theory. Planning has drawn much attention in recent work in the robotics and sequential decision making areas. Recently, Reinforcement Learning (RL), as an agent-environment interaction problem, has brought further attention to planning methods. Generally in RL, one can assume a generative model, e.g. graphical models, for the environ…

    Submitted 17 August, 2016; originally announced August 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1602.07764

    Journal ref: 29th Annual Conference on Learning Theory (2016) 1639–1642

  30. arXiv:1606.06237

    stat.ML cs.LG

    Online and Differentially-Private Tensor Decomposition

    Authors: Yining Wang, Animashree Anandkumar

    Abstract: In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We present the first guarantees for an online tensor power method that has a linear memory requirement. Moreover, we present a noise calibrated tensor power…

    Submitted 15 December, 2016; v1 submitted 20 June, 2016; originally announced June 2016.

    Comments: 19 pages, 9 figures. To appear at the 30th Annual Conference on Advances in Neural Information Processing Systems (NIPS 2016), to be held at Barcelona, Spain. Fix small typos in proofs of Lemmas C.5 and C.6
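The abstract builds on the tensor power method. For reference, a minimal sketch of the basic (offline, non-private) iteration for a symmetric third-order tensor, u ← T(I, u, u) / ‖T(I, u, u)‖, is below; it is the textbook method, not the paper's online or differentially private variants. The random initialization, fixed iteration count and function names are assumptions.

```python
import numpy as np


def tensor_apply(T, u):
    """T(I, u, u): contract the second and third modes of T with u."""
    return np.einsum("ijk,j,k->i", T, u, u)


def tensor_power_method(T, n_iter=100, seed=0):
    """Return (lambda, u) for one component of a symmetric, orthogonally decomposable T."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = tensor_apply(T, u)
        u /= np.linalg.norm(u)
    lam = np.einsum("ijk,i,j,k->", T, u, u, u)  # T(u, u, u)
    return lam, u
```

For an orthogonally decomposable tensor the iteration converges to one of the components; recovering the others would require deflation (subtracting λ u⊗u⊗u and repeating), which is not shown.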

  31. Tensor Contractions with Extended BLAS Kernels on CPU and GPU

    Authors: Yang Shi, U. N. Niranjan, Animashree Anandkumar, Cris Cecka

    Abstract: Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. Existing approaches for tensor contractions typically involve explicit copy and transpose operations. In this paper, we propose and evaluate a new BLAS-like primitive STRIDED…

    Submitted 2 October, 2016; v1 submitted 17 June, 2016; originally announced June 2016.
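The abstract describes avoiding explicit copy and transpose operations in tensor contractions. The idea can be illustrated in NumPy: a contraction C[m, n, p] = Σ_k A[m, k] B[k, n, p] is a batch of ordinary matrix products, one per index p, taken over strided views of B. This is only an illustration; NumPy may still copy internally, and it is not the BLAS-like primitive the paper proposes. `contract_as_batched_gemm` is a hypothetical name.

```python
import numpy as np


def contract_as_batched_gemm(A, B):
    """C[m, n, p] = sum_k A[m, k] * B[k, n, p], as one matrix product per index p."""
    # B.transpose(2, 0, 1) is a strided view of B (no data copied): shape (p, k, n).
    B_batches = B.transpose(2, 0, 1)
    C_batches = np.matmul(A, B_batches)   # A is broadcast over the batch: (p, m, n)
    return C_batches.transpose(1, 2, 0)   # back to (m, n, p), again only a view
```

Only the stride metadata changes when `B` is reinterpreted as a batch of (k, n) matrices, which is the property that lets such a contraction map onto a batched matrix-multiply call.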

  32. arXiv:1606.03153   

    cs.CL cs.LG

    Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

    Authors: Furong Huang, Animashree Anandkumar

    Abstract: Unsupervised extraction of text embeddings is crucial for text understanding in machine learning. Word2Vec and its variants have received substantial success in mapping words with similar syntactic or semantic meaning to vectors close to each other. However, extracting context-aware word-sequence embedding remains a challenging task. Training over a large corpus is difficult as labels are difficult to…

    Submitted 28 May, 2018; v1 submitted 9 June, 2016; originally announced June 2016.

    Comments: There was an error in Section 3 and a bug in the experiment section. We would like to take it down.

  33. arXiv:1605.09080

    cs.LG stat.ML

    Spectral Methods for Correlated Topic Models

    Authors: Forough Arabshahi, Animashree Anandkumar

    Abstract: In this paper, we propose guaranteed spectral methods for learning a broad range of topic models, which generalize the popular Latent Dirichlet Allocation (LDA). We overcome the limitation of LDA to incorporate arbitrary topic correlations, by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are gen…

    Submitted 13 November, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

  34. arXiv:1603.00954

    cs.LG cs.NE stat.ML

    Training Input-Output Recurrent Neural Networks through Spectral Methods

    Authors: Hanie Sedghi, Anima Anandkumar

    Abstract: We consider the problem of training input-output recurrent neural networks (RNN) for sequence labeling tasks. We propose a novel spectral approach for learning the network parameters. It is based on decomposition of the cross-moment tensor between the output and a non-linear transformation of the input, based on score functions. We guarantee consistent learning with polynomial sample and computati…

    Submitted 31 October, 2016; v1 submitted 2 March, 2016; originally announced March 2016.
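The abstract decomposes a cross-moment tensor between the output and a score-function transformation of the input. A second-order version of that idea can be sketched for a standard Gaussian input, where the second-order score function has the closed form S2(x) = x xᵀ − I and Stein's identity gives E[y · S2(x)] = E[∇²y(x)]. The Gaussian input, the toy target y = (aᵀx)² (for which the cross-moment equals 2 a aᵀ) and the function names are assumptions; the paper itself works with higher-order tensors.

```python
import numpy as np


def score2_gaussian(x):
    """Second-order score function of N(0, I) for each row of x: x x^T - I."""
    d = x.shape[-1]
    return np.einsum("ni,nj->nij", x, x) - np.eye(d)


def cross_moment(y, x):
    """Empirical estimate of E[y * S2(x)] from samples y[n], x[n]."""
    return np.einsum("n,nij->ij", y, score2_gaussian(x)) / len(y)
```

For the toy target, the top eigenvector of the estimated cross-moment recovers the hidden direction a up to sign, which is the kind of parameter recovery a moment decomposition provides.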

  35. arXiv:1602.07764

    cs.AI cs.LG math.NA math.OC stat.ML

    Reinforcement Learning of POMDPs using Spectral Methods

    Authors: Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

    Abstract: We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the futur…

    Submitted 29 May, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Journal ref: 29th Annual Conference on Learning Theory, PMLR 49:193-256, 2016

  36. arXiv:1602.05908

    cs.LG stat.ML

    Efficient approaches for escaping higher order saddle points in non-convex optimization

    Authors: Anima Anandkumar, Rong Ge

    Abstract: Local search heuristics for non-convex optimizations are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle point structures in high dimensions. Many functions have degenerate saddle points such that the first and second order derivatives cannot distinguish them with loca…

    Submitted 18 February, 2016; originally announced February 2016.

  37. arXiv:1602.01889

    q-bio.NC stat.ML

    Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model

    Authors: Furong Huang, Animashree Anandkumar, Christian Borgs, Jennifer Chayes, Ernest Fraenkel, Michael Hawrylycz, Ed Lein, Alessandro Ingrosso, Srinivas Turaga

    Abstract: Cataloging the neuronal cell types that comprise circuitry of individual brain regions is a major goal of modern neuroscience and the BRAIN initiative. Single-cell RNA sequencing can now be used to measure the gene expression profiles of individual neurons and to categorize neurons based on their gene expression profiles. While the single-cell techniques are extremely powerful and hold great promi…

    Submitted 10 June, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

  38. arXiv:1510.04747

    cs.LG cs.IT stat.ML

    Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

    Authors: Animashree Anandkumar, Prateek Jain, Yang Shi, U. N. Niranjan

    Abstract: Robust tensor CP decomposition involves decomposing a tensor into low rank and sparse components. We propose a novel non-convex iterative algorithm with guaranteed recovery. It alternates between low-rank CP decomposition through gradient ascent (a variant of the tensor power method), and hard thresholding of the residual. We prove convergence to the globally optimal solution under natural incoher…

    Submitted 27 April, 2016; v1 submitted 15 October, 2015; originally announced October 2015.

  39. arXiv:1506.08473

    cs.LG cs.NE stat.ML

    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Authors: Majid Janzamin, Hanie Sedghi, Anima Anandkumar

    Abstract: Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimens…

    Submitted 11 January, 2016; v1 submitted 28 June, 2015; originally announced June 2015.

    Comments: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network

  40. arXiv:1506.04448

    stat.ML cs.LG

    Fast and Guaranteed Tensor Decomposition via Sketching

    Authors: Yining Wang, Hsiao-Yu Tung, Alexander Smola, Animashree Anandkumar

    Abstract: Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor c…

    Submitted 20 October, 2015; v1 submitted 14 June, 2015; originally announced June 2015.

    Comments: 29 pages. Appeared in Proceedings of Advances in Neural Information Processing Systems (NIPS), held at Montreal, Canada in 2015

  41. arXiv:1506.03509

    cs.LG stat.ML

    Convolutional Dictionary Learning through Tensor Factorization

    Authors: Furong Huang, Animashree Anandkumar

    Abstract: Tensor methods have emerged as a powerful paradigm for consistent learning of many latent variable models such as topic models, independent component analysis and dictionary learning. Model parameters are estimated via CP decomposition of the observed higher order input moments. However, in many domains, additional invariances such as shift invariances exist, enforced via models such as convolutio…

    Submitted 18 June, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

  42. arXiv:1506.03208

    stat.ML

    A Scale Mixture Perspective of Multiplicative Noise in Neural Networks

    Authors: Eric Nalisnick, Anima Anandkumar, Padhraic Smyth

    Abstract: Corrupting the input and hidden layers of deep neural networks (DNNs) with multiplicative noise, often drawn from the Bernoulli distribution (or 'dropout'), provides regularization that has significantly contributed to deep learning's success. However, understanding how multiplicative corruptions prevent overfitting has been difficult due to the complexity of a DNN's functional form. In this paper…

    Submitted 10 June, 2015; originally announced June 2015.

  43. arXiv:1505.00308

    cs.CV cs.LG

    Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

    Authors: Tejaswi Nimmagadda, Anima Anandkumar

    Abstract: Deep learning has shown state-of-the-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, viz. deep learning, probabilistic models and kernel methods, to obtain state-of-the-art performance on Mic…

    Submitted 1 May, 2015; originally announced May 2015.

  44. arXiv:1503.04567

    cs.LG cs.SI stat.ML

    Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods

    Authors: Anima Anandkumar, Hanie Sedghi

    Abstract: Community detection in graphs has been extensively studied both in theory and in applications. However, detecting communities in hypergraphs is more challenging. In this paper, we propose a tensor decomposition approach for guaranteed learning of communities in a special class of hypergraphs modeling social tagging systems or folksonomies. A folksonomy is a tripartite 3-uniform hypergraph consisti…

    Submitted 22 April, 2015; v1 submitted 16 March, 2015; originally announced March 2015.

  45. arXiv:1412.6514

    cs.LG stat.ML

    Score Function Features for Discriminative Learning

    Authors: Majid Janzamin, Hanie Sedghi, Anima Anandkumar

    Abstract: Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and l…

    Submitted 19 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: Accepted as a workshop contribution at ICLR 2015. A longer version of this work is also available on arXiv: http://arxiv.org/abs/1412.2863

  46. arXiv:1412.3046

    cs.LG stat.ML

    Provable Tensor Methods for Learning Mixtures of Generalized Linear Models

    Authors: Hanie Sedghi, Majid Janzamin, Anima Anandkumar

    Abstract: We consider the problem of learning mixtures of generalized linear models (GLMs), which arise in classification and regression problems. Typical learning approaches such as expectation maximization (EM) or variational Bayes can get stuck in spurious local optima. In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters. The key insight is to emplo…

    Submitted 12 January, 2016; v1 submitted 9 December, 2014; originally announced December 2014.

    Comments: To appear in Proceedings of AI and Statistics (AISTATS) 2016

  47. arXiv:1412.2863

    cs.LG stat.ML

    Score Function Features for Discriminative Learning: Matrix and Tensor Framework

    Authors: Majid Janzamin, Hanie Sedghi, Anima Anandkumar

    Abstract: Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and l…

    Submitted 11 December, 2014; v1 submitted 9 December, 2014; originally announced December 2014.

    Comments: 29 pages

  48. arXiv:1412.2693

    cs.LG cs.NE stat.ML

    Provable Methods for Training Neural Networks with Sparse Connectivity

    Authors: Hanie Sedghi, Anima Anandkumar

    Abstract: We provide novel guaranteed approaches for training feedforward neural networks with sparse connectivity. We leverage the techniques developed previously for learning linear networks and show that they can also be effectively adapted to learn non-linear networks. We operate on the moments involving the label and the score function of the input, and show that their factorization provably yields the…

    Submitted 28 April, 2015; v1 submitted 8 December, 2014; originally announced December 2014.

    Comments: Accepted for presentation at the Neural Information Processing Systems (NIPS) 2014 Deep Learning workshop, and accepted as a workshop contribution at ICLR 2015

  49. arXiv:1411.1488

    cs.LG stat.ML

    Analyzing Tensor Power Method Dynamics in Overcomplete Regime

    Authors: Anima Anandkumar, Rong Ge, Majid Janzamin

    Abstract: We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension. Finding the CP decomposition of an overcomplete tensor is NP-hard in general. We consider the case where the tensor components are randomly drawn, and show that the simple power iteration recovers the components with bounded error under mild…

    Submitted 14 September, 2015; v1 submitted 5 November, 2014; originally announced November 2014.

    Comments: 38 pages; analysis of noise added to the previous version

  50. arXiv:1411.1132

    cs.SI

    Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]

    Authors: Forough Arabshahi, Furong Huang, Animashree Anandkumar, Carter T. Butts, Sean M. Fitzhugh

    Abstract: Scalable probabilistic modeling and prediction in high-dimensional multivariate time series is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Examples of such problems include dynamic social networks with co-evolving nodes and edges, and dynamic student learning in online courses. Here, we address these problems through the discovery of hierarc…

    Submitted 5 June, 2016; v1 submitted 4 November, 2014; originally announced November 2014.