Showing 1–35 of 35 results for author: Sedghi, H

Searching in archive cs.
  1. arXiv:2408.07852  [pdf, other]

    cs.CL cs.AI cs.LG

    Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

    Authors: Jiri Hron, Laura Culp, Gamaleldin Elsayed, Rosanne Liu, Ben Adlam, Maxwell Bileschi, Bernd Bohnet, JD Co-Reyes, Noah Fiedel, C. Daniel Freeman, Izzeddin Gur, Kathleen Kenealy, Jaehoon Lee, Peter J. Liu, Gaurav Mishra, Igor Mordatch, Azade Nova, Roman Novak, Aaron Parisi, Jeffrey Pennington, Alex Rizkowsky, Isabelle Simpson, Hanie Sedghi, Jascha Sohl-dickstein, Kevin Swersky, et al. (6 additional authors not shown)

    Abstract: While many capabilities of language models (LMs) improve with increased training budget, the influence of scale on hallucinations is not yet fully understood. Hallucinations come in many forms, and there is no universally accepted definition. We thus focus on studying only those hallucinations where a correct answer appears verbatim in the training set. To fully control the training data content,…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Published at COLM 2024. 16 pages, 11 figures

  2. arXiv:2406.13094  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring and Benchmarking the Planning Capabilities of Large Language Models

    Authors: Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi, Hanjun Dai, Dale Schuurmans, Noah Fiedel, Hanie Sedghi

    Abstract: We seek to elevate the planning capabilities of Large Language Models (LLMs) by investigating four main directions. First, we construct a comprehensive benchmark suite encompassing both classical planning domains and natural language scenarios. This suite includes algorithms to generate instances with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Sec…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.00179  [pdf, other]

    cs.CL cs.AI

    Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

    Authors: Bernd Bohnet, Kevin Swersky, Rosanne Liu, Pranjal Awasthi, Azade Nova, Javier Snaider, Hanie Sedghi, Aaron T Parisi, Michael Collins, Angeliki Lazaridou, Orhan Firat, Noah Fiedel

    Abstract: We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing, but the emergence of transformers with a context size of 1 million or more tokens now enables entirely automatic approaches. Our objective is to test the capabilities of LLMs to analyze, unde…

    Submitted 31 May, 2024; originally announced June 2024.

  4. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2312.06585  [pdf, other]

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  6. arXiv:2311.07587  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

    Authors: C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, et al. (5 additional authors not shown)

    Abstract: We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial string inserted before the question is complete. Even in the simple setting of 1-digit addition problems, it is easy to find adversarial prompts that mak…

    Submitted 15 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  7. arXiv:2307.09542  [pdf, other]

    cs.LG cs.CV

    Can Neural Network Memorization Be Localized?

    Authors: Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang

    Abstract: Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks $\textit{memorize}$ "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on $\textit{atypical}$ examples of the training set. In this work, we show that rather than being confined to individual lay…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted at ICML 2023

  8. arXiv:2302.13602  [pdf, other]

    cs.CV cs.LG

    The Role of Pre-training Data in Transfer Learning

    Authors: Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt

    Abstract: The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3…

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  9. arXiv:2212.04461  [pdf, other]

    cs.LG

    Leveraging Unlabeled Data to Track Memorization

    Authors: Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran

    Abstract: Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it…

    Submitted 8 December, 2022; originally announced December 2022.

  10. arXiv:2211.10193  [pdf, other]

    cs.LG

    Layer-Stack Temperature Scaling

    Authors: Amr Khalifa, Michael C. Mozer, Hanie Sedghi, Behnam Neyshabur, Ibrahim Alabdulmohsin

    Abstract: Recent works demonstrate that early layers in a neural network contain useful information for prediction. Inspired by this, we show that extending temperature scaling across all layers improves both calibration and accuracy. We call this procedure "layer-stack temperature scaling" (LATES). Informally, LATES grants each layer a weighted vote during inference. We evaluate it on five popular convolut…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: 10 pages, 7 figures, 3 tables

    ACM Class: I.2.6; I.2.10

  11. arXiv:2211.09066  [pdf, other]

    cs.LG cs.AI cs.CL

    Teaching Algorithmic Reasoning via In-context Learning

    Authors: Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi

    Abstract: Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. 2022 showed that even simple algorithmic reasoning tasks such…

    Submitted 15 November, 2022; originally announced November 2022.

  12. arXiv:2211.08403  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    REPAIR: REnormalizing Permuted Activations for Interpolation Repair

    Authors: Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, Behnam Neyshabur

    Abstract: In this paper we look into the conjecture of Entezari et al. (2021) which states that if the permutation invariance of neural networks is taken into account, then there is likely no loss barrier to the linear interpolation between SGD solutions. First, we observe that neuron alignment methods alone are insufficient to establish low-barrier linear connectivity between SGD solutions due to a phenome…

    Submitted 25 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  13. arXiv:2206.10915  [pdf, other]

    cs.CV

    Understanding the effect of sparsity on neural networks robustness

    Authors: Lukas Timpl, Rahim Entezari, Hanie Sedghi, Behnam Neyshabur, Olga Saukh

    Abstract: This paper examines the impact of static sparsity on the robustness of a trained network to weight perturbations, data corruption, and adversarial examples. We show that, up to a certain sparsity achieved by increasing network width and depth while keeping the network capacity fixed, sparsified networks consistently match and often outperform their initially dense versions. Robustness and accuracy…

    Submitted 22 June, 2022; originally announced June 2022.

  14. arXiv:2201.04234  [pdf, other]

    cs.LG stat.ML

    Leveraging Unlabeled Data to Predict Out-of-Distribution Performance

    Authors: Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi

    Abstract: Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on…

    Submitted 14 October, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: Accepted at ICLR 2022

  15. arXiv:2110.06296  [pdf, other]

    cs.LG

    The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

    Authors: Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur

    Abstract: In this paper, we conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier in the linear interpolation between them. Although it is a bold conjecture, we show how extensive empirical attempts fall short of refuting it. We further provide a preliminary theoretical result to support our conjecture. Our conjecture has implicatio…

    Submitted 5 July, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

  16. arXiv:2110.02095  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Exploring the Limits of Large Scale Pre-training

    Authors: Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

    Abstract: Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks. In this work, we systematically study this phenomenon and establish that, as we increase the upstream accuracy, the performance of downstream tasks saturates. In particular,…

    Submitted 5 October, 2021; originally announced October 2021.

  17. arXiv:2106.06080  [pdf, other]

    cs.LG cs.AI

    Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

    Authors: Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

    Abstract: We focus on the problem of domain adaptation when the goal is shifting the model towards the target distribution, rather than learning domain invariant representations. It has been shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be success…

    Submitted 13 July, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  18. arXiv:2010.08127  [pdf, other]

    cs.LG cs.CV cs.NE math.ST stat.ML

    The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers

    Authors: Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi

    Abstract: We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If…

    Submitted 18 February, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted to ICLR 2021

  19. arXiv:2008.11687  [pdf, other]

    cs.LG stat.ML

    What is being transferred in transfer learning?

    Authors: Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang

    Abstract: One desired capability for machines is the ability to transfer their knowledge of one domain to another where data is (usually) scarce. Despite ample adaptation of transfer learning in various deep learning applications, we do not yet understand what enables a successful transfer and which part of the network is responsible for that. In this paper, we provide new tools and analyses to address thes…

    Submitted 14 January, 2021; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: Equal contribution, authors ordered randomly

    Journal ref: NeurIPS 2020

  20. arXiv:1912.00528  [pdf, other]

    cs.LG stat.ML

    The intriguing role of module criticality in the generalization of deep networks

    Authors: Niladri S. Chatterji, Behnam Neyshabur, Hanie Sedghi

    Abstract: We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network's performance. Our analysis reveals interesting properties of the loss landscape which leads us to propose a complexity measur…

    Submitted 14 February, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

  21. arXiv:1905.12600  [pdf, other]

    cs.LG cs.AI cs.NE math.ST stat.ML

    Generalization bounds for deep convolutional neural networks

    Authors: Philip M. Long, Hanie Sedghi

    Abstract: We prove bounds on the generalization error of convolutional networks. The bounds are in terms of the training loss, the number of parameters, the Lipschitz constant of the loss and the distance from the weights to the initial weights. They are independent of the number of pixels in the input, and the height and width of hidden feature maps. We present experiments using CIFAR-10 with varying hyper…

    Submitted 8 April, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Published as a conference paper at ICLR 2020

  22. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  23. arXiv:1901.02104  [pdf, other]

    cs.LG cs.AI cs.NE math.ST stat.ML

    On the effect of the activation function on the distribution of hidden nodes in a deep network

    Authors: Philip M. Long, Hanie Sedghi

    Abstract: We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in $\{ -1, 1\}^N$. We show that, if the activation function $φ$ satisfies a minimal set of assumptions, satisfied by all activation functions that…

    Submitted 7 January, 2019; originally announced January 2019.

  24. arXiv:1805.10408  [pdf, other]

    cs.LG cs.AI stat.ML

    The Singular Values of Convolutional Layers

    Authors: Hanie Sedghi, Vineet Gupta, Philip M. Long

    Abstract: We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network usin…

    Submitted 5 March, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: Published as a conference paper at ICLR 2019

  25. arXiv:1612.03871  [pdf, other]

    cs.AI cs.LG stat.ML

    Knowledge Completion for Generics using Guided Tensor Factorization

    Authors: Hanie Sedghi, Ashish Sabharwal

    Abstract: Given a knowledge base or KB containing (noisy) facts about common nouns or generics, such as "all trees produce oxygen" or "some animals live in forests", we consider the problem of inferring additional such facts at a precision similar to that of the starting KB. Such KBs capture general knowledge about the world, and are crucial for various applications such as question answering. Different fro…

    Submitted 28 March, 2018; v1 submitted 12 December, 2016; originally announced December 2016.

    Comments: To appear in TACL

  26. arXiv:1603.00954  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    Training Input-Output Recurrent Neural Networks through Spectral Methods

    Authors: Hanie Sedghi, Anima Anandkumar

    Abstract: We consider the problem of training input-output recurrent neural networks (RNN) for sequence labeling tasks. We propose a novel spectral approach for learning the network parameters. It is based on decomposition of the cross-moment tensor between the output and a non-linear transformation of the input, based on score functions. We guarantee consistent learning with polynomial sample and computati…

    Submitted 31 October, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

  27. arXiv:1506.08473  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Authors: Majid Janzamin, Hanie Sedghi, Anima Anandkumar

    Abstract: Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimens…

    Submitted 11 January, 2016; v1 submitted 28 June, 2015; originally announced June 2015.

    Comments: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of last layer of neural network

  28. arXiv:1506.00099  [pdf]

    cs.AI cs.NI

    A Novel Energy Aware Node Clustering Algorithm for Wireless Sensor Networks Using a Modified Artificial Fish Swarm Algorithm

    Authors: Reza Azizi, Hasan Sedghi, Hamid Shoja, Alireza Sepas-Moghaddam

    Abstract: Clustering problems are considered amongst the prominent challenges in statistics and computational science. Clustering the nodes of a wireless sensor network, which is used to prolong network lifetime, is one of the more difficult clustering tasks. To cluster the nodes, a number of nodes are designated as cluster heads and the others are joined to one of these heads,…

    Submitted 30 May, 2015; originally announced June 2015.

    Comments: 13 pages, 5 figures, 2 tables, International Journal of Computer Networks & Communications (IJCNC) Vol.7, No.3, May 2015

  29. arXiv:1503.04567  [pdf, ps, other]

    cs.LG cs.SI stat.ML

    Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods

    Authors: Anima Anandkumar, Hanie Sedghi

    Abstract: Community detection in graphs has been extensively studied both in theory and in applications. However, detecting communities in hypergraphs is more challenging. In this paper, we propose a tensor decomposition approach for guaranteed learning of communities in a special class of hypergraphs modeling social tagging systems or folksonomies. A folksonomy is a tripartite 3-uniform hypergraph consisti…

    Submitted 22 April, 2015; v1 submitted 16 March, 2015; originally announced March 2015.

  30. arXiv:1412.6514  [pdf, ps, other]

    cs.LG stat.ML

    Score Function Features for Discriminative Learning

    Authors: Majid Janzamin, Hanie Sedghi, Anima Anandkumar

    Abstract: Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and l…

    Submitted 19 April, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: Accepted as a workshop contribution at ICLR 2015. A longer version of this work is also available on arXiv: http://arxiv.org/abs/1412.2863

  31. arXiv:1412.3046  [pdf, ps, other]

    cs.LG stat.ML

    Provable Tensor Methods for Learning Mixtures of Generalized Linear Models

    Authors: Hanie Sedghi, Majid Janzamin, Anima Anandkumar

    Abstract: We consider the problem of learning mixtures of generalized linear models (GLM) which arise in classification and regression problems. Typical learning approaches such as expectation maximization (EM) or variational Bayes can get stuck in spurious local optima. In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters. The key insight is to emplo…

    Submitted 12 January, 2016; v1 submitted 9 December, 2014; originally announced December 2014.

    Comments: To appear in Proceedings of AI and Statistics (AISTATS) 2016

  32. arXiv:1412.2863  [pdf, ps, other]

    cs.LG stat.ML

    Score Function Features for Discriminative Learning: Matrix and Tensor Framework

    Authors: Majid Janzamin, Hanie Sedghi, Anima Anandkumar

    Abstract: Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and l…

    Submitted 11 December, 2014; v1 submitted 9 December, 2014; originally announced December 2014.

    Comments: 29 pages

  33. arXiv:1412.2693  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    Provable Methods for Training Neural Networks with Sparse Connectivity

    Authors: Hanie Sedghi, Anima Anandkumar

    Abstract: We provide novel guaranteed approaches for training feedforward neural networks with sparse connectivity. We leverage the techniques developed previously for learning linear networks and show that they can also be effectively adopted to learn non-linear networks. We operate on the moments involving label and the score function of the input, and show that their factorization provably yields the…

    Submitted 28 April, 2015; v1 submitted 8 December, 2014; originally announced December 2014.

    Comments: Accepted for presentation at Neural Information Processing Systems (NIPS) 2014 Deep Learning workshop and accepted as a workshop contribution at ICLR 2015

  34. arXiv:1403.1863  [pdf, other]

    cs.LG eess.SY

    Statistical Structure Learning, Towards a Robust Smart Grid

    Authors: Hanie Sedghi, Edmond Jonckheere

    Abstract: Robust control and maintenance of the grid relies on accurate data. Both PMUs and state estimators are prone to false data injection attacks. Thus, it is crucial to have a mechanism for fast and accurate detection of an agent maliciously tampering with the data, both for preventing attacks that may lead to blackouts and for routine monitoring and control tasks of current and future grids. We pro…

    Submitted 7 March, 2014; originally announced March 2014.

  35. arXiv:1402.5131  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition

    Authors: Hanie Sedghi, Anima Anandkumar, Edmond Jonckheere

    Abstract: We propose an efficient ADMM method with guarantees for high-dimensional problems. We provide explicit bounds for the sparse optimization problem and the noisy matrix decomposition problem. For sparse optimization, we establish that the modified ADMM method has an optimal convergence rate of $\mathcal{O}(s\log d/T)$, where $s$ is the sparsity level, $d$ is the data dimension and $T$ is the number…

    Submitted 6 July, 2015; v1 submitted 20 February, 2014; originally announced February 2014.

    Comments: Appeared in Neural Information Processing Systems (NIPS) 2014. arXiv admin note: text overlap with arXiv:1207.4421 by other authors