
Showing 1–50 of 57 results for author: Dhillon, I S

Searching in archive cs.
  1. arXiv:2406.17188

    cs.LG cs.AI

    Geometric Median (GM) Matching for Robust Data Pruning

    Authors: Anish Acharya, Inderjit S Dhillon, Sujay Sanghavi

    Abstract: Data pruning, the combinatorial task of selecting a small and informative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large-scale data collections are invariably noisy, developing data pruning strategies that remain robust even in the presence of corruption is critical in pr…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.11206

    cs.LG cs.CR stat.ML

    Retraining with Predicted Hard Labels Provably Increases Model Accuracy

    Authors: Rudrajit Das, Inderjit S. Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong

    Abstract: The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., $1$/$0$ labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable setting with randomly corrupted labels given to us and prove…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2402.07114

    cs.LG math.NA math.OC stat.ML

    Towards Quantifying the Preconditioning Effect of Adam

    Authors: Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a notable dearth of results characterizing the preconditioning effect of Adam and showing how it may alleviate the curse of ill-conditioning -- an issue plaguing gradient descent (GD). In this work, we perform a detailed analysis of Adam's preconditioning effect for quadratic functions and quantify to what extent Adam can mitigate the dependence on the condition number of the Hessian. Our…

    Submitted 11 February, 2024; originally announced February 2024.

  4. arXiv:2311.10117

    cs.AI cs.LG

    Automatic Engineering of Long Prompts

    Authors: Cho-Jui Hsieh, Si Si, Felix X. Yu, Inderjit S. Dhillon

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in solving complex open-domain tasks, guided by comprehensive instructions and demonstrations provided in the form of prompts. However, these prompts can be lengthy, often comprising hundreds of lines and thousands of tokens, and their design often requires considerable human effort. Recent research has explored automatic promp…

    Submitted 16 November, 2023; originally announced November 2023.

  5. arXiv:2211.00635

    cs.CL cs.LG

    Two-stage LLM Fine-tuning with Less Specialization and More Generalization

    Authors: Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar

    Abstract: Pretrained large language models (LLMs) are general-purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performance, which is undesirable whenever the fine-tun…

    Submitted 12 March, 2024; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: ICLR 2024

  6. arXiv:2210.08410

    cs.LG cs.IR

    ELIAS: End-to-End Learning to Index and Search in Large Output Spaces

    Authors: Nilesh Gupta, Patrick H. Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S Dhillon

    Abstract: Extreme multi-label classification (XMC) is a popular framework for solving many real-world problems that require accurate prediction from a very large number of potential output choices. A popular approach for dealing with the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently search this index via beam search. Existing methods ini…

    Submitted 9 January, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

    Comments: 21 pages, 9 figures, NeurIPS 2022 camera-ready publication

  7. arXiv:2206.11408

    cs.LG

    FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search

    Authors: Patrick H. Chen, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon, Cho-Jui Hsieh

    Abstract: Approximate K-Nearest Neighbor Search (AKNNS) has now become ubiquitous in modern applications, for example, as a fast search procedure with two-tower deep learning models. Graph-based methods for AKNNS in particular have received great attention due to their superior performance. These methods rely on greedy graph search to traverse the data points as embedding vectors in a database. Under this g…

    Submitted 22 June, 2022; originally announced June 2022.
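A minimal sketch of the greedy graph search the abstract describes, assuming a toy proximity graph stored as an adjacency dict (`graph`, `points` and `greedy_search` are hypothetical names): starting from an entry node, repeatedly move to the neighbor closest to the query and stop when no neighbor is closer. It computes every distance exactly and does not model FINGER's own speedups.

```python
import math

def dist(a, b):
    # Euclidean distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, points, query, entry):
    """Greedy 1-NN search on a proximity graph (hypothetical helper).

    graph:  dict mapping node id -> list of neighbor ids
    points: dict mapping node id -> embedding vector
    Returns the first node with no neighbor closer to the query.
    """
    current = entry
    current_d = dist(points[current], query)
    while True:
        # Compute the distance from the query to every neighbor of the current node
        best, best_d = current, current_d
        for nb in graph[current]:
            d = dist(points[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == current:  # local minimum: no neighbor improves
            return current
        current, current_d = best, best_d

# Toy example: four points on a line, connected as a chain
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_search(graph, points, (2.9, 0.0), entry=0))  # → 3
```

On real graphs the search can stop at a local minimum that is not the true nearest neighbor, which is why practical methods keep a beam of candidates rather than a single node.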

  8. arXiv:2204.10936

    cs.IR cs.LG stat.ML

    Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

    Authors: Adam Block, Rahul Kidambi, Daniel N. Hill, Thorsten Joachims, Inderjit S. Dhillon

    Abstract: Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To…

    Submitted 22 April, 2022; originally announced April 2022.

  9. arXiv:2202.12230

    cs.LG

    Sample Efficiency of Data Augmentation Consistency Regularization

    Authors: Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay Sanghavi, Qi Lei

    Abstract: Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data. In this paper, we take a step in this direction - we first present a simple and novel analysis for linear regression with label invariant augmentations, demonstrating that data augmentation consistenc…

    Submitted 16 June, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  10. arXiv:2111.00064

    cs.LG

    Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction

    Authors: Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, Inderjit S Dhillon

    Abstract: Learning on graphs has attracted significant attention in the learning community due to numerous real-world applications. In particular, graph neural networks (GNNs), which take numerical node features and graph structure as inputs, have been shown to achieve state-of-the-art performance on various graph-related learning tasks. Recent works exploring the correlation between numerical node features…

    Submitted 11 March, 2022; v1 submitted 29 October, 2021; originally announced November 2021.

    Comments: Published in ICLR 2022

  11. arXiv:2110.14011

    cs.LG stat.ML

    Cluster-and-Conquer: A Framework For Time-Series Forecasting

    Authors: Reese Pathak, Rajat Sen, Nikhil Rao, N. Benjamin Erichson, Michael I. Jordan, Inderjit S. Dhillon

    Abstract: We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time ser…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: 25 pages, 3 figures

  12. arXiv:2110.00685

    cs.LG cs.AI cs.IR stat.ML

    Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

    Authors: Jiong Zhang, Wei-cheng Chang, Hsiang-fu Yu, Inderjit S. Dhillon

    Abstract: Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC…

    Submitted 28 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

  13. arXiv:2106.12751

    stat.ML cs.LG

    Label Disentanglement in Partition-based Extreme Multilabel Classification

    Authors: Xuanqing Liu, Wei-Cheng Chang, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

    Abstract: Partition-based methods are increasingly used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions or more). However, existing methods partition the large label space into mutually exclusive clusters, which is sub-optimal when labels have multi-modality and rich semantics. For instance, the label "Apple" can be the fruit or the brand…

    Submitted 23 June, 2021; originally announced June 2021.

  14. arXiv:2106.12657

    cs.IR cs.LG

    Extreme Multi-label Learning for Semantic Matching in Product Search

    Authors: Wei-Cheng Chang, Daniel Jiang, Hsiang-Fu Yu, Choon-Hui Teo, Jiong Zhang, Kai Zhong, Kedarnath Kolluri, Qie Hu, Nikhil Shandilya, Vyacheslav Ievgrafov, Japinder Singh, Inderjit S. Dhillon

    Abstract: We consider the problem of semantic matching in product search: given a customer query, retrieve all semantically related products from a huge catalog of size 100 million, or more. Because of large catalog spaces and real-time latency constraints, semantic matching algorithms not only desire high recall but also need to have low latency. Conventional lexical matching approaches (e.g., Okapi-BM25)…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted in KDD 2021 Applied Data Science Track

  15. arXiv:2106.08882

    cs.LG cs.DC math.OC stat.ML

    Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

    Authors: Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying G…

    Submitted 16 June, 2021; originally announced June 2021.
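For reference, the geometric median of points $x_1, \dots, x_n$ minimizes $\sum_i \|y - x_i\|_2$. The sketch below computes it with Weiszfeld's classical iteratively reweighted averaging, not the block coordinate scheme the paper proposes; `geometric_median` is a hypothetical helper and the data are synthetic. It illustrates the robustness the abstract refers to: with 10% gross outliers the mean is dragged far away while the GM stays with the inliers.

```python
import numpy as np

def geometric_median(X, iters=500, tol=1e-10):
    """Weiszfeld's algorithm for argmin_y sum_i ||y - X[i]||_2 (rows of X are points)."""
    y = X.mean(axis=0)  # start from the ordinary mean
    for _ in range(iters):
        # Distances to the current estimate, guarded against exact zeros
        d = np.maximum(np.linalg.norm(X - y, axis=1), 1e-12)
        w = 1.0 / d  # far-away points (e.g. outliers) receive small weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        converged = np.linalg.norm(y_new - y) < tol
        y = y_new
        if converged:
            break
    return y

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 0.1, size=(90, 2))
outliers = np.full((10, 2), 100.0)  # 10% grossly corrupted points
X = np.vstack([inliers, outliers])
print(X.mean(axis=0))        # pulled to roughly (10, 10) by the outliers
print(geometric_median(X))   # stays close to the inlier cluster at the origin
```

Each Weiszfeld step costs $O(nd)$, so repeating it for every gradient step in high dimension is expensive, which is the cost the abstract says makes GM impractical for robustifying SGD directly.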

  16. arXiv:2106.07094

    cs.LG cs.DC eess.SP math.OC stat.ML

    On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates

    Authors: Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update i…

    Submitted 15 April, 2022; v1 submitted 13 June, 2021; originally announced June 2021.
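The clipping operation defined in the abstract, projection onto an $\ell_2$ ball of radius $c$, reduces to $\mathrm{clip}(g) = g \cdot \min(1, c/\|g\|_2)$. The sketch contrasts it with rescaling every update to norm exactly $c$; treating that rescaling as the "normalized client update" of the title is an assumption, and both function names are illustrative.

```python
import numpy as np

def clip(g, c):
    """Project g onto the l2 ball of radius c (the clipping threshold)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def normalize(g, c, eps=1e-12):
    """Rescale g to l2 norm c (assumed form of a normalized update)."""
    return g * (c / max(np.linalg.norm(g), eps))

g_small = np.array([0.3, 0.4])  # norm 0.5
g_large = np.array([3.0, 4.0])  # norm 5.0
# Clipping leaves the small update alone and shrinks the large one to norm 1
print(clip(g_small, 1.0), clip(g_large, 1.0))
# Normalization maps both updates to norm 1, whatever their original size
print(normalize(g_small, 1.0), normalize(g_large, 1.0))
```

Clipping only bounds the norm from above, so small updates keep their magnitude; normalization discards magnitude entirely and keeps only direction.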

  17. arXiv:2106.00730

    cs.LG cs.DS

    Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification

    Authors: Tavor Z. Baharav, Daniel L. Jiang, Kedarnath Kolluri, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real-world e-commerce applications like personalized recommendations and product advertising can be formulated as XMC problems, where the objective is to predict for a user a small subset of items from a catalog of several million products. For…

    Submitted 21 September, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: To appear in CIKM 2021

  18. arXiv:2103.02741

    cs.LG

    Combinatorial Bandits without Total Order for Arms

    Authors: Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

    Abstract: We consider the combinatorial bandits problem, where at each time step, the online learner selects a size-$k$ subset $s$ from the arms set $\mathcal{A}$, where $\left|\mathcal{A}\right| = n$, and observes a stochastic reward of each arm in the selected set $s$. The goal of the online learner is to minimize the regret, induced by not selecting $s^*$ which maximizes the expected total reward. Specif…

    Submitted 3 March, 2021; originally announced March 2021.

  19. arXiv:2103.02729

    cs.LG

    Linear Bandit Algorithms with Sublinear Time Complexity

    Authors: Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi

    Abstract: We propose two linear bandit algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking regret guarantees. Existing approximate M…

    Submitted 9 June, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted at ICML 2022
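To see why arm selection can become a MIPS problem, consider a linear Thompson-sampling step: once a parameter vector is sampled, the chosen arm is the one with the largest inner product with it. This is an illustrative assumption, not necessarily either of the paper's two algorithms; `mips` here is an exact $O(Kd)$ scan that a sublinear approximate index would replace, and the reward model in the usage lines is hypothetical.

```python
import numpy as np

def mips(arms, q):
    """Exact maximum inner product search: index of argmax_k <arms[k], q>."""
    return int(np.argmax(arms @ q))

def thompson_step(arms, A, b, rng, v=1.0):
    """One arm choice of linear Thompson sampling (illustrative assumption).

    A = I + sum_t x_t x_t^T and b = sum_t r_t x_t are ridge-regression statistics.
    """
    theta_hat = np.linalg.solve(A, b)
    cov = v ** 2 * np.linalg.inv(A)
    theta_sample = rng.multivariate_normal(theta_hat, cov)
    # Selecting the arm is a single MIPS query with the sampled parameter
    return mips(arms, theta_sample)

rng = np.random.default_rng(0)
arms = rng.normal(size=(1000, 5))  # K = 1000 arms in d = 5
A, b = np.eye(5), np.zeros(5)
k = thompson_step(arms, A, b, rng)
x = arms[k]
reward = x @ np.ones(5) + rng.normal(scale=0.1)  # hypothetical true parameter
A += np.outer(x, x)  # update the statistics with the observed reward
b += reward * x
```

Because only the query vector changes between steps while the arm set changes slowly, an index over `arms` can be built once and reused, which is what makes a sublinear per-step cost plausible.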

  20. Session-Aware Query Auto-completion using Extreme Multi-label Ranking

    Authors: Nishant Yadav, Rajat Sen, Daniel N. Hill, Arya Mazumdar, Inderjit S. Dhillon

    Abstract: Query auto-completion (QAC) is a fundamental feature in search engines where the task is to suggest plausible completions of a prefix typed in the search bar. Previous queries in the user session can provide useful context for the user's intent and can be leveraged to suggest auto-completions that are more relevant while adhering to the user's prefix. Such session-aware QACs can be generated by re…

    Submitted 21 August, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: Accepted in KDD 2021. Updated results for baseline XMR

  21. arXiv:2012.04061

    stat.ML cs.DC cs.LG math.OC

    Faster Non-Convex Federated Learning via Global and Local Momentum

    Authors: Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works. Our key…

    Submitted 24 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

  22. arXiv:2010.05878

    cs.LG

    PECOS: Prediction for Enormous and Correlated Output Spaces

    Authors: Hsiang-Fu Yu, Kai Zhong, Jiong Zhang, Wei-Cheng Chang, Inderjit S. Dhillon

    Abstract: Many large-scale applications amount to finding relevant results from an enormous output space of potential candidates. For example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can range from millions to billions, and can even be infinite in some applications. Moreover, training data…

    Submitted 18 January, 2022; v1 submitted 12 October, 2020; originally announced October 2020.

  23. arXiv:2009.12947

    stat.ML cs.LG

    Learning from eXtreme Bandit Feedback

    Authors: Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

    Abstract: We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces. Learning from extreme bandit feedback is ubiquitous in recommendation systems, in which billions of decisions are made over sets consisting of millions of choices in a single day, yielding massive observational data. In these large-scale real-world applications, supervised learning framewor…

    Submitted 22 February, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Journal ref: AAAI Conference on Artificial Intelligence 2021

  24. Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis

    Authors: Joyce Jiyoung Whang, Inderjit S. Dhillon

    Abstract: The goal of co-clustering is to simultaneously identify a clustering of rows as well as columns of a two-dimensional data matrix. A number of co-clustering techniques have been proposed including information-theoretic co-clustering and the minimum sum-squared residue co-clustering method. However, most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clust…

    Submitted 24 April, 2020; originally announced April 2020.

    Journal ref: "Non-Exhaustive, Overlapping Co-Clustering", Proceedings of the 26th ACM Conference on Information and Knowledge Management (CIKM), pages 2367-2370, November 2017

  25. arXiv:1908.10408

    cs.LG cs.IR stat.ML

    Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure

    Authors: Vikas K. Garg, Inderjit S. Dhillon, Hsiang-Fu Yu

    Abstract: The architecture of Transformer is based entirely on self-attention, and has been shown to outperform models that employ recurrence on sequence transduction tasks such as machine translation. The superior performance of Transformer has been attributed to propagating signals over shorter distances, between positions in the input and the output, compared to the recurrent architectures. We establish…

    Submitted 27 August, 2019; originally announced August 2019.

    Comments: Initial version

  26. arXiv:1906.07437

    cs.LG stat.ML

    Inverting Deep Generative models, One layer at a time

    Authors: Qi Lei, Ajil Jalal, Inderjit S. Dhillon, Alexandros G. Dimakis

    Abstract: We study the problem of inverting a deep generative model with ReLU activations. Inversion corresponds to finding a latent code vector that explains observed measurements as much as possible. In most prior works this is performed by attempting to solve a non-convex optimization problem involving the generator. In this paper we obtain several novel theoretical results for the inversion problem. W…

    Submitted 19 June, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

  27. arXiv:1906.02436

    cs.LG math.OC stat.ML

    Primal-Dual Block Frank-Wolfe

    Authors: Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis

    Abstract: We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining linear convergence rate. The per-iteration cost of our method depends on the structural compl…

    Submitted 6 June, 2019; originally announced June 2019.

  28. arXiv:1905.03381

    cs.LG cs.AI stat.ML

    AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

    Authors: Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon

    Abstract: Deep neural networks have yielded superior performance in many applications; however, the gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the amount…

    Submitted 8 May, 2019; originally announced May 2019.

  29. arXiv:1904.03257

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  30. arXiv:1901.04684

    stat.ML cs.CR cs.CV cs.LG

    The Limitations of Adversarial Training and the Blind-Spot Attack

    Authors: Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh

    Abstract: The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance betw…

    Submitted 15 January, 2019; originally announced January 2019.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2019. Huan Zhang and Hongge Chen contributed equally

  31. arXiv:1812.00151

    cs.LG cs.CR math.OC stat.ML

    Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

    Authors: Qi Lei, Lingfei Wu, Pin-Yu Chen, Alexandros G. Dimakis, Inderjit S. Dhillon, Michael Witbrock

    Abstract: Adversarial examples are carefully constructed modifications to an input that completely change the output of a classifier but are imperceptible to humans. Despite these successful attacks for continuous data (such as image and audio samples), generating adversarial examples for discrete structures such as text has proven significantly more challenging. In this paper we formulate the attacks with…

    Submitted 4 April, 2019; v1 submitted 1 December, 2018; originally announced December 2018.

    Comments: In SysML 2019

  32. arXiv:1805.10477

    cs.LG stat.ML

    Nonlinear Inductive Matrix Completion based on One-layer Neural Networks

    Authors: Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon

    Abstract: The goal of a recommendation system is to predict the interest of a user in a given item by exploiting the existing set of ratings as well as certain user/item features. A standard approach to modeling this problem is Inductive Matrix Completion where the predicted rating is modeled as an inner product of the user and the item features projected onto a latent space. In order to learn the parameter…

    Submitted 26 May, 2018; originally announced May 2018.

  33. arXiv:1804.09699

    stat.ML cs.CR cs.CV cs.LG

    Towards Fast Computation of Certified Robustness for ReLU Networks

    Authors: Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, Luca Daniel

    Abstract: Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods of computing such a bound are either time-consuming or deliver low qua…

    Submitted 2 October, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: Tsui-Wei Weng and Huan Zhang contributed equally

  34. arXiv:1803.09327

    cs.LG stat.ML

    Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization

    Authors: Jiong Zhang, Qi Lei, Inderjit S. Dhillon

    Abstract: Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parametrization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by it…

    Submitted 25 March, 2018; originally announced March 2018.

    Comments: main text 13 pages, 22 pages including reference and appendix

  35. arXiv:1803.06585

    cs.LG stat.ML

    Learning Long Term Dependencies via Fourier Recurrent Units

    Authors: Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon

    Abstract: It is a known fact that training recurrent neural networks for tasks that have long term dependencies is challenging. One of the main reasons is the vanishing or exploding gradient problem, which prevents gradient information from propagating to early layers. In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its…

    Submitted 17 March, 2018; originally announced March 2018.

  36. arXiv:1711.03440

    cs.LG cs.DS stat.ML

    Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels

    Authors: Kai Zhong, Zhao Song, Inderjit S. Dhillon

    Abstract: In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels. We show that when the inputs follow Gaussian distribution and the sample size is sufficiently large, the squared loss of such CNNs is locally strongly convex in a basin of attraction near the global optima for most popular activation functions, like ReLU, Leaky…

    Submitted 8 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: text overlap with arXiv:1706.03175

  37. arXiv:1706.03175

    cs.LG cs.DS stat.ML

    Recovery Guarantees for One-hidden-layer Neural Networks

    Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

    Abstract: In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), le…

    Submitted 9 June, 2017; originally announced June 2017.

    Comments: ICML 2017

  38. arXiv:1702.03584

    cs.AI cs.LG

    Similarity Preserving Representation Learning for Time Series Clustering

    Authors: Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon

    Abstract: A considerable number of clustering algorithms take instance-feature matrices as their inputs. As such, they cannot directly analyze time series data due to its temporal nature, usually unequal lengths, and complex properties. This is a great pity since many of these algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representa…

    Submitted 2 June, 2019; v1 submitted 12 February, 2017; originally announced February 2017.

  39. arXiv:1610.03317

    cs.DS cs.LG

    A Greedy Approach for Budgeted Maximum Inner Product Search

    Authors: Hsiang-Fu Yu, Cho-Jui Hsieh, Qi Lei, Inderjit S. Dhillon

    Abstract: Maximum Inner Product Search (MIPS) is an important task in many machine learning applications such as the prediction phase of a low-rank matrix factorization model for a recommender system. There have been some works on how to perform MIPS in sub-linear time recently. However, most of them do not have the flexibility to control the trade-off between search efficiency and search quality. In this pa…

    Submitted 11 October, 2016; originally announced October 2016.

  40. arXiv:1608.02010

    cs.LG

    Communication-Efficient Parallel Block Minimization for Kernel Machines

    Authors: Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

    Abstract: Kernel machines often yield superior predictive performance on various tasks; however, they suffer from severe computational challenges. In this paper, we show how to overcome the important challenge of speeding up kernel machines. In particular, we develop a parallel block minimization framework for solving kernel machines, including kernel SVM and kernel logistic regression. Our framework procee…

    Submitted 5 August, 2016; originally announced August 2016.

  41. arXiv:1602.01910

    cs.LG

    Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering

    Authors: Yangyang Hou, Joyce Jiyoung Whang, David F. Gleich, Inderjit S. Dhillon

    Abstract: Clustering is one of the most fundamental and important tasks in data mining. Traditional clustering algorithms, such as K-means, assign every data point to exactly one cluster. However, in real-world datasets, the clusters may overlap with each other. Furthermore, often, there are outliers that should not belong to any cluster. We recently proposed the NEO-K-Means (Non-Exhaustive, Overlapping K-M…

    Submitted 4 February, 2016; originally announced February 2016.

    Comments: 9 pages. 2 figures

  42. arXiv:1509.08333

    cs.LG stat.ML

    High-dimensional Time Series Prediction with Missing Values

    Authors: Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon

    Abstract: High-dimensional time series prediction is needed in applications as diverse as demand forecasting and climatology. Often, such applications require methods that are both highly scalable and deal with noisy data in terms of corruptions or missing values. Classical time series methods usually fall short of handling both these issues. In this paper, we propose to adapt matrix completion appr…

    Submitted 16 February, 2016; v1 submitted 28 September, 2015; originally announced September 2015.

  43. arXiv:1507.04457

    stat.ML cs.LG

    Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons

    Authors: Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: In this paper, we consider the collaborative ranking setting: a pool of users, each of whom provides a small number of pairwise preferences between $d$ possible items; from these we need to predict preferences of the users for items they have not yet seen. We do so by fitting a rank $r$ score matrix to the pairwise data, and provide two main contributions: (a) we show that an algorithm based on convex optim…

    Submitted 16 July, 2015; originally announced July 2015.

  44. arXiv:1505.01802  [pdf, ps, other]

    cs.LG stat.ML

    Optimal Decision-Theoretic Classification Using Non-Decomposable Performance Metrics

    Authors: Nagarajan Natarajan, Oluwasanmi Koyejo, Pradeep Ravikumar, Inderjit S. Dhillon

    Abstract: We provide a general theoretical analysis of expected out-of-sample utility, also referred to as decision-theoretic classification, for non-decomposable binary classification metrics such as F-measure and Jaccard coefficient. Our key result is that the expected out-of-sample utility for many performance metrics is provably optimized by a classifier which is equivalent to a signed thresholding of t…

    Submitted 7 May, 2015; originally announced May 2015.

  45. arXiv:1504.01365  [pdf, other]

    cs.LG

    PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent

    Authors: Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit S. Dhillon

    Abstract: Stochastic Dual Coordinate Descent (SDCD) has become one of the most efficient ways to solve the family of $\ell_2$-regularized empirical risk minimization problems, including linear SVM, logistic regression, and many others. The vanilla implementation of SDCD is quite slow; however, by maintaining primal variables while updating dual variables, the time complexity of SDCD can be significantly redu…

    Submitted 6 April, 2015; originally announced April 2015.

  46. arXiv:1503.07439  [pdf, ps, other]

    cs.SI physics.soc-ph

    Overlapping Community Detection Using Neighborhood-Inflated Seed Expansion

    Authors: Joyce Jiyoung Whang, David F. Gleich, Inderjit S. Dhillon

    Abstract: Community detection is an important task in network analysis. A community (also referred to as a cluster) is a set of cohesive vertices that have more connections inside the set than outside. In many social and information networks, these communities naturally overlap. For instance, in a social network, each vertex in a graph corresponds to an individual who usually participates in multiple commun…

    Submitted 3 April, 2015; v1 submitted 25 March, 2015; originally announced March 2015.

  47. arXiv:1412.4986  [pdf, other]

    cs.DC cs.IR cs.LG

    A Scalable Asynchronous Distributed Algorithm for Topic Modeling

    Authors: Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S. V. N Vishwanathan, Inderjit S. Dhillon

    Abstract: Learning meaningful topic models with massive document collections which contain millions of documents and billions of tokens is challenging for two reasons: First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper, we present a novel alg…

    Submitted 16 December, 2014; originally announced December 2014.

  48. arXiv:1411.6081  [pdf, other]

    cs.LG math.NA stat.ML

    PU Learning for Matrix Completion

    Authors: Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon

    Abstract: In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem of learning from only positive and…

    Submitted 21 November, 2014; originally announced November 2014.

  49. arXiv:1311.0914  [pdf, other]

    cs.LG

    A Divide-and-Conquer Solver for Kernel Support Vector Machines

    Authors: Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

    Abstract: The kernel support vector machine (SVM) is one of the most widely used classification methods; however, the amount of computation required becomes the bottleneck when facing millions of samples. In this paper, we propose and analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM). In the division step, we partition the kernel SVM problem into smaller subproblems by clustering the data,…

    Submitted 4 November, 2013; originally announced November 2013.

  50. arXiv:1307.5101  [pdf, other]

    cs.LG

    Large-scale Multi-label Learning with Missing Labels

    Authors: Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, Inderjit S. Dhillon

    Abstract: The multi-label classification problem has generated significant interest in recent years. However, existing approaches do not adequately address two key challenges: (a) the ability to tackle problems with a large number (say millions) of labels, and (b) the ability to handle data with missing labels. In this paper, we directly address both these problems by studying the multi-label problem in a g…

    Submitted 25 November, 2013; v1 submitted 18 July, 2013; originally announced July 2013.