Showing 1–16 of 16 results for author: Raman, P

  1. arXiv:2404.10630  [pdf, other]

    cs.CL cs.LG

    HLAT: High-quality Large Language Model Pre-trained on AWS Trainium

    Authors: Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan

    Abstract: Getting large language models (LLMs) to perform well on downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices in addition to a stable distributed training framework to accelerate the training. The growing number of applications leveraging AI/ML has led to a scarcity of the expensive conventional accelerators (su…

    Submitted 16 April, 2024; originally announced April 2024.

  2. arXiv:2404.10575  [pdf, other]

    cs.LG cs.AI cs.CV math.OC

    EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

    Authors: Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong

    Abstract: A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution that is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 20 pages

  3. arXiv:2404.08080  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

    Authors: Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha

    Abstract: Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-sh…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 29 pages, 25 tables, 9 figures

  4. arXiv:2401.08893  [pdf, other]

    cs.LG math.OC

    MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

    Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

    Abstract: Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during tra…

    Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  5. arXiv:2401.03058  [pdf, other]

    math.OC cs.LG stat.ML

    Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

    Authors: Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

    Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods.…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 27 pages, 2 figures

  6. arXiv:2312.08538  [pdf, other]

    cs.LG cs.AI

    Contractive error feedback for gradient compression

    Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

    Abstract: On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication-efficient optimization methods are attractive alternatives; however, they still struggle with memory issues. To tackle…

    Submitted 13 December, 2023; originally announced December 2023.

  7. arXiv:2209.04447  [pdf]

    cs.LG physics.optics

    Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures

    Authors: Christopher Yeung, Benjamin Pham, Zihan Zhang, Katherine T. Fountaine, Aaswath P. Raman

    Abstract: From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning appr…

    Submitted 8 September, 2022; originally announced September 2022.

  8. arXiv:2011.00105  [pdf, other]

    cs.CL cs.AI

    Learning Structured Representations of Entity Names using Active Learning and Weak Supervision

    Authors: Kun Qian, Poornima Chozhiyath Raman, Yunyao Li, Lucian Popa

    Abstract: Structured representations of entity names are useful for many entity-related tasks such as entity normalization and variant generation. Learning the implicit structured representations of entity names without context and external knowledge is particularly challenging. In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem. Our…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: Accepted to EMNLP 2020

  9. arXiv:2004.13940  [pdf, other]

    cs.LG stat.ML

    DS-FACTO: Doubly Separable Factorization Machines

    Authors: Parameswaran Raman, S. V. N. Vishwanathan

    Abstract: Factorization Machines (FM) are a powerful class of models that incorporate higher-order interaction among features to add more expressive power to linear models. They have been used successfully in several real-world tasks such as click-prediction, ranking and recommender systems. Despite using a low-rank representation for the pairwise features, the memory overheads of using factorization machines…

    Submitted 28 April, 2020; originally announced April 2020.

  10. arXiv:1909.06463  [pdf, other]

    cs.CG cs.LG math.OC

    Optimization on the Surface of the (Hyper)-Sphere

    Authors: Parameswaran Raman, Jiasen Yang

    Abstract: The Thomson problem is a classical problem in physics that studies how $n$ charged particles distribute themselves on the surface of a sphere of $k$ dimensions. When $k=2$, i.e. a 2-sphere (a circle), the particles appear at equally spaced points. Such a configuration can be computed analytically. However, for higher dimensions such as $k \ge 3$, i.e. the case of 3-sphere (standard sphere), ther…

    Submitted 13 September, 2019; originally announced September 2019.

  11. arXiv:1604.04706  [pdf, other]

    cs.LG stat.ML

    DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression

    Authors: Parameswaran Raman, Sriram Srinivasan, Shin Matsushima, Xinhua Zhang, Hyokun Yun, S. V. N. Vishwanathan

    Abstract: Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient descent based optimization method (DS-MLR) for scaling up multinomial logistic re…

    Submitted 3 August, 2018; v1 submitted 16 April, 2016; originally announced April 2016.

  12. arXiv:1402.2676  [pdf, other]

    stat.ML cs.DC cs.LG stat.CO

    Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data

    Authors: Hyokun Yun, Parameswaran Raman, S. V. N. Vishwanathan

    Abstract: We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification. The algorithm shows a very competitive performance on standard benchmark datasets against other representative algorithms in the literature. On the other hand, in large scale problems where explicit feature vectors…

    Submitted 21 August, 2014; v1 submitted 11 February, 2014; originally announced February 2014.

  13. arXiv:1305.4757  [pdf, other]

    cs.LG cs.CG

    Power to the Points: Validating Data Memberships in Clusterings

    Authors: Parasaran Raman, Suresh Venkatasubramanian

    Abstract: A clustering is an implicit assignment of labels of points, based on proximity to other points. It is these labels that are then used for downstream analysis (either focusing on individual clusters, or identifying representatives of clusters and so on). Thus, in order to trust a clustering as a first step in exploratory data analysis, we must trust the labels assigned to individual data. Without s…

    Submitted 21 May, 2013; originally announced May 2013.

    Comments: 18 pages, 9 figures, 5 tables

  14. arXiv:1206.5580  [pdf, other]

    cs.LG stat.ML

    A Geometric Algorithm for Scalable Multiple Kernel Learning

    Authors: John Moeller, Parasaran Raman, Avishek Saha, Suresh Venkatasubramanian

    Abstract: We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex polytopes. This interpretation combined with novel structural insights from our geometric formulation allows us to reduce the MKL problem to a simple optimization r…

    Submitted 15 March, 2014; v1 submitted 25 June, 2012; originally announced June 2012.

    Comments: 20 pages

  15. arXiv:1108.0017  [pdf, other]

    cs.LG cs.DB

    Generating a Diverse Set of High-Quality Clusterings

    Authors: Jeff M. Phillips, Parasaran Raman, Suresh Venkatasubramanian

    Abstract: We provide a new framework for generating multiple good quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components, generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of r…

    Submitted 29 July, 2011; originally announced August 2011.

    Comments: 12 Pages, 5 Figures, 2nd MultiClust Workshop at ECML PKDD 2011

  16. arXiv:1102.0026  [pdf, other]

    cs.LG cs.CG cs.DB

    Spatially-Aware Comparison and Consensus for Clusterings

    Authors: Parasaran Raman, Jeff M. Phillips, Suresh Venkatasubramanian

    Abstract: This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clusterin…

    Submitted 31 January, 2011; originally announced February 2011.

    Comments: 12 Pages, 9 figures, Proceedings of 2011 SIAM International Conference on Data Mining