Showing 1–22 of 22 results for author: Kaul, B

  1. arXiv:2405.01616  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative Active Learning for the Search of Small-molecule Protein Binders

    Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

    Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecu…

    Submitted 2 May, 2024; originally announced May 2024.

  2. arXiv:2304.06941  [pdf, other]

    cs.LG cs.AI

    AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks

    Authors: Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey

    Abstract: Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-lin…

    Submitted 14 April, 2023; originally announced April 2023.

  3. arXiv:2104.08002  [pdf, other]

    cs.LG cs.AI cs.DC

    Efficient and Generic 1D Dilated Convolution Layer for Deep Learning

    Authors: Narendra Chaudhary, Sanchit Misra, Dhiraj Kalamkar, Alexander Heinecke, Evangelos Georganas, Barukh Ziv, Menachem Adelman, Bharat Kaul

    Abstract: Convolutional neural networks (CNNs) have found many applications in tasks involving two-dimensional (2D) data, such as image classification and image processing. Therefore, 2D convolution layers have been heavily optimized on CPUs and GPUs. However, in many applications, for example genomics and speech recognition, the data can be one-dimensional (1D). Such applications can benefit from optimize…

    Submitted 16 April, 2021; originally announced April 2021.

  4. arXiv:2104.05573  [pdf, other]

    cs.PL

    AI Powered Compiler Techniques for DL Code Optimization

    Authors: Sanket Tavarageri, Gagandeep Goyal, Sasikanth Avancha, Bharat Kaul, Ramakrishna Upadrasta

    Abstract: Creating high performance implementations of deep learning primitives on CPUs is a challenging task. Multiple considerations, including the multi-level cache hierarchy and wide SIMD units of CPU platforms, influence the choice of program transformations to apply for performance optimization. In this paper, we present machine learning powered compiler techniques to optimize loop nests. We take a two-pro…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2006.02230, arXiv:2002.02145

  5. arXiv:2103.10836  [pdf, other]

    cs.AR

    GNNerator: A Hardware/Software Framework for Accelerating Graph Neural Networks

    Authors: Jacob R. Stevens, Dipankar Das, Sasikanth Avancha, Bharat Kaul, Anand Raghunathan

    Abstract: Graph Neural Networks (GNNs) use a fully-connected layer to extract features from the nodes of a graph and aggregate these features using message passing between nodes, combining two distinct computational patterns: dense, regular computations and sparse, irregular computations. To address this challenge, we propose GNNerator, an accelerator with heterogeneous compute engines optimized for these…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: To appear in Proceedings of the 58th Design Automation Conference (DAC '21)

  6. arXiv:2010.00993  [pdf, other]

    cs.RO cs.LG cs.MA

    MADRaS : Multi Agent Driving Simulator

    Authors: Anirban Santara, Sohan Rudra, Sree Aditya Buridi, Meha Kaushik, Abhishek Naik, Bharat Kaul, Balaraman Ravindran

    Abstract: In this work, we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving. MADRaS provides a platform for constructing a wide variety of highway and track driving scenarios where multiple driving agents can train for motion planning tasks using reinforcement learning and other machine learning algorithms.…

    Submitted 2 October, 2020; originally announced October 2020.

  7. arXiv:2006.02230  [pdf, other]

    cs.DC cs.AI cs.PL

    PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

    Authors: Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

    Abstract: Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of DNNs is becoming ubiquitous, including in software for image recognition, speech recognition, speech synthesis, and language translation, to name a few. The training of DNN architectures, however, is computationally expensive. Once the model is created, its use in the intended application - the inference task, is comput…

    Submitted 17 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2002.02145

  8. arXiv:2002.02145  [pdf, other]

    cs.PL cs.LG

    PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

    Authors: Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

    Abstract: At the heart of deep learning training and inferencing are computationally intensive primitives such as convolutions, which form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high performance implementations of deep learning kernels, namely, 1) library development exemplified by Intel MKL-DNN for CPUs, 2) automatic compilation represented by…

    Submitted 6 February, 2020; originally announced February 2020.

  9. SEERL: Sample Efficient Ensemble Reinforcement Learning

    Authors: Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

    Abstract: Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved…

    Submitted 16 May, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted at Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems

  10. arXiv:1909.07729  [pdf, other]

    cs.LG cs.NE stat.ML

    K-TanH: Efficient TanH For Deep Learning

    Authors: Abhisek Kundu, Alex Heinecke, Dhiraj Kalamkar, Sudarshan Srinivasan, Eric C. Qin, Naveen K. Mellempudi, Dipankar Das, Kunal Banerjee, Bharat Kaul, Pradeep Dubey

    Abstract: We propose K-TanH, a novel, highly accurate, hardware efficient approximation of the popular activation function TanH for Deep Learning. K-TanH consists of parameterized low-precision integer operations, such as shift and add/subtract (no floating point operation needed), where parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH can work on various numerical format…

    Submitted 7 June, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

    Comments: 6 pages, 1 figure

  11. arXiv:1908.11809  [pdf, other]

    cs.DC cs.LG

    High Performance Scalable FPGA Accelerator for Deep Neural Networks

    Authors: Sudarshan Srinivasan, Pradeep Janedula, Saurabh Dhoble, Sasikanth Avancha, Dipankar Das, Naveen Mellempudi, Bharat Daga, Martin Langhammer, Gregg Baeckler, Bharat Kaul

    Abstract: Low precision is the first-order knob for achieving higher Artificial Intelligence Operations (AI-TOPS). However, the algorithmic space for sub-8-bit precision compute is diverse, with disruptive changes happening frequently, making FPGAs a natural choice for Deep Neural Network inference. In this work we present an FPGA-based accelerator for CNN inference acceleration. We use INT-8-2 compute…

    Submitted 29 August, 2019; originally announced August 2019.

  12. arXiv:1906.08168  [pdf, other]

    cs.DC cs.LG

    Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support

    Authors: Sanket Tavarageri, Srinivas Sridharan, Bharat Kaul

    Abstract: Deep neural networks (DNNs) have been enormously successful in tasks that were hitherto in the human-only realm, such as image recognition and language translation. Owing to their success, DNNs are being explored for use in ever more sophisticated tasks. One of the ways that DNNs are made to scale for complex undertakings is by increasing their size -- deeper and wider networks can…

    Submitted 11 June, 2019; originally announced June 2019.

  13. arXiv:1905.12334  [pdf, other]

    cs.LG stat.ML

    Mixed Precision Training With 8-bit Floating Point

    Authors: Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul

    Abstract: Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit precision, with significant gains in performance and energy efficiency. However, attempts to train DNNs at 8-bit precision have met with significant challenges because…

    Submitted 29 May, 2019; originally announced May 2019.

  14. arXiv:1905.12322  [pdf, other]

    cs.LG stat.ML

    A Study of BFLOAT16 for Deep Learning Training

    Authors: Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey

    Abstract: This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can repr…

    Submitted 13 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

  15. arXiv:1809.03576  [pdf, other]

    cs.LG cs.CV stat.ML

    Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-out Classifiers

    Authors: Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, Theodore L. Willke

    Abstract: As deep learning methods form a critical part of commercially important applications such as autonomous driving and medical diagnostics, it is important to reliably detect out-of-distribution (OOD) inputs while employing these algorithms. In this work, we propose an OOD detection algorithm which comprises an ensemble of classifiers. We train each classifier in a self-supervised manner by leavin…

    Submitted 4 September, 2018; originally announced September 2018.

  16. arXiv:1802.00930  [pdf, other]

    cs.NE cs.LG math.NA

    Mixed Precision Training of Convolutional Neural Networks using Integer Operations

    Authors: Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, Vadim Pirogov

    Abstract: The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular, FP16 accumulating into FP32 (Micikevicius et al., 2017). On the other hand, while a lot of research has also happened in the domain of low and mixed-precision Integer training, these works either present results for non-SOTA networks (for instance only Ale…

    Submitted 23 February, 2018; v1 submitted 3 February, 2018; originally announced February 2018.

    Comments: Published as a conference paper at ICLR 2018

  17. arXiv:1801.08030  [pdf, other]

    cs.DC cs.LG

    On Scale-out Deep Learning Training for Cloud and HPC

    Authors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey

    Abstract: The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node/card cannot satisfy the compute, memory, and I/O requirements of today's state-of-the-art deep neural networks. However, scaling synchronous Stochasti…

    Submitted 24 January, 2018; originally announced January 2018.

    Comments: Accepted in SysML 2018 conference

  18. arXiv:1707.06658  [pdf, other]

    cs.LG cs.AI

    RAIL: Risk-Averse Imitation Learning

    Authors: Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

    Abstract: Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. We evaluate in terms of the expert's cost function and observe that the distribution of trajectory-c…

    Submitted 29 November, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Accepted for presentation in Deep Reinforcement Learning Symposium at NIPS 2017

  19. arXiv:1707.04679  [pdf, other]

    cs.IT cs.AI

    Ternary Residual Networks

    Authors: Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

    Abstract: Sub-8-bit representations of DNNs incur some discernible loss of accuracy despite rigorous (re)training at low precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop we introduce the notion of residual networks where we add more low-precision edges to sensitiv…

    Submitted 31 October, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

  20. arXiv:1705.01462  [pdf, other]

    cs.LG cs.NE

    Ternary Neural Networks with Fine-Grained Quantization

    Authors: Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

    Abstract: We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using F…

    Submitted 30 May, 2017; v1 submitted 2 May, 2017; originally announced May 2017.

  21. arXiv:1701.08978  [pdf, other]

    cs.LG cs.NE

    Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

    Authors: Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul

    Abstract: We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8 bits, thus enabling a sub-8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing the…

    Submitted 31 January, 2017; v1 submitted 31 January, 2017; originally announced January 2017.

  22. arXiv:1602.06709  [pdf, other]

    cs.DC cs.LG

    Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

    Authors: Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey

    Abstract: We design and implement a distributed multinode synchronous SGD algorithm, without altering hyperparameters, compressing data, or altering algorithmic behavior. We perform a detailed analysis of scaling, and identify optimal design points for different networks. We demonstrate scaling of CNNs on 100s of nodes, and present what we believe to be record training throughputs. A 512 minibatch VGG-A…

    Submitted 22 February, 2016; originally announced February 2016.