Showing 1–19 of 19 results for author: Pande, P P

Searching in archive cs.
  1. arXiv:2406.05282  [pdf, other]

    cs.AR

    Scalable and Programmable Look-Up Table based Neural Acceleration (LUT-NA) for Extreme Energy Efficiency

    Authors: Ovishake Sen, Chukwufumnanya Ogbogu, Peyman Dehghanzadeh, Janardhan Rao Doppa, Swarup Bhunia, Partha Pratim Pande, Baibhab Chatterjee

    Abstract: Traditional digital implementations of neural accelerators are limited by high power and area overheads, while analog and non-CMOS implementations suffer from noise, device mismatch, and reliability issues. This paper introduces a CMOS Look-Up Table (LUT)-based Neural Accelerator (LUT-NA) framework that reduces the power, latency, and area consumption of traditional digital accelerators through pr…

    Submitted 13 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 7 pages

  2. arXiv:2403.19073  [pdf]

    cs.AR cs.AI cs.ET

    Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads

    Authors: Harsh Sharma, Gaurav Narang, Janardhan Rao Doppa, Umit Ogras, Partha Pratim Pande

    Abstract: Processing-in-memory (PIM) has emerged as an enabler for the energy-efficient and high-performance acceleration of deep learning (DL) workloads. Resistive random-access memory (ReRAM) is one of the most promising technologies to implement PIM. However, as the complexity of Deep convolutional neural networks (DNNs) grows, we need to design a manycore architecture with multiple ReRAM-based processin…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Presented at DATE Conference, Valencia, Spain 2024

  3. arXiv:2401.10522  [pdf]

    cs.AR cs.LG

    FARe: Fault-Aware GNN Training on ReRAM-based PIM Accelerators

    Authors: Pratyush Dhingra, Chukwufumnanya Ogbogu, Biresh Kumar Joardar, Janardhan Rao Doppa, Ananth Kalyanaraman, Partha Pratim Pande

    Abstract: Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architecture is an attractive solution for training Graph Neural Networks (GNNs) on edge platforms. However, the immature fabrication process and limited write endurance of ReRAMs make them prone to hardware faults, thereby limiting their widespread adoption for GNN training. Further, the existing fault-tolerant solutions prov…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: This paper has been accepted to the conference DATE (Design, Automation and Test in Europe) - 2024

    ACM Class: B.8.1

  4. arXiv:2312.11750  [pdf]

    cs.AR cs.DC

    A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models

    Authors: Harsh Sharma, Pratyush Dhingra, Janardhan Rao Doppa, Umit Ogras, Partha Pratim Pande

    Abstract: Transformers have revolutionized deep learning and generative modeling, enabling unprecedented advancements in natural language processing tasks. However, the size of transformer models is increasing continuously, driven by enhanced capabilities across various deep-learning tasks. This trend of ever-increasing model size has given rise to new challenges in terms of memory and computing requirement…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Preprint for a Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models

  5. arXiv:2310.12182  [pdf, other]

    cs.AR

    Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

    Authors: Xueying Wu, Edward Hanson, Nansu Wang, Qilin Zheng, Xiaoxuan Yang, Huanrui Yang, Shiyu Li, Feng Cheng, Partha Pratim Pande, Janardhan Rao Doppa, Krishnendu Chakrabarty, Hai Li

    Abstract: Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures have demonstrated great potential to accelerate Deep Neural Network (DNN) training/inference. However, the computational accuracy of analog PIM is compromised due to the non-idealities, such as the conductance variation of ReRAM cells. The impact of these non-idealities worsens as the number of concurrently activ…

    Submitted 27 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 12 pages, 13 figures

  6. arXiv:2111.09272  [pdf]

    cs.AR cs.ET

    ReaLPrune: ReRAM Crossbar-aware Lottery Ticket Pruned CNNs

    Authors: Biresh Kumar Joardar, Janardhan Rao Doppa, Hai Li, Krishnendu Chakrabarty, Partha Pratim Pande

    Abstract: Training machine learning (ML) models at the edge (on-chip training on end user devices) can address many pressing challenges including data privacy/security, increase the accessibility of ML applications to different parts of the world by reducing the dependence on the communication fabric and the cloud infrastructure, and meet the real-time requirements of AR/VR applications. However, existing e…

    Submitted 23 March, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: 13 pages, 9 figures

  7. arXiv:2109.05437  [pdf, other]

    cs.ET

    Multi-Objective Optimization of ReRAM Crossbars for Robust DNN Inferencing under Stochastic Noise

    Authors: Xiaoxuan Yang, Syrine Belakaria, Biresh Kumar Joardar, Huanrui Yang, Janardhan Rao Doppa, Partha Pratim Pande, Krishnendu Chakrabarty, Hai Li

    Abstract: Resistive random-access memory (ReRAM) is a promising technology for designing hardware accelerators for deep neural network (DNN) inferencing. However, stochastic noise in ReRAM crossbars can degrade the DNN inferencing accuracy. We propose the design and optimization of a high-performance, area-and energy-efficient ReRAM-based hardware accelerator to achieve robust DNN inferencing in the presenc…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: To appear in ICCAD 2021

  8. arXiv:2105.09282  [pdf, other]

    cs.AR cs.DC eess.SY

    Learning Pareto-Frontier Resource Management Policies for Heterogeneous SoCs: An Information-Theoretic Approach

    Authors: Aryan Deshwal, Syrine Belakaria, Ganapati Bhat, Janardhan Rao Doppa, Partha Pratim Pande

    Abstract: Mobile system-on-chips (SoCs) are growing in their complexity and heterogeneity (e.g., Arm's Big-Little architecture) to meet the needs of emerging applications, including games and artificial intelligence. This makes it very challenging to optimally manage the resources (e.g., controlling the number and frequency of different types of cores) at runtime to meet the desired trade-offs among multipl…

    Submitted 14 April, 2021; originally announced May 2021.

    Comments: To be published in proceedings DAC

  9. arXiv:2103.12896  [pdf, other]

    cs.CV cs.LG eess.IV

    SETGAN: Scale and Energy Trade-off GANs for Image Applications on Mobile Platforms

    Authors: Nitthilan Kannappan Jayakodi, Janardhan Rao Doppa, Partha Pratim Pande

    Abstract: We consider the task of photo-realistic unconditional image generation (generate high quality, diverse samples that carry the same visual content as the image) on mobile platforms using Generative Adversarial Networks (GANs). In this paper, we propose a novel approach to trade-off image generation accuracy of a GAN for the energy consumed (compute) at run-time called Scale-Energy Tradeoff GAN (SET…

    Submitted 23 March, 2021; originally announced March 2021.

  10. arXiv:2102.07959  [pdf]

    cs.AR cs.ET

    ReGraphX: NoC-enabled 3D Heterogeneous ReRAM Architecture for Training Graph Neural Networks

    Authors: Aqeeb Iqbal Arka, Biresh Kumar Joardar, Janardhan Rao Doppa, Partha Pratim Pande, Krishnendu Chakrabarty

    Abstract: Graph Neural Network (GNN) is a variant of Deep Neural Networks (DNNs) operating on graphs. However, GNNs are more complex compared to traditional DNNs as they simultaneously exhibit features of both DNN and graph applications. As a result, architectures specifically optimized for either DNNs or graph applications are not suited for GNN training. In this work, we propose a 3D heterogeneous manycor…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: This paper has been accepted and presented at Design Automation and Test in Europe (DATE) 2021

  11. HeM3D: Heterogeneous Manycore Architecture Based on Monolithic 3D Vertical Integration

    Authors: Aqeeb Iqbal Arka, Biresh Kumar Joardar, Ryan Gary Kim, Dae Hyun Kim, Janardhan Rao Doppa, Partha Pratim Pande

    Abstract: Heterogeneous manycore architectures are the key to efficiently execute compute- and data-intensive applications. Through silicon via (TSV)-based 3D manycore system is a promising solution in this direction as it enables integration of disparate computing cores on a single system. However, the achievable performance of conventional through-silicon-via (TSV)-based 3D systems is ultimately bottlenec…

    Submitted 7 December, 2020; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: This work has been accepted in ACM Transactions on Design Automation of Electronic Systems

    ACM Class: C.2

  12. arXiv:2008.09728  [pdf, other]

    cs.DC cs.AI cs.LG eess.SY

    Online Adaptive Learning for Runtime Resource Management of Heterogeneous SoCs

    Authors: Sumit K. Mandal, Umit Y. Ogras, Janardhan Rao Doppa, Raid Z. Ayoub, Michael Kishinevsky, Partha P. Pande

    Abstract: Dynamic resource management has become one of the major areas of research in modern computer and communication system design due to lower power consumption and higher performance demands. The number of integrated cores, level of heterogeneity and amount of control knobs increase steadily. As a result, the system complexity is increasing faster than our ability to optimize and dynamically manage th…

    Submitted 21 August, 2020; originally announced August 2020.

    Comments: This paper appeared in the Proceedings of Design Automation Conference 2020

  13. arXiv:2003.09526  [pdf, other]

    cs.DC cs.LG eess.SY

    An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms

    Authors: Sumit K. Mandal, Ganapati Bhat, Janardhan Rao Doppa, Partha Pratim Pande, Umit Y. Ogras

    Abstract: Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, system-on-chips (SoC) that are at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime…

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: This paper has been accepted to be published in a future issue of ACM TODAES

  14. arXiv:1906.04293  [pdf]

    cs.ET cs.NI

    Inter-Tier Process Variation-Aware Monolithic 3D NoC Architectures

    Authors: Shouvik Musavvir, Anwesha Chatterjee, Ryan Gary Kim, Dae Hyun Kim, Partha Pratim Pande

    Abstract: Monolithic 3D (M3D) technology enables high density integration, performance, and energy-efficiency by sequentially stacking tiers on top of each other. M3D-based network-on-chip (NoC) architectures can exploit these benefits by adopting tier partitioning for intra-router stages. However, conventional fabrication methods are infeasible for M3D-enabled designs due to temperature related issues. Thi…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Submitted to IEEE TVLSI (Under Review)

  15. Trading-off Accuracy and Energy of Deep Inference on Embedded Systems: A Co-Design Approach

    Authors: Nitthilan Kannappan Jayakodi, Anwesha Chatterjee, Wonje Choi, Janardhan Rao Doppa, Partha Pratim Pande

    Abstract: Deep neural networks have seen tremendous success for different modalities of data including images, videos, and speech. This success has led to their deployment in mobile and embedded systems for real-time applications. However, making repeated inferences using deep networks on embedded systems poses significant challenges due to constrained resources (e.g., energy and computing power). To addres…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: Published in IEEE Trans. on CAD of Integrated Circuits and Systems

    Journal ref: Vol. 37, No. 11, Pages 2881-2893, Nov 2018

  16. arXiv:1810.08869  [pdf]

    cs.DC cs.LG stat.ML

    Learning-based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems

    Authors: Biresh Kumar Joardar, Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu

    Abstract: The rising use of deep learning and other big-data algorithms has led to an increasing demand for hardware platforms that are computationally powerful, yet energy-efficient. Due to the amount of data parallelism in these algorithms, high-performance 3D manycore platforms that incorporate both CPUs and GPUs present a promising direction. However, as systems use heterogeneity (e.g., a combination of…

    Submitted 5 October, 2019; v1 submitted 20 October, 2018; originally announced October 2018.

    Comments: Published in IEEE Transactions on Computers

    Journal ref: IEEE Transactions on Computers, vol. 68, no. 6, June 2019

  17. On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems

    Authors: Wonje Choi, Karthi Duraisamy, Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu

    Abstract: Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: Accepted in a future publication of IEEE Transactions on Computers

  18. arXiv:1712.00076  [pdf]

    cs.LG cs.AR

    Machine Learning and Manycore Systems Design: A Serendipitous Symbiosis

    Authors: Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu

    Abstract: Tight collaboration between experts of machine learning and manycore system design is necessary to create a data-driven manycore design framework that integrates both learning and expert knowledge. Such a framework will be necessary to address the rising complexity of designing large-scale manycore systems and machine learning techniques.

    Submitted 30 November, 2017; originally announced December 2017.

    Comments: To appear in a future publication of IEEE Computer

  19. arXiv:1608.06972  [pdf]

    cs.ET cs.DC cs.NI

    Design-Space Exploration and Optimization of an Energy-Efficient and Reliable 3D Small-world Network-on-Chip

    Authors: Sourav Das, Janardhan Rao Doppa, Partha Pratim Pande, Krishnendu Chakrabarty

    Abstract: A three-dimensional (3D) Network-on-Chip (NoC) enables the design of high performance and low power many-core chips. Existing 3D NoCs are inadequate for meeting the ever-increasing performance requirements of many-core processors since they are simple extensions of regular 2D architectures and they do not fully exploit the advantages provided by 3D integration. Moreover, the anticipated performanc…

    Submitted 24 August, 2016; originally announced August 2016.