Showing 1–28 of 28 results for author: Datta, G

Searching in archive cs.
  1. arXiv:2408.15980

    cs.RO cs.AI

    In-Context Imitation Learning via Next-Token Prediction

    Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

    Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor traj…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2407.12067

    cs.CV cs.LG

    MaskVD: Region Masking for Efficient Video Object Detection

    Authors: Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel

    Abstract: Video tasks are compute-heavy and thus pose a challenge when deploying in real-time applications, particularly for tasks that require state-of-the-art Vision Transformers (ViTs). Several research efforts have tried to address this challenge by leveraging the fact that large portions of the video undergo very little change across frames, leading to redundant computations in frame-based video proces…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2406.03744

    cs.CV cs.LG

    ReDistill: Residual Encoded Distillation for Peak Memory Reduction

    Authors: Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang

    Abstract: The expansion of neural network sizes and the enhancement of image resolution through modern camera sensors result in heightened memory and power demands for neural networks. Reducing peak memory, which is the maximum memory consumed during the execution of a neural network, is critical to deploy neural networks on edge devices with limited memory budget. A naive approach to reducing peak memory i…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2402.15121

    cs.AR cs.ET eess.IV

    Toward High Performance, Programmable Extreme-Edge Intelligence for Neuromorphic Vision Sensors utilizing Magnetic Domain Wall Motion-based MTJ

    Authors: Md Abdullah-Al Kaiser, Gourav Datta, Peter A. Beerel, Akhilesh R. Jaiswal

    Abstract: The desire to empower resource-limited edge devices with computer vision (CV) must overcome the high energy consumption of collecting and processing vast sensory data. To address the challenge, this work proposes an energy-efficient non-von-Neumann in-pixel processing solution for neuromorphic vision sensors employing emerging (X) magnetic domain wall magnetic tunnel junction (MDWMTJ) for the firs…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures, 2 tables

  5. arXiv:2402.13232

    cs.CV cs.RO

    A Touch, Vision, and Language Dataset for Multimodal Alignment

    Authors: Letian Fu, Gaurav Datta, Huang Huang, William Chung-Ho Panitch, Jaimyn Drake, Joseph Ortiz, Mustafa Mukadam, Mike Lambeta, Roberto Calandra, Ken Goldberg

    Abstract: Touch is an important sensing modality for humans, but it has not yet been incorporated into a multimodal generative language model. This is partially due to the difficulty of obtaining natural language labels for tactile data and the complexity of aligning tactile readings with both visual observations and language descriptions. As a step towards bridging that gap, this work introduces a new data…

    Submitted 20 February, 2024; originally announced February 2024.

  6. arXiv:2402.04882

    cs.NE cs.AI cs.LG cs.SD eess.AS

    LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units

    Authors: Zeyu Liu, Gourav Datta, Anni Li, Peter Anthony Beerel

    Abstract: Transformer models have demonstrated high accuracy in numerous applications but have high complexity and lack sequential processing capability making them ill-suited for many streaming applications at the edge where devices are heavily resource-constrained. Thus motivated, many researchers have proposed reformulating the transformer models as RNN modules which modify the self-attention computation…

    Submitted 19 January, 2024; originally announced February 2024.

    Comments: The 12th International Conference on Learning Representations (ICLR 2024)

  7. arXiv:2312.06900

    cs.CV

    When Bio-Inspired Computing meets Deep Learning: Low-Latency, Accurate, & Energy-Efficient Spiking Neural Networks from Artificial Neural Networks

    Authors: Gourav Datta, Zeyu Liu, James Diffenderfer, Bhavya Kailkhura, Peter A. Beerel

    Abstract: Bio-inspired Spiking Neural Networks (SNN) are now demonstrating comparable accuracy to intricate convolutional neural networks (CNN), all while delivering remarkable energy and latency efficiency when deployed on neuromorphic hardware. In particular, ANN-to-SNN conversion has recently gained significant traction in developing deep SNNs with close to state-of-the-art (SOTA) test accuracy on comple…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Under review

  8. arXiv:2311.16456

    cs.CV eess.IV

    Spiking Neural Networks with Dynamic Time Steps for Vision Transformers

    Authors: Gourav Datta, Zeyu Liu, Anni Li, Peter A. Beerel

    Abstract: Spiking Neural Networks (SNNs) have emerged as a popular spatio-temporal computing paradigm for complex vision tasks. Recently proposed SNN training algorithms have significantly reduced the number of time steps (down to 1) for improved latency and energy efficiency; however, they target only convolutional neural networks (CNN). These algorithms, when applied on the recently spotlighted vision tra…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Under review

  9. arXiv:2309.08136

    cs.CV

    Let's Roll: Synthetic Dataset Analysis for Pedestrian Detection Across Different Shutter Types

    Authors: Yue Hu, Gourav Datta, Kira Beerel, Peter Beerel

    Abstract: Computer vision (CV) pipelines are typically evaluated on datasets processed by image signal processing (ISP) pipelines even though, for resource-constrained applications, an important research goal is to avoid as many ISP steps as possible. In particular, most CV datasets consist of global shutter (GS) images even though most cameras today use a rolling shutter (RS). This paper studies the impact…

    Submitted 15 September, 2023; originally announced September 2023.

    ACM Class: I.4

  10. arXiv:2308.03164

    cs.CV cs.LG

    FireFly: A Synthetic Dataset for Ember Detection in Wildfire

    Authors: Yue Hu, Xinan Ye, Yifei Liu, Souvik Kundu, Gourav Datta, Srikar Mutnuri, Namo Asavisanu, Nora Ayanian, Konstantinos Psounis, Peter Beerel

    Abstract: This paper presents "FireFly", a synthetic dataset for ember detection created using Unreal Engine 4 (UE4), designed to overcome the current lack of ember-specific training resources. To create the dataset, we present a tool that allows the automated generation of the synthetic labeled dataset with adjustable parameters, enabling data diversity from various environmental conditions, making the dat…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: Artificial Intelligence (AI) and Humanitarian Assistance and Disaster Recovery (HADR) workshop, ICCV 2023 in Paris, France

    ACM Class: I.4

  11. arXiv:2306.15228

    cs.RO cs.AI

    IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

    Authors: Gaurav Datta, Ryan Hoque, Anrui Gu, Eugen Solowjow, Ken Goldberg

    Abstract: Imitation learning has been applied to a range of robotic tasks, but can struggle when robots encounter edge cases that are not represented in the training data (i.e., distribution shift). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human supervisors during task execution and learn from them over time, but different supervisors may demonstrate…

    Submitted 20 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: CoRL 2023

  12. Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)

    Authors: Md Abdullah-Al Kaiser, Gourav Datta, Sreetama Sarkar, Souvik Kundu, Zihan Yin, Manas Garg, Ajey P. Jacob, Peter A. Beerel, Akhilesh R. Jaiswal

    Abstract: The massive amounts of data generated by camera sensors motivate data processing inside pixel arrays, i.e., at the extreme-edge. Several critical developments have fueled recent interest in the processing-in-pixel-in-memory paradigm for a wide range of visual machine intelligence tasks, including (1) advances in 3D integration technology to enable complex processing inside each pixel in a 3D integ…

    Submitted 6 April, 2023; originally announced April 2023.

    Journal ref: GLSVLSI '23: Great Lakes Symposium on VLSI 2023 Proceedings

  13. ViTA: A Vision Transformer Inference Accelerator for Edge Applications

    Authors: Shashank Nag, Gourav Datta, Souvik Kundu, Nitin Chandrachoodan, Peter A. Beerel

    Abstract: Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks due to their ability to capture the global relation between features which leads to superior performance. However, they are compute-heavy and difficult to deploy in resource-constrained edge devices. Existing hardware accelerators, including t…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted at ISCAS 2023

    Journal ref: 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 2023, pp. 1-5

  14. arXiv:2301.04741

    cs.LG

    Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

    Authors: Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

    Abstract: Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we stu…

    Submitted 9 February, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: In proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)

  15. arXiv:2212.10881

    cs.CV

    In-Sensor & Neuromorphic Computing are all you need for Energy Efficient Computer Vision

    Authors: Gourav Datta, Zeyu Liu, Md Abdullah-Al Kaiser, Souvik Kundu, Joe Mathai, Zihan Yin, Ajey P. Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

    Abstract: Due to the high activation sparsity and use of accumulates (AC) instead of expensive multiply-and-accumulates (MAC), neuromorphic spiking neural networks (SNNs) have emerged as a promising low-power alternative to traditional DNNs for several computer vision (CV) applications. However, most existing SNNs require multiple time steps for acceptable inference accuracy, hindering real-time deployment…

    Submitted 21 December, 2022; originally announced December 2022.

  16. arXiv:2212.10170

    cs.CV

    Hoyer regularizer is all you need for ultra low-latency spiking neural networks

    Authors: Gourav Datta, Zeyu Liu, Peter A. Beerel

    Abstract: Spiking Neural Networks (SNN) have emerged as an attractive spatio-temporal computing paradigm for a wide range of low-power vision tasks. However, state-of-the-art (SOTA) SNN models either incur multiple time steps which hinder their deployment in real-time use cases or increase the training complexity significantly. To mitigate this concern, we present a training framework (from scratch) for one…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 16 pages, 4 figures

  17. arXiv:2210.12613

    cs.NE cs.ET

    Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs

    Authors: Gourav Datta, Haoqin Deng, Robert Aviles, Peter A. Beerel

    Abstract: Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks. However, most existing works yield models that require many time steps and do not leverage the inherent temporal dynamics of spiking neural networks, even for sequential tasks. Motivated by this observation, we propose an optimized spiking long short-term memory networks (…

    Submitted 23 October, 2022; originally announced October 2022.

  18. arXiv:2210.07420

    cs.RO cs.AI cs.LG

    Learning to Efficiently Plan Robust Frictional Multi-Object Grasps

    Authors: Wisdom C. Agboh, Satvik Sharma, Kishore Srinivas, Mallika Parulekar, Gaurav Datta, Tianshuang Qiu, Jeffrey Ichnowski, Eugen Solowjow, Mehmet Dogar, Ken Goldberg

    Abstract: We consider a decluttering problem where multiple rigid convex polygonal objects rest in randomly placed positions and orientations on a planar surface and must be efficiently transported to a packing box using both single and multi-object grasps. Prior work considered frictionless multi-object grasping. In this paper, we introduce friction to increase the number of potential grasps for a given gr…

    Submitted 2 August, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: IEEE IROS 2023

  19. arXiv:2210.05451

    cs.CV eess.IV

    Enabling ISP-less Low-Power Computer Vision

    Authors: Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal, Peter A. Beerel

    Abstract: In order to deploy current computer vision (CV) models on resource-constrained low-power devices, recent works have proposed in-sensor and in-pixel computing approaches that try to partly/fully bypass the image signal processor (ISP) and yield significant bandwidth reduction between the image sensor and the CV processing unit by downsampling the activation maps in the initial convolutional neural…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to WACV 2023

  20. arXiv:2209.07659

    cs.CV cs.LG

    Self-Attentive Pooling for Efficient Deep Learning

    Authors: Fang Chen, Gourav Datta, Souvik Kundu, Peter Beerel

    Abstract: Efficient custom pooling techniques that can aggressively trim the dimensions of a feature map and thereby reduce inference compute and memory footprint for resource-constrained computer vision applications have recently gained significant traction. However, prior pooling works extract only the local context of the activation maps, limiting their effectiveness. In contrast, we propose a novel non-…

    Submitted 28 December, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: 9 pages, 4 figures, conference

  21. arXiv:2203.05696

    eess.IV cs.CV

    Toward Efficient Hyperspectral Image Processing inside Camera Pixels

    Authors: Gourav Datta, Zihan Yin, Ajey Jacob, Akhilesh R. Jaiswal, Peter A. Beerel

    Abstract: Hyperspectral cameras generate a large amount of data due to the presence of hundreds of spectral bands as opposed to only three channels (red, green, and blue) in traditional cameras. This requires a significant amount of data transmission between the hyperspectral image sensor and a processor used to classify/detect/track the images, frame by frame, expending high energy and causing bandwidth an…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: 6 pages, 3 figures

  22. arXiv:2203.04737

    cs.LG cs.AR cs.CV

    P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications

    Authors: Gourav Datta, Souvik Kundu, Zihan Yin, Ravi Teja Lakkireddy, Joe Mathai, Ajey Jacob, Peter A. Beerel, Akhilesh R. Jaiswal

    Abstract: The demand to process vast amounts of data generated from state-of-the-art high resolution cameras has motivated novel energy-efficient on-device AI solutions. Visual data in such cameras are usually captured in the form of analog voltages by a sensor pixel array, and then converted to the digital domain for subsequent AI processing using analog-to-digital converters (ADC). Recent research has tri…

    Submitted 16 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: 15 pages, 8 figures

  23. arXiv:2112.12133

    cs.CV

    Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks?

    Authors: Gourav Datta, Peter A. Beerel

    Abstract: Spiking neural networks (SNNs), that operate via binary spikes distributed over time, have emerged as a promising energy efficient ML paradigm for resource-constrained devices. However, the current state-of-the-art (SOTA) SNNs require multiple time steps for acceptable inference accuracy, increasing spiking activity and, consequently, energy consumption. SOTA training strategies for SNNs involve c…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: Accepted to DATE 2022

  24. arXiv:2107.12445

    cs.NE cs.LG

    Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression

    Authors: Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel

    Abstract: Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks, due to their promise to provide increased compute efficiency on event-driven neuromorphic hardware. However, to perform well on complex vision applications, most SNN training frameworks yield large inference latency which translates to increased spike activity and reduced energy eff…

    Submitted 16 July, 2021; originally announced July 2021.

    Comments: 10 Pages, 8 Figures, 5 Tables

  25. arXiv:2107.12374

    cs.NE

    Training Energy-Efficient Deep Spiking Neural Networks with Single-Spike Hybrid Input Encoding

    Authors: Gourav Datta, Souvik Kundu, Peter A. Beerel

    Abstract: Spiking Neural Networks (SNNs) have emerged as an attractive alternative to traditional deep learning frameworks, since they provide higher computational efficiency in event driven neuromorphic hardware. However, the state-of-the-art (SOTA) SNNs suffer from high inference latency, resulting from inefficient input encoding and training techniques. The most widely used input coding schemes, such as…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: Extended version of arXiv:2107.11979

  26. arXiv:2107.11979

    cs.NE

    HYPER-SNN: Towards Energy-efficient Quantized Deep Spiking Neural Networks for Hyperspectral Image Classification

    Authors: Gourav Datta, Souvik Kundu, Akhilesh R. Jaiswal, Peter A. Beerel

    Abstract: Hyperspectral images (HSI) provide rich spectral and spatial information across a series of contiguous spectral bands. However, the accurate processing of the spectral and spatial correlation between the bands requires the use of energy-expensive 3-D Convolutional Neural Networks (CNNs). To address this challenge, we propose the use of Spiking Neural Networks (SNNs) that are generated from iso-ar…

    Submitted 28 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  27. arXiv:2001.10715

    cs.AR

    qBSA: Logic Design of a 32-bit Block-Skewed RSFQ Arithmetic Logic Unit

    Authors: Souvik Kundu, Gourav Datta, Peter A. Beerel, Massoud Pedram

    Abstract: Single flux quantum (SFQ) circuits are an attractive beyond-CMOS technology because they promise two orders of magnitude lower power at clock frequencies exceeding 25 GHz. However, every SFQ gate is clocked, creating very deep gate-level pipelines that are difficult to keep full, particularly for sequences that include data-dependent operations. This paper proposes to increase the throughput of SFQ…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: 3 pages, 3 figures

  28. arXiv:1910.04907

    cs.ET physics.app-ph

    Metastability-Resilient Synchronization FIFO for SFQ Logic

    Authors: Gourav Datta, Haolin Cong, Souvik Kundu, Peter A. Beerel

    Abstract: Digital single-flux quantum (SFQ) technology promises to meet the demands of ultra low power and high speed computing needed for future exascale supercomputing systems. The combination of ultra high clock frequencies, gate-level pipelines, and numerous sources of variability in SFQ circuits, however, make low-skew global clock distribution a challenge. This motivates the support of multiple indepe…

    Submitted 23 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted in ISEC 2019