
Showing 1–50 of 323 results for author: Shrivastava, A

  1. arXiv:2408.16930  [pdf, other]

    cs.CV

    VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition

    Authors: Zaiwei Zhang, Gregory P. Meyer, Zhichao Lu, Ashish Shrivastava, Avinash Ravichandran, Eric M. Wolff

    Abstract: For visual recognition, knowledge distillation typically involves transferring knowledge from a large, well-trained teacher model to a smaller student model. In this paper, we introduce an effective method to distill knowledge from an off-the-shelf vision-language model (VLM), demonstrating that it provides novel supervision in addition to those from a conventional vision-only teacher model. Our k…

    Submitted 29 August, 2024; originally announced August 2024.

  2. arXiv:2408.11219  [pdf, other]

    cs.CL cs.AI

    CoDi: Conversational Distillation for Grounded Question Answering

    Authors: Patrick Huber, Arash Einolghozati, Rylan Conway, Kanika Narang, Matt Smith, Waqar Nayyar, Adithya Sagar, Ahmed Aly, Akshat Shrivastava

    Abstract: Distilling conversational skills into Small Language Models (SLMs) with approximately 1 billion parameters presents significant challenges. Firstly, SLMs have limited capacity in their model parameters to learn extensive knowledge compared to larger models. Secondly, high-quality conversational datasets are often scarce, small, and domain-specific. Addressing these challenges, we introduce a novel…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 13 pages

  3. arXiv:2408.11205  [pdf, other]

    eess.SP cs.CL

    DSP-MLIR: A MLIR Dialect for Digital Signal Processing

    Authors: Abhinav Kumar, Atharva Khedkar, Aviral Shrivastava

    Abstract: Traditional Digital Signal Processing (DSP) compilers work at a low level (C level / assembly level) and hence lose many of the optimization opportunities present at the high level (domain level). The emerging multi-level compiler infrastructure MLIR (Multi-Level Intermediate Representation) allows optimizations to be specified at a higher level. In this paper, we utilize the MLIR framework to introduce a…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.02672  [pdf, other]

    cs.CV

    Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

    Authors: Shishira R Maiya, Anubhav Gupta, Matthew Gwilliam, Max Ehrlich, Abhinav Shrivastava

    Abstract: Implicit Neural Networks (INRs) have emerged as powerful representations to encode all forms of data, including images, videos, audios, and scenes. With video, many INRs for video have been proposed for the compression task, and recent methods feature significant improvements with respect to encoding time, storage, and reconstruction quality. However, these encoded representations lack semantic me…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: equal contribution for first two authors; accepted to ECCV2024; 14 pages, 4 tables, 10 figures in main paper, supplementary after bibliography

  5. arXiv:2408.00996  [pdf, other]

    cs.LG cs.AI

    IncidentNet: Traffic Incident Detection, Localization and Severity Estimation with Sparse Sensing

    Authors: Sai Shashank Peddiraju, Kaustubh Harapanahalli, Edward Andert, Aviral Shrivastava

    Abstract: Prior art in traffic incident detection relies on high sensor coverage and is primarily based on decision-tree and random forest models that have limited representation capacity and, as a result, cannot detect incidents with high accuracy. This paper presents IncidentNet - a novel approach for classifying, localizing, and estimating the severity of traffic incidents using deep learning models trai…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 6 pages, 6 figures, 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)

  6. arXiv:2407.21770  [pdf, other]

    cs.AI cs.LG

    MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

    Authors: Xi Victoria Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan

    Abstract: We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adap…

    Submitted 12 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: v2 -> update related work section; v3 -> fix spelling

  7. arXiv:2407.18249  [pdf, other]

    cs.CV

    Trajectory-aligned Space-time Tokens for Few-shot Action Recognition

    Authors: Pulkit Kumar, Namitha Padmanabhan, Luke Luo, Sai Saketh Rambhatla, Abhinav Shrivastava

    Abstract: We propose a simple yet effective approach for few-shot action recognition, emphasizing the disentanglement of motion and appearance representations. By harnessing recent progress in tracking, specifically point trajectories and self-supervised representation learning, we build trajectory-aligned tokens (TATs) that capture motion and appearance information. This approach significantly reduces the…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  8. arXiv:2407.15849  [pdf, other]

    cs.RO cs.AI

    WayEx: Waypoint Exploration using a Single Demonstration

    Authors: Mara Levy, Nirat Saini, Abhinav Shrivastava

    Abstract: We propose WayEx, a new method for learning complex goal-conditioned robotics tasks from a single demonstration. Our approach distinguishes itself from existing imitation learning methods by demanding fewer expert examples and eliminating the need for information about the actions taken during the demonstration. This is accomplished by introducing a new reward function and employing a knowledge ex…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ICRA 2024

  9. arXiv:2407.10958  [pdf, other]

    cs.CV

    InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

    Authors: Nirat Saini, Navaneeth Bodla, Ashish Shrivastava, Avinash Ravichandran, Xiao Zhang, Abhinav Shrivastava, Bharat Singh

    Abstract: We introduce InVi, an approach for inserting or replacing objects within videos (referred to as inpainting) using off-the-shelf, text-to-image latent diffusion models. InVi targets controlled manipulation of objects and blending them seamlessly into a background video unlike existing video editing methods that focus on comprehensive re-styling or entire scene alterations. To achieve this goal, we…

    Submitted 15 July, 2024; originally announced July 2024.

  10. arXiv:2407.10032  [pdf, other]

    cs.LG

    LeanQuant: Accurate Large Language Model Quantization with Loss-Error-Aware Grid

    Authors: Tianyi Zhang, Anshumali Shrivastava

    Abstract: Large language models (LLMs) have numerous applications across various domains, but their high computational and memory demands pose significant deployment challenges. Weight quantization is an effective technique for reducing the decoding latency and memory requirements of LLMs. Existing approaches primarily aim to maintain the quality of quantized models by preserving outliers in input features,…

    Submitted 13 July, 2024; originally announced July 2024.

  11. arXiv:2407.07092  [pdf, other]

    cs.CV cs.AI

    V-VIPE: Variational View Invariant Pose Embedding

    Authors: Mara Levy, Abhinav Shrivastava

    Abstract: Learning to represent three-dimensional (3D) human pose given a two-dimensional (2D) image of a person is a challenging problem. In order to make the problem less ambiguous, it has become common practice to estimate 3D pose in the camera coordinate space. However, this makes the task of comparing two 3D poses difficult. In this paper, we address this challenge by separating the problem of estimati…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: CVPR 2024 - RHOBIN Workshop

  12. arXiv:2406.14901  [pdf, other]

    cs.IR

    IDentity with Locality: An ideal hash for gene sequence search

    Authors: Aditya Desai, Gaurav Gupta, Tianyi Zhang, Anshumali Shrivastava

    Abstract: Gene sequence search is a fundamental operation in computational genomics. Due to the petabyte scale of genome archives, most gene search systems now use hashing-based data structures such as Bloom Filters (BF). The state-of-the-art systems such as Compact bit-slicing signature index (COBS) and Repeated And Merged Bloom filters (RAMBO) use BF with Random Hash (RH) functions for gene representation…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 13 pages

  13. arXiv:2406.13301  [pdf, other]

    cs.CV cs.RO

    ARDuP: Active Region Video Diffusion for Universal Policies

    Authors: Shuaiyi Huang, Mara Levy, Zhenyu Jiang, Anima Anandkumar, Yuke Zhu, Linxi Fan, De-An Huang, Abhinav Shrivastava

    Abstract: Sequential decision-making can be formulated as a text-conditioned video generation problem, where a video planner, guided by a text-defined goal, generates future frames visualizing planned actions, from which control actions are subsequently derived. In this work, we introduce Active Region Video Diffusion for Universal Policies (ARDuP), a novel framework for video-based policy learning that emp…

    Submitted 19 June, 2024; originally announced June 2024.

  14. arXiv:2406.11820  [pdf, other]

    cs.CV

    Composing Object Relations and Attributes for Image-Text Matching

    Authors: Khoi Pham, Chuong Huynh, Ser-Nam Lim, Abhinav Shrivastava

    Abstract: We study the visual semantic embedding problem for image-text matching. Most existing work utilizes a tailored cross-attention mechanism to perform local alignment across the two image and text modalities. This is computationally expensive, even though it is more powerful than the unimodal dual-encoder approach. This work introduces a dual-encoder image-text matching model, leveraging a scene grap…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR'24

  15. arXiv:2406.10900  [pdf, other]

    cs.CV cs.CL

    AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

    Authors: Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha

    Abstract: Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose fail patterns may hardly generalize, and finetuning on them could undermine…

    Submitted 16 June, 2024; originally announced June 2024.

  16. arXiv:2406.10722  [pdf, other]

    cs.CV cs.AI cs.LG

    GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR

    Authors: Bharat Singh, Viveka Kulharia, Luyu Yang, Avinash Ravichandran, Ambrish Tyagi, Ashish Shrivastava

    Abstract: Multimodal synthetic data generation is crucial in domains such as autonomous driving, robotics, augmented/virtual reality, and retail. We propose a novel approach, GenMM, for jointly editing RGB videos and LiDAR scans by inserting temporally and geometrically consistent 3D objects. Our method uses a reference image and 3D bounding boxes to seamlessly insert and blend new objects into target video…

    Submitted 15 June, 2024; originally announced June 2024.

  17. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation; however, these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  18. arXiv:2406.06908  [pdf, other]

    cs.CV

    UVIS: Unsupervised Video Instance Segmentation

    Authors: Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava

    Abstract: Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining. Our key insight comes fro…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: CVPR2024 Workshop

  19. arXiv:2406.02958  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR cs.DC

    PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs

    Authors: Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar

    Abstract: On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large models on-device, (2) on-device training is communication- and computation-intensive, and (3) on-device training can be difficult to debug and deploy. To addre…

    Submitted 17 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024 (Oral)

  20. arXiv:2405.03917  [pdf, other]

    cs.LG

    KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

    Authors: Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava

    Abstract: Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression,…

    Submitted 6 May, 2024; originally announced May 2024.

  21. arXiv:2405.03092  [pdf, other]

    cond-mat.mtrl-sci cs.LG math.OC

    Bayesian optimization for stable properties amid processing fluctuations in sputter deposition

    Authors: Ankit Shrivastava, Matias Kalaswad, Joyce O. Custer, David P. Adams, Habib N. Najm

    Abstract: We introduce a Bayesian optimization approach to guide the sputter deposition of molybdenum thin films, aiming to achieve desired residual stress and sheet resistance while minimizing susceptibility to stochastic fluctuations during deposition. Thin films are pivotal in numerous technologies, including semiconductors and optical devices, where their properties are critical. Sputter deposition para…

    Submitted 5 May, 2024; originally announced May 2024.

    Journal ref: J. Vac. Sci. Technol. A 1 May 2024; 42 (3): 033408

  22. arXiv:2404.16710  [pdf, other]

    cs.CL cs.AI cs.LG

    LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

    Authors: Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu

    Abstract: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exi…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Code open sourcing is in progress

  23. arXiv:2404.16035  [pdf, other]

    cs.CV cs.AI

    MaGGIe: Masked Guided Gradual Human Instance Matting

    Authors: Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee

    Abstract: Human matting is a foundation task in image and video processing, where human foreground pixels are extracted from the input. Prior works either improve the accuracy by additional guidance or improve the temporal consistency of a single instance across frames. We propose a new framework MaGGIe, Masked Guided Gradual Human Instance Matting, which predicts alpha mattes progressively for each human i…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project link: https://maggie-matt.github.io

  24. arXiv:2404.05726  [pdf, other]

    cs.CV

    MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

    Authors: Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

    Abstract: With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective…

    Submitted 24 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024. Project Page https://boheumd.github.io/MA-LMM/

  25. arXiv:2404.01990  [pdf, other]

    cs.CV

    What is Point Supervision Worth in Video Instance Segmentation?

    Authors: Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

    Abstract: Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Our proposed train…

    Submitted 1 April, 2024; originally announced April 2024.

  26. arXiv:2404.01653  [pdf, other]

    nucl-ex nucl-th

    Investigation of reaction and $α$ production cross sections with $^9$Be projectile

    Authors: Satbir Kaur, V. V. Parkar, S. K. Pandit, A. Shrivastava, K. Mahata, K. Ramachandran, Sangeeta Dhuri, P. C. Rout, A. Kumar, Shilpi Gupta

    Abstract: In order to investigate the contribution of $α$ production in the reaction cross sections, measurements of elastic scattering and inclusive $α$ particle angular distributions have been carried out with the $^9$Be projectile on $^{89}$Y, $^{124}$Sn, $^{159}$Tb, $^{198}$Pt, and $^{209}$Bi targets over a wide angular range at energies near the Coulomb barrier. The measured elastic scattering angular…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 9 figures

    Journal ref: Nuclear Physics A 1046, 122864 (2024)

  27. arXiv:2404.01292  [pdf, other]

    cs.CV cs.LG

    Measuring Style Similarity in Diffusion Models

    Authors: Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas Geiping, Abhinav Shrivastava, Tom Goldstein

    Abstract: Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence as their proliferation increases, it has become important to perform a database search to determine whether the properties of the image are attributable to specific training data, every time before a…

    Submitted 1 April, 2024; originally announced April 2024.

  28. arXiv:2403.14625  [pdf, other]

    cs.CV

    LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

    Authors: Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava

    Abstract: We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and compact postprocessing network that can be applied to enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to train with a self-supervised objective, and it boosts the density of ViT features for m…

    Submitted 21 March, 2024; originally announced March 2024.

  29. arXiv:2403.02486  [pdf, other]

    cs.RO eess.SY

    Demonstrating a Robust Walking Algorithm for Underactuated Bipedal Robots in Non-flat, Non-stationary Environments

    Authors: Oluwami Dosunmu-Ogunbi, Aayushi Shrivastava, Jessy W Grizzle

    Abstract: This work explores an innovative algorithm designed to enhance the mobility of underactuated bipedal robots across challenging terrains, especially when navigating through spaces with constrained opportunities for foot support, like steps or stairs. By combining ankle torque with a refined angular momentum-based linear inverted pendulum model (ALIP), our method allows variability in the robot's ce…

    Submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2403.01273  [pdf, other]

    cs.LG cs.AI cs.CL

    NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

    Authors: Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

    Abstract: Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs t…

    Submitted 2 March, 2024; originally announced March 2024.

  31. arXiv:2402.18113  [pdf, other]

    cs.CL cs.AI

    Small But Funny: A Feedback-Driven Approach to Humor Distillation

    Authors: Sahithya Ravi, Patrick Huber, Akshat Shrivastava, Aditya Sagar, Ahmed Aly, Vered Shwartz, Arash Einolghozati

    Abstract: The emergence of Large Language Models (LLMs) has brought to light promising language generation capabilities, particularly in performing tasks like complex reasoning and creative writing. Consequently, distillation through imitation of teacher responses has emerged as a popular technique to transfer knowledge from LLMs to more accessible, Small Language Models (SLMs). While this works well for si…

    Submitted 28 February, 2024; originally announced February 2024.

  32. arXiv:2402.14780  [pdf, other]

    cs.CV

    Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

    Authors: Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava

    Abstract: Image customization has been extensively studied in text-to-image (T2I) diffusion models, leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) diffusion models, its temporal counterpart, motion customization, has not yet been well investigated. To address the challenge of one-shot video motion customization, we propose Customize-A-Video that models the motion…

    Submitted 27 August, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by ECCV 2024. Project page: https://customize-a-video.github.io

  33. arXiv:2402.14035  [pdf, other]

    cs.LG cs.AI

    Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model

    Authors: Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

    Abstract: Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient fo…

    Submitted 15 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  34. arXiv:2401.10217  [pdf, other]

    cs.CV

    Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions

    Authors: Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava

    Abstract: The many variations of Implicit Neural Representations (INRs), where a neural network is trained as a continuous representation of a signal, have tremendous practical utility for downstream tasks including novel view synthesis, video compression, and image super-resolution. Unfortunately, the inner workings of these networks are seriously under-studied. Our work, eXplaining the Implicit Neural Can…

    Submitted 15 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  35. Fusion of $^{7}$Li with $^{205}$Tl at near barrier energies

    Authors: V. V. Parkar, Prasanna M., Ruchi Rathod, V. Jha, S. K. Pandit, A. Shrivastava, K. Mahata, K. Ramachandran, R. Palit, Md. S. R. Laskar, B. J. Roy, Bhushan Kanagalekar, B. G. Hegde

    Abstract: The complete and incomplete fusion cross sections for the $^{7}$Li+$^{205}$Tl reaction were measured at near barrier energies by online characteristic $γ$ ray detection technique. The complete fusion (CF) cross sections at energies above the Coulomb barrier were found to be suppressed by $\sim$ 26 \% compared to the coupled channel calculations. Reduced fusion cross sections for the present system…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 9 pages, 7 figures. arXiv admin note: text overlap with arXiv:1801.06996

    Journal ref: Phys. Rev. C 109, 014610 (2024)

  36. arXiv:2312.16784  [pdf, other]

    cs.LG cs.SI

    Learning Scalable Structural Representations for Link Prediction with Bloom Signatures

    Authors: Tianyi Zhang, Haoteng Yin, Rongzhe Wei, Pan Li, Anshumali Shrivastava

    Abstract: Graph neural networks (GNNs) have shown great potential in learning on graphs, but they are known to perform sub-optimally on link prediction tasks. Existing GNNs are primarily designed to learn node-wise representations and usually fail to capture pairwise relations between target nodes, which proves to be crucial for link prediction. Recent works resort to learning more expressive edge-wise repr…

    Submitted 27 December, 2023; originally announced December 2023.

  37. arXiv:2312.12143  [pdf, other]

    cs.CV eess.IV

    Integrating Human Vision Perception in Vision Transformers for Classifying Waste Items

    Authors: Akshat Kishore Shrivastava, Tapan Kumar Gandhi

    Abstract: In this paper, we propose a novel methodology aimed at simulating the learning phenomenon of nystagmus through the application of differential blurring on datasets. Nystagmus is a biological phenomenon that influences human vision throughout life, notably by diminishing head shake from infancy to adulthood. Leveraging this concept, we address the issue of waste classification, a pressing global c…

    Submitted 20 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: 16 pages, 4 figures

    MSC Class: 68T45 ACM Class: I.2; I.4

  38. arXiv:2312.08538  [pdf, other]

    cs.LG cs.AI

    Contractive error feedback for gradient compression

    Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

    Abstract: On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication-efficient optimization methods are attractive alternatives; however, they still struggle with memory issues. To tackle…

    Submitted 13 December, 2023; originally announced December 2023.

  39. arXiv:2312.07835  [pdf, other]

    cs.CV cs.AI cs.LG

    Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements

    Authors: Gaurav Shrivastava, Ser-Nam Lim, Abhinav Shrivastava

    Abstract: In this paper, we present a novel robust framework for low-level vision tasks, including denoising, object removal, frame interpolation, and super-resolution, that does not require any external training data corpus. Our proposed approach directly learns the weights of neural modules by optimizing over the corrupted test sequence, leveraging the spatio-temporal coherence and internal statistics of…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023; Webpage - http://www.cs.umd.edu/~gauravsh/vdp.html

  40. arXiv:2312.04566  [pdf, other]

    cs.CV

    Gen2Det: Generate to Detect

    Authors: Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava

    Abstract: Recently diffusion models have shown improvement in synthetic image quality as well as better control in generation. We motivate and present Gen2Det, a simple modular pipeline to create synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods. Unlike existing works which generate individual object instances, require identifying foregrou…

    Submitted 7 December, 2023; originally announced December 2023.

  41. arXiv:2312.04564  [pdf, other]

    cs.CV cs.GR

    EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

    Authors: Sharath Girish, Kamal Gupta, Abhinav Shrivastava

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for bo…

    Submitted 24 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Website: https://efficientgaussian.github.io Code: https://github.com/Sharath-girish/efficientgaussian

  42. arXiv:2312.01671  [pdf, other]

    cs.CV

    Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion

    Authors: Hanyu Wang, Pengxiang Wu, Kevin Dela Rosa, Chen Wang, Abhinav Shrivastava

    Abstract: Image Style Transfer (IST) is an interdisciplinary topic of computer vision and art that continuously attracts researchers' interests. Different from traditional Image-guided Image Style Transfer (IIST) methods that require a style reference image as input to define the desired style, recent works start to tackle the problem in a text-guided manner, i.e., Text-guided Image Style Transfer (TIST). C…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: WACV 2024. Project website: https://hywang66.github.io/mmist/

  43. arXiv:2312.01655  [pdf, other]

    quant-ph cs.AI

    Quantum Polar Metric Learning: Efficient Classically Learned Quantum Embeddings

    Authors: Vinayak Sharma, Aviral Shrivastava

    Abstract: Deep metric learning has recently shown extremely promising results in the classical data domain, creating well-separated feature spaces. This idea was also adapted to quantum computers via Quantum Metric Learning (QMeL). QMeL consists of a two-step process: a classical model compresses the data to fit into the limited number of qubits, then a Parameterized Quantum Circuit (PQC) is trained to create b…

    Submitted 27 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    ACM Class: I.2.6; E.4

  44. arXiv:2312.00115  [pdf, other]

    cs.CV cs.CL

    A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

    Authors: Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran

    Abstract: Existing long video retrieval systems are trained and tested in the paragraph-to-video retrieval regime, where every long video is described by a single long paragraph. This neglects the richness and variety of possible valid descriptions of a video, which could be described in moment-by-moment detail, or in a single phrase summary, or anything in between. To provide a more thorough evaluation of…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 13 pages, 15 tables, 5 figures

  45. arXiv:2311.17921  [pdf, other]

    cs.CV

    Do text-free diffusion models learn discriminative visual representations?

    Authors: Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Abhinav Shrivastava

    Abstract: While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We identify diffusion models, a state-of-the-art method for generative tasks, as a prime candidate. Such models involve training a U-Net to iteratively predict and re…

    Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Website: see https://mgwillia.github.io/diffssl/ . Code: see https://github.com/soumik-kanad/diffssl . The first two authors contributed equally. 15 pages, 9 figures, 15 tables. Submission under review. (this article supersedes arXiv:2307.08702)

  46. arXiv:2311.13583  [pdf, other]

    cs.LG

    Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies

    Authors: Shabnam Daghaghi, Benjamin Coleman, Benito Geordie, Anshumali Shrivastava

    Abstract: Data sampling is an effective method to improve the training speed of neural networks, with recent results demonstrating that it can even break the neural scaling laws. These results critically rely on high-quality scores to estimate the importance of an input to the network. We observe that there are two dominant strategies: static sampling, where the scores are determined before training, and dy…

    Submitted 22 November, 2023; originally announced November 2023.

  47. arXiv:2311.10873  [pdf, other]

    cs.CV

    Multi-entity Video Transformers for Fine-Grained Video Representation Learning

    Authors: Matthew Walmer, Rose Kanjirathinkal, Kai Sheng Tai, Keyur Muzumdar, Taipeng Tian, Abhinav Shrivastava

    Abstract: The area of temporally fine-grained video representation learning aims to generate frame-by-frame representations for temporally dense tasks. In this work, we advance the state-of-the-art for this area by re-examining the design of transformer architectures for video representation learning. A salient aspect of our self-supervised method is the improved integration of spatial information in the te…

    Submitted 17 November, 2023; originally announced November 2023.

  48. arXiv:2311.01722  [pdf, other]

    cs.LG

    Heterogeneous federated collaborative filtering using FAIR: Federated Averaging in Random Subspaces

    Authors: Aditya Desai, Benjamin Meisburger, Zichang Liu, Anshumali Shrivastava

    Abstract: Recommendation systems (RS) for items (e.g., movies, books) and ads are widely used to tailor content to users on various internet platforms. Traditionally, recommendation models are trained on a central server. However, due to rising concerns for data privacy and regulations like the GDPR, federated learning is an increasingly popular paradigm in which data never leaves the client device. Applyin…

    Submitted 3 November, 2023; originally announced November 2023.

  49. arXiv:2310.17157  [pdf, other]

    cs.LG

    Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

    Authors: Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

    Abstract: Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing methods either require costly retraining, have to forgo LLM's in-context learning ability, or do not yield wall-clock time speedup on modern hardware.…

    Submitted 26 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, 2023, 919

  50. arXiv:2310.11611  [pdf, other]

    cs.LG

    In defense of parameter sharing for model-compression

    Authors: Aditya Desai, Anshumali Shrivastava

    Abstract: When considering a model architecture, there are several ways to reduce its memory footprint. Historically, popular approaches included selecting smaller architectures and creating sparse networks through pruning. More recently, randomized parameter-sharing (RPS) methods have gained traction for model compression at the start of training. In this paper, we comprehensively assess the trade-off between…

    Submitted 17 October, 2023; originally announced October 2023.