Showing 1–24 of 24 results for author: Lohit, S

Searching in archive cs.
  1. arXiv:2406.15736  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

    Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

    Abstract: Recent years have seen a significant progress in the general-purpose problem solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as h…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.03723  [pdf, other]

    cs.CV cs.GR cs.MM

    Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

    Authors: Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang, Pedro Miraldo, Suhas Lohit, Moitreya Chatterjee

    Abstract: Extensions of Neural Radiance Fields (NeRFs) to model dynamic scenes have enabled their near photo-realistic, free-viewpoint rendering. Although these methods have shown some potential in creating immersive experiences, two drawbacks limit their ubiquity: (i) a significant reduction in reconstruction quality when the computing budget is limited, and (ii) a lack of semantic understanding of the und…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Paper accepted to IEEE/CVF CVPR 2024 (Spotlight). Work done when XL was an intern at MERL. Project Page Link: https://merl.com/research/highlights/gear-nerf

    ACM Class: I.2.10

  3. arXiv:2404.16306  [pdf, other]

    cs.CV

    TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

    Authors: Haomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks

    Abstract: Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  4. arXiv:2404.11764  [pdf, other]

    cs.CV

    Multimodal 3D Object Detection on Unseen Domains

    Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel

    Abstract: LiDAR datasets for autonomous driving exhibit biases in properties such as point cloud density, range, and object dimensions. As a result, object detection networks trained and evaluated in different environments often experience performance degradation. Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem. However, in the real world,…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: technical report

  5. arXiv:2404.11737  [pdf, other]

    cs.CV

    Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection

    Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel

    Abstract: Popular representation learning methods encourage feature invariance under transformations applied at the input. However, in 3D perception tasks like object localization and segmentation, outputs are naturally equivariant to some transformations, such as rotation. Using pre-training loss functions that encourage equivariance of features under certain transformations provides a strong self-supervis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: technical report

  6. arXiv:2402.15413  [pdf, other]

    cs.LG

    G-RepsNet: A Fast and General Construction of Equivariant Networks for Arbitrary Matrix Groups

    Authors: Sourya Basu, Suhas Lohit, Matthew Brand

    Abstract: Group equivariance is a strong inductive bias useful in a wide range of deep learning tasks. However, constructing efficient equivariant networks for general groups and domains is difficult. Recent work by Finzi et al. (2021) directly solves the equivariance constraint for arbitrary matrix groups to obtain equivariant MLPs (EMLPs). But this method does not scale well and scaling is crucial in deep…

    Submitted 23 February, 2024; originally announced February 2024.

  7. arXiv:2310.00224  [pdf, other]

    cs.CV cs.AI cs.LG

    Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

    Authors: Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks

    Abstract: Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in designing models that perform plug-and-play generation, i.e., to use a predefined or pretrained model, which is not explicitly trained on the generative task, to guide the generative process (e.g., using language). However, such guidanc…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted at ICCV 2023

  8. arXiv:2309.16592  [pdf, other]

    cs.CV cs.LG

    Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection

    Authors: Manish Sharma, Moitreya Chatterjee, Kuan-Chuan Peng, Suhas Lohit, Michael Jones

    Abstract: The primary bottleneck towards obtaining good recognition performance in IR images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Realizing that object detection methods for the RGB modality are quite robust (at least for some commonplace classes, like person, car, etc.), thanks to the giant training sets that exist, in this work we seek to leverage cues…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, LIMIT Workshop. The first two authors contributed equally

  9. arXiv:2309.14531  [pdf, other]

    cs.CV

    Pixel-Grounded Prototypical Part Networks

    Authors: Zachariah Carmichael, Suhas Lohit, Anoop Cherian, Michael Jones, Walter Scheirer

    Abstract: Prototypical part neural networks (ProtoPartNNs), namely PROTOPNET and its derivatives, are an intrinsically interpretable approach to machine learning. Their prototype learning scheme enables intuitive explanations of the form, this (prototype) looks like that (testing image patch). But, does this actually look like that? In this work, we delve into why object part localization and associated hea…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 21 pages

  10. arXiv:2212.09993  [pdf, other]

    cs.AI cs.CV cs.LG

    Are Deep Neural Networks SMARTer than Second Graders?

    Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin A. Smith, Joshua B. Tenenbaum

    Abstract: Recent times have witnessed an increasing number of applications of deep neural networks towards solving tasks that require superior cognitive abilities, e.g., playing Go, generating art, ChatGPT, etc. Such a dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algor…

    Submitted 11 September, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Extended version of CVPR 2023 paper. For the SMART-101 dataset, see http://smartdataset.github.io/smart101

  11. arXiv:2209.04027  [pdf, other]

    cs.CV

    Cross-Modal Knowledge Transfer Without Task-Relevant Source Data

    Authors: Sk Miraj Ahmed, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Amit K. Roy-Chowdhury

    Abstract: Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data are crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge…

    Submitted 8 September, 2022; originally announced September 2022.

  12. arXiv:2110.10211  [pdf, other]

    cs.CV cs.LG

    Learning Partial Equivariances from Data

    Authors: David W. Romero, Suhas Lohit

    Abstract: Group Convolutional Neural Networks (G-CNNs) constrain learned features to respect the symmetries in the selected group, and lead to better generalization when these symmetries appear in the data. If this is not the case, however, equivariance leads to overly constrained models and worse performance. Frequently, transformations occurring in data can be better represented by a subset of a group tha…

    Submitted 14 January, 2023; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Published at NeurIPS 2022

  13. arXiv:2012.04474  [pdf, other]

    cs.CV cs.LG

    Rotation-Invariant Autoencoders for Signals on Spheres

    Authors: Suhas Lohit, Shubhendu Trivedi

    Abstract: Omnidirectional images and spherical representations of $3D$ shapes cannot be processed with conventional 2D convolutional neural networks (CNNs) as the unwrapping leads to large distortion. Using fast implementations of spherical and $SO(3)$ convolutions, researchers have recently developed deep learning methods better suited for classifying spherical images. These newly proposed convolutional la…

    Submitted 8 December, 2020; originally announced December 2020.

  14. arXiv:2012.03907  [pdf, ps, other]

    cs.CV cs.LG

    Model Compression Using Optimal Transport

    Authors: Suhas Lohit, Michael Jones

    Abstract: Model compression methods are important to allow for easier deployment of deep learning models in compute, memory and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithm where knowledge from a large teacher network is transferred to a smaller student network thereby improving the student's performance. In this paper, we show how o…

    Submitted 7 December, 2020; originally announced December 2020.

  15. arXiv:2012.02911  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Multi-head Knowledge Distillation for Model Compression

    Authors: Huan Wang, Suhas Lohit, Michael Jones, Yun Fu

    Abstract: Several methods of knowledge distillation have been developed for neural network compression. While they all use the KL divergence loss to align the soft outputs of the student model more closely with that of the teacher, the various methods differ in how the intermediate features of the student are encouraged to match those of the teacher. In this paper, we propose a simple-to-implement method us…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: Copyright: 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  16. arXiv:2012.02909  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    What Makes a "Good" Data Augmentation in Knowledge Distillation -- A Statistical Perspective

    Authors: Huan Wang, Suhas Lohit, Mike Jones, Yun Fu

    Abstract: Knowledge distillation (KD) is a general neural network training approach that uses a teacher model to guide the student model. Existing works mainly study KD from the network output side (e.g., trying to design a better KD loss function), while few have attempted to understand it from the input side. Especially, its interplay with data augmentation (DA) has not been well understood. In this paper…

    Submitted 21 February, 2023; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: Camera Ready of NeurIPS'22. Code: https://github.com/MingSun-Tse/Good-DA-in-KD

  17. arXiv:2012.02043  [pdf, other]

    cs.CV cs.LG

    Recovering Trajectories of Unmarked Joints in 3D Human Actions Using Latent Space Optimization

    Authors: Suhas Lohit, Rushil Anirudh, Pavan Turaga

    Abstract: Motion capture (mocap) and time-of-flight based sensing of human actions are becoming increasingly popular modalities to perform robust activity analysis. Applications range from action recognition to quantifying movement quality for health applications. While marker-less motion capture has made great progress, in critical applications such as healthcare, marker-based systems, especially active ma…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: Accepted at WACV 2021

  18. arXiv:2006.10873  [pdf, other]

    cs.CV cs.LG

    Generative Patch Priors for Practical Compressive Image Recovery

    Authors: Rushil Anirudh, Suhas Lohit, Pavan Turaga

    Abstract: In this paper, we propose the generative patch prior (GPP) that defines a generative prior for compressive image recovery, based on patch-manifold models. Unlike learned, image-level priors that are restricted to the range space of a pre-trained generator, GPP can recover a wide variety of natural images using a pre-trained patch generator. Additionally, GPP retains the benefits of generative prio…

    Submitted 5 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  19. arXiv:1906.05947  [pdf, other]

    cs.CV

    Temporal Transformer Networks: Joint Learning of Invariant and Discriminative Time Warping

    Authors: Suhas Lohit, Qiao Wang, Pavan Turaga

    Abstract: Many time-series classification problems involve developing metrics that are invariant to temporal misalignment. In human activity analysis, temporal misalignment arises due to various reasons including differing initial phase, sensor sampling rates, and elastic time-warps due to subject-specific biomechanics. Past work in this area has only looked at reducing intra-class variability by elastic te…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: Published in CVPR 2019, Codes available at https://github.com/suhaslohit/TTN

  20. arXiv:1809.02850  [pdf, other]

    cs.CV

    Rate-Adaptive Neural Networks for Spatial Multiplexers

    Authors: Suhas Lohit, Rajhans Singh, Kuldeep Kulkarni, Pavan Turaga

    Abstract: In resource-constrained environments, one can employ spatial multiplexing cameras to acquire a small number of measurements of a scene, and perform effective reconstruction or high-level inference using purely data-driven neural networks. However, once trained, the measurement matrix and the network are valid only for a single measurement rate (MR) chosen at training time. To overcome this drawbac…

    Submitted 8 September, 2018; originally announced September 2018.

  21. arXiv:1806.03379  [pdf, other]

    cs.CV cs.AI

    CS-VQA: Visual Question Answering with Compressively Sensed Images

    Authors: Li-Chi Huang, Kuldeep Kulkarni, Anik Jha, Suhas Lohit, Suren Jayasuriya, Pavan Turaga

    Abstract: Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvabl…

    Submitted 8 June, 2018; originally announced June 2018.

    Comments: 5 pages, 2 figures, accepted to ICIP 2018

    MSC Class: 68

  22. arXiv:1708.09485  [pdf, other]

    cs.CV

    Learning Invariant Riemannian Geometric Representations Using Deep Nets

    Authors: Suhas Lohit, Pavan Turaga

    Abstract: Non-Euclidean constraints are inherent in many kinds of data in computer vision and machine learning, typically as a result of specific invariance requirements that need to be respected during high-level inference. Often, these geometric constraints can be expressed in the language of Riemannian geometry, where conventional vector space machine learning does not apply directly. The central questio…

    Submitted 22 September, 2017; v1 submitted 30 August, 2017; originally announced August 2017.

    Comments: Accepted at International Conference on Computer Vision Workshop (ICCVW), 2017 on Manifold Learning: from Euclid to Riemann

  23. arXiv:1708.04669  [pdf, other]

    cs.CV

    Convolutional Neural Networks for Non-iterative Reconstruction of Compressively Sensed Images

    Authors: Suhas Lohit, Kuldeep Kulkarni, Ronan Kerviche, Pavan Turaga, Amit Ashok

    Abstract: Traditional algorithms for compressive sensing recovery are computationally expensive and are ineffective at low measurement rates. In this work, we propose a data driven non-iterative algorithm to overcome the shortcomings of earlier iterative algorithms. Our solution, ReconNet, is a deep neural network, whose parameters are learned end-to-end to map block-wise compressive measurements of the sce…

    Submitted 16 August, 2017; v1 submitted 15 August, 2017; originally announced August 2017.

  24. arXiv:1601.06892  [pdf, other]

    cs.CV

    ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Random Measurements

    Authors: Kuldeep Kulkarni, Suhas Lohit, Pavan Turaga, Ronan Kerviche, Amit Ashok

    Abstract: The goal of this paper is to present a non-iterative and more importantly an extremely fast algorithm to reconstruct images from compressively sensed (CS) random measurements. To this end, we propose a novel convolutional neural network (CNN) architecture which takes in CS measurements of an image as input and outputs an intermediate reconstruction. We call this network, ReconNet. The intermediate…

    Submitted 7 March, 2016; v1 submitted 26 January, 2016; originally announced January 2016.

    Comments: Accepted at IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2016