Showing 1–19 of 19 results for author: Bhatt, G

Searching in archive cs.
  1. arXiv:2404.00530

    cs.CL cs.AI cs.LG

    Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

    Authors: Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover

    Abstract: A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This only leverages the pairwise comparisons when the generations are placed in an identical context. However, such conditional rankings often fail to capture the complex and multidimensional aspects of human preferences. In this work,…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 25 pages, 14 figures, 5 tables

  2. arXiv:2403.14797

    cs.CV cs.LG

    Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

    Authors: Gaurav Bhatt, James Ross, Leonid Sigal

    Abstract: Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-…

    Submitted 15 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Journal ref: European Conference on Computer Vision, 2024

  3. arXiv:2403.08199

    cs.LG cs.AI

    Deep Submodular Peripteral Networks

    Authors: Gantavya Bhatt, Arnav Das, Jeff Bilmes

    Abstract: Submodular functions, crucial for various applications, often lack practical learning methods for their acquisition. Seemingly unrelated, learning a scaling from oracles offering graded pairwise preferences (GPC) is underexplored, despite a rich history in psychometrics. In this paper, we introduce deep submodular peripteral networks (DSPNs), a novel parametric family of submodular functions, and…

    Submitted 15 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint

  4. arXiv:2401.06692

    cs.CL cs.AI cs.LG

    An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

    Authors: Gantavya Bhatt, Yifang Chen, Arnav M. Das, Jifan Zhang, Sang T. Truong, Stephen Mussmann, Yinglun Zhu, Jeffrey Bilmes, Simon S. Du, Kevin Jamieson, Jordan T. Ash, Robert D. Nowak

    Abstract: Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues t…

    Submitted 7 July, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to Findings of the Association for Computational Linguistics: ACL 2024

  5. arXiv:2312.01261

    cs.CV cs.CY

    TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

    Authors: Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk

    Abstract: Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery. At the same time, these models have been shown to suffer from harmful biases, including exaggerated societal biases (e.g., gender, ethnicity), as well as incidental correlations that limit such a model's ability to generate more diverse image…

    Submitted 16 July, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024. Code and data available at https://tibet-ai.github.io

  6. arXiv:2311.14948

    cs.LG cs.AI cs.CV

    Effective Backdoor Mitigation Depends on the Pre-training Objective

    Authors: Sahil Verma, Gantavya Bhatt, Avi Schwarzschild, Soumye Singhal, Arnav Mohanty Das, Chirag Shah, John P Dickerson, Jeff Bilmes

    Abstract: Despite the advanced capabilities of contemporary machine learning (ML) models, they remain vulnerable to adversarial and backdoor attacks. This vulnerability is particularly concerning in real-world deployments, where compromised models may exhibit unpredictable behavior in critical scenarios. Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for…

    Submitted 5 December, 2023; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: Accepted for oral presentation at BUGS workshop @ NeurIPS 2023 (https://neurips2023-bugs.github.io/)

  7. arXiv:2310.00377

    cs.LG

    Mitigating the Effect of Incidental Correlations on Part-based Learning

    Authors: Gaurav Bhatt, Deepayan Das, Leonid Sigal, Vineeth N Balasubramanian

    Abstract: Intelligent systems possess a crucial characteristic of breaking complicated problems into smaller reusable components or parts and adjusting to new tasks using these part representations. However, current part-learners encounter difficulties in dealing with incidental correlations resulting from the limited observations of objects that may appear only in specific arrangements or with specific bac…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted in 37th Conference on Neural Information Processing Systems (NeurIPS'2023)

  8. arXiv:2306.09910

    cs.LG cs.AI cs.CV

    LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning

    Authors: Jifan Zhang, Yifang Chen, Gregory Canal, Stephen Mussmann, Arnav M. Das, Gantavya Bhatt, Yinglun Zhu, Jeffrey Bilmes, Simon Shaolei Du, Kevin Jamieson, Robert D Nowak

    Abstract: Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires…

    Submitted 1 March, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

  9. arXiv:2305.06408

    cs.LG

    Accelerating Batch Active Learning Using Continual Learning Techniques

    Authors: Arnav Das, Gantavya Bhatt, Megh Bhalerao, Vianne Gao, Rui Yang, Jeff Bilmes

    Abstract: A major problem with Active Learning (AL) is high training costs since models are typically retrained from scratch after every query round. We start by demonstrating that standard AL on neural networks with warm starting fails, both to accelerate training and to avoid catastrophic forgetting when using fine-tuning over AL query rounds. We then develop a new class of techniques, circumventing this…

    Submitted 12 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Appeared in TMLR 2023

  10. High Resolution Point Clouds from mmWave Radar

    Authors: Akarsh Prabhakara, Tao Jin, Arnav Das, Gantavya Bhatt, Lilly Kumari, Elahe Soltanaghaei, Jeff Bilmes, Swarun Kumar, Anthony Rowe

    Abstract: This paper explores a machine learning approach for generating high resolution point clouds from a single-chip mmWave radar. Unlike lidar and vision-based systems, mmWave radar can operate in harsh environments and see through occlusions like smoke, fog, and dust. Unfortunately, current mmWave processing techniques offer poor spatial resolution compared to lidar point clouds. This paper presents R…

    Submitted 16 July, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

    Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, 2023, pp. 4135-4142

  11. arXiv:2205.13147

    cs.LG cs.CV

    Matryoshka Representation Learning

    Authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi

    Abstract: Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context rigid, fixed capacity representations can be either over or under-accommodating to the task at hand. This leads us to ask: can we d…

    Submitted 7 February, 2024; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Edited related work to include intrinsic dimensionality works

  12. arXiv:2107.04952

    cs.CV cs.AI cs.LG

    Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision

    Authors: Gaurav Bhatt, Shivam Chandhok, Vineeth N Balasubramanian

    Abstract: A common problem with most zero and few-shot learning approaches is they suffer from bias towards seen classes resulting in sub-optimal performance. Existing efforts aim to utilize unlabeled images from unseen classes (i.e., transductive zero-shot) during training to enable generalization. However, this limits their use in practical scenarios where data from target unseen classes is unavailable or i…

    Submitted 13 July, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

    Comments: Accepted at IJCAI'21 workshop on Weakly Supervised Representation Learning

  13. arXiv:2102.05602

    cs.LG cs.AI cs.RO stat.ML

    Systematic Generalization in Neural Networks-based Multivariate Time Series Forecasting Models

    Authors: Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, Prathosh A. P

    Abstract: Systematic generalization aims to evaluate reasoning about novel combinations from known components, an intrinsic property of human cognition. In this work, we study systematic generalization of NNs in forecasting future time series of dependent variables in a dynamical system, conditioned on past time series of dependent variables, and past and future control variables. We focus on systematic gen…

    Submitted 7 March, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: 9 pages, 8 figures, 2 tables

  14. arXiv:2010.04976

    cs.CL cs.LG

    Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?

    Authors: Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal

    Abstract: Previous work suggests that RNNs trained on natural language corpora can capture number agreement well for simple sentences but perform less well when sentences contain agreement attractors: intervening nouns between the verb and the main subject with grammatical number opposite to the latter. This suggests these models may not learn the actual syntax of agreement, but rather infer shallower heuri…

    Submitted 9 April, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

    Comments: 15 pages, 3 figures, 13 Tables (including Appendix); Non Archival Extended Abstract Accepted in SciL 2021 - https://scholarworks.umass.edu/scil/vol4/iss1/38/

  15. arXiv:2005.08199

    cs.CL q-bio.NC

    How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?

    Authors: Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal

    Abstract: Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks. On the other hand, simple recurrent networks (SRNs), which appear more biologically grounded in terms of synaptic connections, have generally been less successful at capturing long-range dependencies as well as the…

    Submitted 25 May, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: 11 pages, 5 figures (including appendix); to appear at ACL SRW 2020

    ACM Class: I.2.6; I.2.7; J.5

  16. arXiv:1811.00936

    cs.SD eess.AS

    Acoustic Features Fusion using Attentive Multi-channel Deep Architecture

    Authors: Gaurav Bhatt, Akshita Gupta, Aditya Arora, Balasubramanian Raman

    Abstract: In this paper, we present a novel deep fusion architecture for audio classification tasks. The multi-channel model presented is formed using deep convolution layers where different acoustic features are passed through each channel. To enable dissemination of information across the channels, we introduce attention feature maps that aid in the alignment of frames. The output of each channel is merge…

    Submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted in CHiME'18 (Interspeech Workshop)

  17. arXiv:1801.06792

    cs.CL

    Attentive Recurrent Tensor Model for Community Question Answering

    Authors: Gaurav Bhatt, Shivam Sharma, Balasubramanian Raman

    Abstract: A major challenge to the problem of community question answering is the lexical and semantic gap between the sentence representations. Some solutions to minimize this gap include the introduction of extra parameters to deep models or augmenting the external handcrafted features. In this paper, we propose a novel attentive recurrent tensor network for solving the lexical and semantic gap in commun…

    Submitted 21 January, 2018; originally announced January 2018.

  18. arXiv:1712.03935

    cs.CL

    On the Benefit of Combining Neural, Statistical and External Features for Fake News Identification

    Authors: Gaurav Bhatt, Aman Sharma, Shivam Sharma, Ankush Nagpal, Balasubramanian Raman, Ankush Mittal

    Abstract: Identifying the veracity of a news article is an interesting problem while automating this process can be a challenging task. Detection of a news article as fake is still an open question as it is contingent on many factors which the current state-of-the-art models fail to incorporate. In this paper, we explore a subtask to fake news identification, and that is stance detection. Given a news artic…

    Submitted 11 December, 2017; originally announced December 2017.

    Comments: Source code available at - www.deeplearn-ai.com

  19. arXiv:1711.00003

    cs.CV

    Common Representation Learning Using Step-based Correlation Multi-Modal CNN

    Authors: Gaurav Bhatt, Piyush Jha, Balasubramanian Raman

    Abstract: Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. In a broader perspective, the techniques used to investigate common representation learning fall under the categories of canonical correlation-based approaches and autoencoder based approaches. In this paper, we invest…

    Submitted 31 October, 2017; originally announced November 2017.

    Comments: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)