Showing 51–100 of 143 results for author: Baral, C

  1. arXiv:2210.07631  [pdf, other]

    cs.CL cs.CV

    Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task

    Authors: Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

    Abstract: Evaluation of models on benchmarks is unreliable without knowing the degree of sample hardness; this subsequently overestimates the capability of AI systems and limits their adoption in real-world applications. We propose a Data Scoring task that requires assigning each unannotated sample in a benchmark a score between 0 and 1, where 0 signifies easy and 1 signifies hard. Use of unannotated sam…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2007.06898

  2. arXiv:2210.07566  [pdf, other]

    cs.CL cs.CV

    A Survey of Parameters Associated with the Quality of Benchmarks in NLP

    Authors: Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

    Abstract: Several benchmarks have been built with heavy investment in resources to track our progress in NLP. Thousands of papers published in response to those benchmarks have competed to top leaderboards, with models often surpassing human performance. However, recent studies have shown that models triumph over several popular benchmarks just by overfitting on spurious biases, without truly learning the d…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2005.00816

  3. arXiv:2210.07471  [pdf, other]

    cs.CL

    "John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility

    Authors: Himanshu Gupta, Neeraj Varshney, Swaroop Mishra, Kuntal Kumar Pal, Saurabh Arjun Sawant, Kevin Scaria, Siddharth Goyal, Chitta Baral

    Abstract: In current NLP research, large-scale language models and their abilities are widely being discussed. Some recent works have also found notable failures of these models. Often these failure examples involve complex reasoning abilities. This work focuses on a simple commonsense ability, reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-an…

    Submitted 2 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: EACL 2023

  4. arXiv:2210.05528  [pdf, other]

    cs.CL cs.AI

    Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems

    Authors: Neeraj Varshney, Chitta Baral

    Abstract: Do all instances need inference through the big models for a correct prediction? Perhaps not; some instances are easy and can be answered correctly by even small capacity models. This provides opportunities for improving the computational efficiency of systems. In this work, we present an explorative study on 'model cascading', a simple technique that utilizes a collection of models of varying cap…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  5. arXiv:2210.04466  [pdf, other]

    cs.CL cs.CV

    Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications

    Authors: Swaroop Mishra, Anjana Arunkumar, Chitta Baral

    Abstract: With the increasing importance of safety requirements associated with the use of black box models, evaluation of selective answering capability of models has been critical. Area under the curve (AUC) is used as a metric for this purpose. We find limitations in AUC; e.g., a model having higher AUC is not always better in performing selective answering. We propose three alternate metrics that fix th…

    Submitted 10 October, 2022; originally announced October 2022.

  6. arXiv:2210.01371  [pdf, other]

    cs.IR cs.CL

    A Study on the Efficiency and Generalization of Light Hybrid Retrievers

    Authors: Man Luo, Shashank Jain, Anchit Gupta, Arash Einolghozati, Barlas Oguz, Debojeet Chatterjee, Xilun Chen, Chitta Baral, Peyman Heidari

    Abstract: Hybrid retrievers can take advantage of both sparse and dense retrievers. Previous hybrid retrievers leverage indexing-heavy dense retrievers. In this work, we study "Is it possible to reduce the indexing memory of hybrid retrievers without sacrificing performance?" Driven by this question, we leverage an indexing-efficient dense retriever (i.e., DrBoost) and introduce a LITE retriever that further…

    Submitted 23 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: accepted to ACL23

  7. arXiv:2207.07568  [pdf, other]

    cs.CL

    Reasoning about Actions over Visual and Linguistic Modalities: A Survey

    Authors: Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral

    Abstract: 'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals. As a result, most common sense (CS) knowledge for humans revolves around actions. While 'Reasoning about Actions & Change' (RAC) has been widely studied in the Knowledge Representation community, it has recently piqued the interest of NLP and computer vision researchers. This paper surveys e…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: 7 pages, 3 figures; This survey will be periodically updated with the latest works in this area

  8. arXiv:2207.02419  [pdf, other]

    cs.CL cs.AI cs.LG

    BioTABQA: Instruction Learning for Biomedical Table Question Answering

    Authors: Man Luo, Sharad Saxena, Swaroop Mishra, Mihir Parmar, Chitta Baral

    Abstract: Table Question Answering (TQA) is an important but under-explored task. Most of the existing QA datasets are in unstructured text format and only a few of them use tables as the context. To the best of our knowledge, no TQA datasets exist in the biomedical domain, where tables are frequently used to present information. In this paper, we first curate a table question answering dataset, BioTABQA,…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: BioASQ10 Workshop

  9. arXiv:2206.07736  [pdf, other]

    cs.LG cs.CV

    Improving Diversity with Adversarially Learned Transformations for Domain Generalization

    Authors: Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang

    Abstract: To be successful in single source domain generalization, maximizing diversity of synthesized domains has emerged as one of the most effective strategies. Many of the recent successes have come from methods that pre-specify the types of diversity that a model is exposed to during training, so that it can ultimately generalize well to new domains. However, naïve diversity based augmentations do not…

    Submitted 12 December, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: WACV 2023. Code: https://github.com/tejas-gokhale/ALT

  10. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. arXiv:2205.12538  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Is a Question Decomposition Unit All We Need?

    Authors: Pruthvi Patel, Swaroop Mishra, Mihir Parmar, Chitta Baral

    Abstract: Large Language Models (LMs) have achieved state-of-the-art performance on many Natural Language Processing (NLP) benchmarks. With the growing number of new benchmarks, we build bigger and more complex LMs. However, building new LMs may not be an ideal option owing to the cost, time and environmental impact associated with it. We explore an alternative route: can we modify data by expressing it in…

    Submitted 26 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 (17 pages)

  12. arXiv:2205.09898  [pdf, other]

    cs.LG cs.CL

    Let the Model Decide its Curriculum for Multitask Learning

    Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral

    Abstract: Curriculum learning strategies in prior multi-task learning approaches arrange datasets in a difficulty hierarchy either based on human perception or by exhaustively searching the optimal arrangement. However, human perception of difficulty may not always correlate well with machine interpretation, leading to poor performance, and exhaustive search is computationally expensive. Addressing these conc…

    Submitted 27 May, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: NAACL 2022 Deep Learning for Low-Resource NLP Workshop

  13. arXiv:2205.00415  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions

    Authors: Mihir Parmar, Swaroop Mishra, Mor Geva, Chitta Baral

    Abstract: In recent years, progress in NLU has been driven by benchmarks. These benchmarks are typically collected by crowdsourcing, where annotators write examples based on annotation instructions crafted by dataset creators. In this work, we hypothesize that annotators pick up on patterns in the crowdsourcing instructions, which bias them to write many similar examples that are then over-represented in th…

    Submitted 19 March, 2024; v1 submitted 1 May, 2022; originally announced May 2022.

    Comments: EACL 2023 (Outstanding Paper Award)

  14. arXiv:2204.07705  [pdf, other]

    cs.CL cs.AI

    Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Authors: Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza , et al. (15 additional authors not shown)

    Abstract: How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting,…

    Submitted 24 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: Accepted to EMNLP 2022, 25 pages

  15. arXiv:2204.07600  [pdf, other]

    cs.CL

    In-BoXBART: Get Instructions into Biomedical Multi-Task Learning

    Authors: Mihir Parmar, Swaroop Mishra, Mirali Purohit, Man Luo, M. Hassan Murad, Chitta Baral

    Abstract: Single-task models have proven pivotal in solving specific tasks; however, they have limitations in real-world applications where multi-tasking is necessary and domain shifts are exhibited. Recently, instructional prompts have shown significant improvement towards multi-task generalization; however, the effect of instructional prompts and Multi-Task Learning (MTL) has not been systematically studi…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: NAACL 2022 Findings

  16. arXiv:2204.05660  [pdf, other]

    cs.CL cs.AI cs.LG

    NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks

    Authors: Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan

    Abstract: Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brittle, failing to perform the underlying mathematical reasoning when they appear in a slightly different scenario. Drawing inspiration from GLUE that was proposed…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: ACL 2022

  17. arXiv:2203.16682  [pdf, other]

    cs.CV cs.CL cs.LG

    To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

    Authors: Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

    Abstract: We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task first proposed by Cui et al. (2021) in the Who's Waldo dataset. Given an image and a caption, PCVG requires pairing up a person's name mentioned in a caption with a bounding box that points to the person in the image. We find that the original Who's Waldo dataset compiled for this task contains a large number of bias…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022 (Short Paper)

  18. arXiv:2203.09161  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    How Many Data Samples is an Additional Instruction Worth?

    Authors: Ravsehaj Singh Puri, Swaroop Mishra, Mihir Parmar, Chitta Baral

    Abstract: The recently introduced instruction paradigm empowers non-expert users to leverage NLP resources by defining a new task in natural language. Instruction-tuned models have significantly outperformed multitask learning models (without instruction); however, they are far from state-of-the-art task-specific models. Conventional approaches to improve model performance via creating datasets with large number…

    Submitted 13 February, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: EACL 2023 Findings

  19. arXiv:2203.08597  [pdf, other]

    cs.CL cs.AI cs.LG

    Less is More: Summary of Long Instructions is Better for Program Synthesis

    Authors: Kirby Kuznia, Swaroop Mishra, Mihir Parmar, Chitta Baral

    Abstract: Despite the success of large pre-trained language models (LMs) such as Codex, they show below-par performance on the larger and more complicated programming-related questions. We show that LMs benefit from the summarized version of complicated questions. Our findings show that superfluous information often present in problem descriptions such as human characters, background stories, and names (whic…

    Submitted 22 October, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: EMNLP 2022

  20. arXiv:2203.07653  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

    Authors: Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral

    Abstract: Data modification, whether via additional training datasets, data augmentation, debiasing, or dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing and computer vision literature. However, the effect of data modification on adversarial robustness remains unclear. In this work, we conduct a comprehensive stu…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: ACL 2022 Findings

  21. arXiv:2203.07522  [pdf, other]

    cs.CL

    Choose Your QA Model Wisely: A Systematic Study of Generative and Extractive Readers for Question Answering

    Authors: Man Luo, Kazuma Hashimoto, Semih Yavuz, Zhiwei Liu, Chitta Baral, Yingbo Zhou

    Abstract: While both extractive and generative readers have been successfully applied to the Question Answering (QA) task, little attention has been paid toward the systematic comparison of them. Characterizing the strengths and weaknesses of the two readers is crucial not only for making a more informed reader selection in practice but also for developing a deeper understanding to foster further research o…

    Submitted 14 March, 2022; originally announced March 2022.

  22. arXiv:2203.03073  [pdf, other]

    cs.CL cs.AI cs.LG

    ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

    Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral

    Abstract: Knowledge of questions' difficulty level helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in NLP? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale…

    Submitted 8 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  23. arXiv:2203.00211  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings

    Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral

    Abstract: In order to equip NLP systems with selective prediction capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline 'MaxProb' remains to be explored. To this end, we systematically study 'selective prediction' in a large-scale setup of 17 datasets across several NLP tasks. Through co…

    Submitted 28 February, 2022; originally announced March 2022.

    Comments: ACL 2022 Findings

  24. arXiv:2201.07745  [pdf, other]

    cs.IR cs.CL

    Improving Biomedical Information Retrieval with Neural Retrievers

    Authors: Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral

    Abstract: Information retrieval (IR) is essential in search engines and dialogue systems as well as natural language processing tasks such as open-domain question answering. IR serves an important function in the biomedical domain, where content and sources of scientific knowledge may evolve rapidly. Although neural retrievers have surpassed traditional IR approaches such as TF-IDF and BM25 in standard open-…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: Accepted at AAAI 2022

  25. arXiv:2110.08438  [pdf, other]

    cs.CL cs.AI cs.LG

    Unsupervised Natural Language Inference Using PHL Triplet Generation

    Authors: Neeraj Varshney, Pratyay Banerjee, Tejas Gokhale, Chitta Baral

    Abstract: Transformer-based models achieve impressive performance on numerous Natural Language Inference (NLI) benchmarks when trained on respective training datasets. However, in certain cases, training samples may not be available or collecting them could be time-consuming and resource-intensive. In this work, we address the above challenge and present an explorative study on unsupervised NLI, a paradigm…

    Submitted 15 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ACL 2022 Findings

  26. arXiv:2110.08393  [pdf, other]

    cs.AI cs.LG

    A Bayesian Approach for Medical Inquiry and Disease Inference in Automated Differential Diagnosis

    Authors: Hong Guan, Chitta Baral

    Abstract: We propose a Bayesian approach for both medical inquiry and disease inference, the two major phases in differential diagnosis. Unlike previous work that simulates data from given probabilities and uses ML algorithms on them, we directly use the Quick Medical Reference (QMR) belief network, and apply Bayesian inference in the inference phase and Bayesian experimental design in the inquiry phase. Mo…

    Submitted 22 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

  27. arXiv:2110.07165  [pdf, other]

    cs.CV cs.CL

    Semantically Distributed Robust Optimization for Vision-and-Language Inference

    Authors: Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang

    Abstract: Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms. While data augmentation techniques have been designed to mitigate against these failure modes, methods that can integrate this knowledge into the training pipeline remain under-explored. In this paper,…

    Submitted 14 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Findings of ACL 2022; code available at https://github.com/ASU-APG/VLI_SDRO

  28. arXiv:2109.10497  [pdf, other]

    cs.CL

    A Simple Approach to Jointly Rank Passages and Select Relevant Sentences in the OBQA Context

    Authors: Man Luo, Shuguang Chen, Chitta Baral

    Abstract: In the open book question answering (OBQA) task, selecting the relevant passages and sentences from distracting information is crucial to reason the answer to a question. The HotpotQA dataset is designed to teach and evaluate systems to do both passage ranking and sentence selection. Many existing frameworks use separate models to select relevant passages and sentences respectively. Such systems not o…

    Submitted 2 August, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

    Comments: Accepted to NAACL SWR 2022

  29. arXiv:2109.07830  [pdf, other]

    cs.CL cs.AI cs.LG

    Reframing Instructional Prompts to GPTk's Language

    Authors: Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, Hannaneh Hajishirzi

    Abstract: What kinds of instructional prompts are easier to follow for Language Models (LMs)? We study this question by conducting extensive empirical analysis that sheds light on important features of successful instructional prompts. Specifically, we study several classes of reframing techniques for manual reformulation of prompts into more effective ones. Some examples include decomposing a complex task i…

    Submitted 15 March, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: ACL 2022 Findings

  30. arXiv:2109.04672  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model

    Authors: Kuntal Kumar Pal, Chitta Baral

    Abstract: The transformer-based pre-trained language models have been tremendously successful in most of the conventional NLP tasks. But they often struggle in those tasks where numerical understanding is required. Some possible reasons can be the tokenizers and pre-training objectives which are not specifically designed to learn and preserve numeracy. Here we investigate the ability of text-to-text transfe…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 7 pages, 10 figures, 5 tables, Accepted in the Findings of EMNLP 2021

  31. arXiv:2109.04014  [pdf, other]

    cs.CL

    Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

    Authors: Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral

    Abstract: Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. One dataset that is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because of…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: accepted at EMNLP 2021

  32. arXiv:2109.01934  [pdf, other]

    cs.CV cs.CL cs.LG

    Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

    Authors: Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

    Abstract: Vision-and-language (V&L) reasoning necessitates perception of visual concepts such as objects and actions, understanding semantics and language grounding, and reasoning about the interplay between the two modalities. One crucial aspect of visual reasoning is spatial understanding, which involves understanding relative locations of objects, i.e. implicitly learning the geometry of the scene. In…

    Submitted 4 September, 2021; originally announced September 2021.

    Comments: Accepted to ICCV 2021. PaperId : ICCV2021-10857 Copyright transferred to IEEE ICCV. DOI will be updated later

  33. arXiv:2107.00315  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Interviewer-Candidate Role Play: Towards Developing Real-World NLP Systems

    Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral

    Abstract: Standard NLP tasks do not incorporate several common real-world scenarios such as seeking clarifications about the question, taking advantage of clues, abstaining in order to avoid incorrect answers, etc. This difference in task formulation hinders the adoption of NLP systems in real-world settings. In this work, we take a step towards bridging this gap and present a multi-stage task that simulate…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: 12 pages

  34. arXiv:2105.14357  [pdf, other]

    cs.CL cs.AI cs.CR

    Constructing Flow Graphs from Procedural Cybersecurity Texts

    Authors: Kuntal Kumar Pal, Kazuaki Kashihara, Pratyay Banerjee, Swaroop Mishra, Ruoyu Wang, Chitta Baral

    Abstract: Following procedural texts written in natural languages is challenging. We must read the whole text to identify the relevant information or identify the instruction flows to complete a task, which is prone to failures. If such texts are structured, we can readily visualize instruction-flows, reason or infer a particular step, or even build automated systems to help novice agents achieve a goal. Ho…

    Submitted 29 May, 2021; originally announced May 2021.

    Comments: 13 pages, 5 pages, accepted in the Findings of ACL 2021

  35. arXiv:2105.12392  [pdf, other]

    cs.CL

    Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction

    Authors: Ming Shen, Pratyay Banerjee, Chitta Baral

    Abstract: In this work, we propose Masked Noun-Phrase Prediction (MNPP), a pre-training strategy to tackle pronoun resolution in a fully unsupervised setting. Firstly, we evaluate our pre-trained model on various pronoun resolution datasets without any finetuning. Our method outperforms all previous unsupervised methods on all datasets by large margins. Secondly, we proceed to a few-shot setting where we fi…

    Submitted 28 May, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted to ACL2021

  36. arXiv:2104.08773  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Cross-Task Generalization via Natural Language Crowdsourcing Instructions

    Authors: Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi

    Abstract: Humans (e.g., crowdworkers) have a remarkable ability in solving different tasks, by simply reading textual instructions that define them and looking at a few examples. Despite the success of the conventional supervised learning on individual datasets, such models often struggle with generalization across tasks (e.g., a question-answering system cannot solve classification tasks). A long-standing…

    Submitted 14 March, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: ACL 2022

  37. arXiv:2104.05981  [pdf, other]

    cs.CV

    CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images

    Authors: Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, Chitta Baral

    Abstract: Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video. In this paper, we take visual understanding to a higher level where systems are challenged to answer questions that involve mentally simulating the hypothetical consequences of performing specific actions in a given scenario. Towards that end, we formulate a vision-languag…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: 16 pages, 11 figures, Accepted as a Long Paper at NAACL-HLT 2021

  38. arXiv:2103.15022  [pdf, other]

    cs.CL

    'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks

    Authors: Man Luo, Shailaja Keyur Sampat, Riley Tallman, Yankai Zeng, Manuha Vancha, Akarshan Sajja, Chitta Baral

    Abstract: GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but still are semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets where…

    Submitted 31 May, 2022; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: accepted to EACL 2021

  39. arXiv:2103.12801  [pdf, other]

    cs.LG cs.CL cs.CR

    Variable Name Recovery in Decompiled Binary Code using Constrained Masked Language Modeling

    Authors: Pratyay Banerjee, Kuntal Kumar Pal, Fish Wang, Chitta Baral

    Abstract: Decompilation is the procedure of transforming binary programs into a high-level representation, such as source code, for human analysts to examine. While modern decompilers can reconstruct and recover much information that is discarded during compilation, inferring variable names is still extremely difficult. Inspired by recent advances in natural language processing, we propose a novel solution…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: Work In Progress

  40. arXiv:2103.11263  [pdf, other]

    cs.CL cs.LG

    Self-Supervised Test-Time Learning for Reading Comprehension

    Authors: Pratyay Banerjee, Tejas Gokhale, Chitta Baral

    Abstract: Recent work on unsupervised question answering has shown that models can be trained with procedurally generated question-answer pairs and can achieve performance competitive with supervised methods. In this work, we consider the task of unsupervised reading comprehension and present a method that performs "test-time learning" (TTL) on a given context (text passage), without requiring training on l…

    Submitted 20 March, 2021; originally announced March 2021.

    Comments: Accepted to NAACL 2021

  41. arXiv:2012.09938  [pdf, other]

    cs.CL cs.AI

    Can Transformers Reason About Effects of Actions?

    Authors: Pratyay Banerjee, Chitta Baral, Man Luo, Arindam Mitra, Kuntal Pal, Tran C. Son, Neeraj Varshney

    Abstract: A recent work has shown that transformers are able to "reason" with facts and rules in a limited setting where the rules are natural language expressions of conjunctions of conditions implying a conclusion. Since this suggests that transformers may be used for reasoning with knowledge given in natural language, we do a rigorous evaluation of this with respect to a common form of knowledge and its…

    Submitted 17 December, 2020; originally announced December 2020.

  42. arXiv:2012.02356  [pdf, other]

    cs.CV cs.CL

    WeaQA: Weak Supervision via Captions for Visual Question Answering

    Authors: Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

    Abstract: Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets. This has led to heavy reliance on datasets and a lack of generalization to new types of questions and scenes. Linguistic priors along with biases and errors due to annotator subjectivity have been shown to percolate into VQA mod…

    Submitted 28 May, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: Accepted in Findings of ACL 2021

  43. arXiv:2012.01806  [pdf, other]

    cs.CV cs.LG

    Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

    Authors: Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

    Abstract: While existing work in robust deep learning has focused on small pixel-level norm-based perturbations, this may not account for perturbations encountered in several real-world settings. In many such cases although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expe…

    Submitted 7 April, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: AAAI 2021. Camera Ready version + Appendix

  44. arXiv:2010.12083  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    Language-Conditioned Imitation Learning for Robot Manipulation Tasks

    Authors: Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee, Chitta Baral, Heni Ben Amor

    Abstract: Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted to the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada as spotlight presentation

  45. arXiv:2009.08566  [pdf, other]

    cs.CV cs.CL

    MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

    Authors: Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

    Abstract: While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct…

    Submitted 15 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted to EMNLP 2020, Long Papers

  46. arXiv:2009.01938  [pdf, other]

    cs.IR cs.CL

    Multi-Perspective Semantic Information Retrieval

    Authors: Samarth Rawal, Chitta Baral

    Abstract: Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. While a combination of traditional keyword- and modern BERT-based approaches have been shown to be effective in recent work, there are often nuances in identifying what information is "relevant" to a parti…

    Submitted 3 September, 2020; originally announced September 2020.

  47. arXiv:2008.09371  [pdf, other]

    cs.CL cs.LG

    Towards Improving Selective Prediction Ability of NLP Systems

    Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral

    Abstract: It's better to say "I can't answer" than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction techniques fail to perform well, especially in the out-of-domain setting. In this work, we propose a method that improves probability estimates of models by calibratin…

    Submitted 6 April, 2022; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: ACL 2022 RepL4NLP Workshop

  48. arXiv:2008.03964  [pdf, other]

    cs.CL cs.CV cs.LG eess.SY

    DQI: A Guide to Benchmark Evaluation

    Authors: Swaroop Mishra, Anjana Arunkumar, Bhavdeep Sachdeva, Chris Bryan, Chitta Baral

    Abstract: A 'state of the art' model A surpasses humans in a benchmark B, but fails on similar benchmarks C, D, and E. What does B have that the other benchmarks do not? Recent research provides the answer: spurious bias. However, developing A to solve benchmarks B through E does not guarantee that it will solve future benchmarks. To progress towards a model that 'truly learns' an underlying task, we need t…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: ICML UDL 2020

  49. arXiv:2007.11185  [pdf, other]

    hep-ph nucl-th

    Study of strange non-strange hadron ratios in pp and p-Pb collisions at LHC energies

    Authors: Sarita Sahoo, Rama Chandra Baral, Pradip Kumar Sahu, Mina Ketan Parida

    Abstract: It has been observed that the yields of strange and multi-strange hadrons relative to pion increase significantly with the event charged-particle multiplicity. We notice from experimental data that yield ratios between non-strange hadrons, like p/$π$ or hadrons of same strange content, like $Λ$/K$_s^0$, show similar enhancement. We have studied this behavior within the ambit of a parton model (EPO…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: 5 pages, 3 figures

  50. arXiv:2007.06898  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Our Evaluation Metric Needs an Update to Encourage Generalization

    Authors: Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

    Abstract: Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and 'hack' datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance -- and thus overestimation in AI systems' ca…

    Submitted 14 July, 2020; originally announced July 2020.

    Comments: Accepted to ICML UDL 2020