Showing 1–10 of 10 results for author: Chakravarti, R

  1. arXiv:2104.08303  [pdf, other]

    cs.AI cs.CL

    Capturing Row and Column Semantics in Transformer Based Question Answering over Tables

    Authors: Michael Glass, Mustafa Canim, Alfio Gliozzo, Saneem Chemmengath, Vishwajeet Kumar, Rishav Chakravarti, Avi Sil, Feifei Pan, Samarth Bharadwaj, Nicolas Rodolfo Fauceglia

    Abstract: Transformer based architectures are recently used for the task of answering questions over tables. In order to improve the accuracy on this task, specialized pre-training techniques have been developed and applied on millions of open-domain web tables. In this paper, we propose two novel approaches demonstrating that one can achieve superior performance on table QA task without even using any of t…

    Submitted 26 April, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: To appear at NAACL 2021

  2. arXiv:2101.07942  [pdf, other]

    cs.CL

    Towards Confident Machine Reading Comprehension

    Authors: Rishav Chakravarti, Avirup Sil

    Abstract: There has been considerable progress on academic benchmarks for the Reading Comprehension (RC) task with State-of-the-Art models closing the gap with human performance on extractive question answering. Datasets such as SQuAD 2.0 & NQ have also introduced an auxiliary task requiring models to predict when a question has no answer in the text. However, in production settings, it is also necessary to…

    Submitted 23 February, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

  3. arXiv:1911.02984  [pdf, other]

    cs.CL cs.IR

    The TechQA Dataset

    Authors: Vittorio Castelli, Rishav Chakravarti, Saswati Dana, Anthony Ferritto, Radu Florian, Martin Franz, Dinesh Garg, Dinesh Khandelwal, Scott McCarley, Mike McCawley, Mohamed Nasr, Lin Pan, Cezar Pendus, John Pitrelli, Saurabh Pujar, Salim Roukos, Andrzej Sakrajda, Avirup Sil, Rosario Uceda-Sosa, Todd Ward, Rong Zhang

    Abstract: We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain. The TechQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size -- 600 training, 310 de…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: Long version of conference paper to be submitted

  4. arXiv:1911.00337  [pdf, other]

    cs.CL

    Ensembling Strategies for Answering Natural Questions

    Authors: Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avirup Sil

    Abstract: Many of the top question answering systems today utilize ensembling to improve their performance on tasks such as the Stanford Question Answering Dataset (SQuAD) and Natural Questions (NQ) challenges. Unfortunately most of these systems do not publish their ensembling strategies used in their leaderboard submissions. In this work, we investigate a number of ensembling techniques and demonstrate a…

    Submitted 6 November, 2019; v1 submitted 30 October, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1909.05286

  5. arXiv:1910.06360  [pdf, other]

    cs.CL cs.LG

    Structured Pruning of a BERT-based Question Answering Model

    Authors: J. S. McCarley, Rishav Chakravarti, Avirup Sil

    Abstract: The recent trend in industry-setting Natural Language Processing (NLP) research has been to operate large-scale pretrained language models like BERT under strict computational limits. While most model compression work has focused on "distilling" a general-purpose language representation using expensive pretraining distillation, less attention has been paid to creating smaller task-specific langua…

    Submitted 11 April, 2021; v1 submitted 14 October, 2019; originally announced October 2019.

  6. arXiv:1909.05286  [pdf, other]

    cs.CL

    Frustratingly Easy Natural Question Answering

    Authors: Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil

    Abstract: Existing literature on Question Answering (QA) mostly focuses on algorithmic novelty, data augmentation, or increasingly large pre-trained language models like XLNet and RoBERTa. Additionally, a lot of systems on the QA leaderboards do not have associated research documentation in order to successfully replicate their experiments. In this paper, we outline these algorithmic components such as Atte…

    Submitted 11 September, 2019; originally announced September 2019.

  7. arXiv:1909.04120  [pdf, other]

    cs.CL cs.AI cs.LG

    Span Selection Pre-training for Question Answering

    Authors: Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avirup Sil

    Abstract: BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better…

    Submitted 18 June, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted at ACL2020

  8. CFO: A Framework for Building Production NLP Systems

    Authors: Rishav Chakravarti, Cezar Pendus, Andrzej Sakrajda, Anthony Ferritto, Lin Pan, Michael Glass, Vittorio Castelli, J. William Murdock, Radu Florian, Salim Roukos, Avirup Sil

    Abstract: This paper introduces a novel orchestration framework, called CFO (COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Readi…

    Submitted 19 June, 2020; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: http://ibm.biz/cfo_framework

    Report number: D19-3006

    Journal ref: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

  9. arXiv:1804.08057  [pdf, ps, other]

    cs.CL cs.IR

    A Study on Passage Re-ranking in Embedding based Unsupervised Semantic Search

    Authors: Md Faisal Mahbub Chowdhury, Vijil Chenthamarakshan, Rishav Chakravarti, Alfio M. Gliozzo

    Abstract: State of the art approaches for (embedding based) unsupervised semantic search exploits either compositional similarity (of a query and a passage) or pair-wise word (or term) similarity (from the query and the passage). By design, word based approaches do not incorporate similarity in the larger context (query/passage), while compositional similarity based approaches are usually unable to take adv…

    Submitted 13 March, 2019; v1 submitted 21 April, 2018; originally announced April 2018.

    Comments: Fixed latex compiling issues

  10. arXiv:1708.04326  [pdf, ps, other]

    cs.IR

    Improved Answer Selection with Pre-Trained Word Embeddings

    Authors: Rishav Chakravarti, Jiri Navratil, Cicero Nogueira dos Santos

    Abstract: This paper evaluates existing and newly proposed answer selection methods based on pre-trained word embeddings. Word embeddings are highly effective in various natural language processing tasks and their integration into traditional information retrieval (IR) systems allows for the capture of semantic relatedness between questions and answers. Empirical results on three publicly available data set…

    Submitted 14 August, 2017; originally announced August 2017.