Showing 1–37 of 37 results for author: Nallapati, R

Searching in archive cs.

  1. arXiv:2404.15778

    cs.LG cs.CL

    BASS: Batched Attention-optimized Speculative Sampling

    Authors: Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras

    Abstract: Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses and how to perform speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges.…

    Submitted 26 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  2. arXiv:2403.08845

    cs.LG cs.AI

    Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs

    Authors: Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang

    Abstract: This study introduces bifurcated attention, a method designed to enhance language model inference in shared-context batch decoding scenarios. Our approach addresses the challenge of redundant memory IO costs, a critical factor contributing to latency in high batch sizes and extended context lengths. Bifurcated attention achieves this by strategically dividing the attention mechanism during increme…

    Submitted 11 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2403.08688

    cs.CL cs.AI

    Token Alignment via Character Matching for Subword Completion

    Authors: Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Rob Kwiatowski, Ramesh Nallapati, Bing Xiang

    Abstract: Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens. This struggle stems from tokenization, where partial tokens fall out of distribution during inference, leading to incorrect or nonsensical outputs. This paper examines a technique to alleviate the tokenization artifact on text completion in generative models, maintaining per…

    Submitted 13 March, 2024; originally announced March 2024.

  4. arXiv:2402.01935

    cs.CL

    Code Representation Learning At Scale

    Authors: Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

    Abstract: Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred million parameter scale using very limited pretraining corpora. In this work, we fuel code representation learning with a vast amount of code data via a two-st…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 10 pages

    Journal ref: ICLR 2024

  5. arXiv:2310.11248

    cs.LG cs.CL cs.SE

    CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

    Authors: Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

    Abstract: Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single file. This over-simplified setting falls short of representing the real-world software development scenario where repositories span multiple files with numerous cross-file dependencies, and accessing…

    Submitted 16 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: To appear at NeurIPS 2023 (Datasets and Benchmarks Track)

  6. arXiv:2307.02435

    cs.LG cs.CL cs.SE

    Exploring Continual Learning for Code Generation Models

    Authors: Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang

    Abstract: Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance. However, libraries are upgraded or deprecated very frequently and re-training large-scale language models is computationally expensive. Therefore, Continual Learning (CL) is an important aspect that remains underexplored in the code domain. In this paper, we introduce a benchmark called CodeTask-CL th…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: ACL 2023

  7. arXiv:2212.10264

    cs.LG cs.CL cs.SE

    ReCode: Robustness Evaluation of Code Generation Models

    Authors: Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang

    Abstract: Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in gene…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Code and data available at https://github.com/amazon-science/recode

  8. arXiv:2212.10007

    cs.CL cs.SE

    CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

    Authors: Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

    Abstract: While pre-trained language models (LM) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, but ignore the rich semantics in other files within the same project, i.e., cross-file context, a critical source of information that is especially useful in modern modular software development. Such overlooking…

    Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  9. arXiv:2210.14868

    cs.LG cs.CL

    Multi-lingual Evaluation of Code Generation Models

    Authors: Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

    Abstract: We present new benchmarks on evaluation code generation models: MBXP and Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the perform…

    Submitted 28 March, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Code and data release: https://github.com/amazon-research/mxeval

  10. arXiv:2210.01185

    cs.CL

    ContraCLM: Contrastive Learning For Causal Language Model

    Authors: Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang

    Abstract: Despite exciting progress in causal language models, the expressiveness of the representations is largely limited due to poor discrimination ability. To remedy this issue, we present ContraCLM, a novel contrastive learning framework at both token-level and sequence-level. We assess ContraCLM on a variety of downstream tasks. We show that ContraCLM enhances discrimination of the representations and…

    Submitted 2 May, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: 10 pages

    Journal ref: ACL 2023

  11. arXiv:2209.14415

    cs.CL

    Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

    Authors: Jun Wang, Patrick Ng, Alexander Hanbo Li, Jiarong Jiang, Zhiguo Wang, Ramesh Nallapati, Bing Xiang, Sudipta Sengupta

    Abstract: Most recent research on Text-to-SQL semantic parsing relies on either parser itself or simple heuristic based approach to understand natural language query (NLQ). When synthesizing a SQL query, there is no explicit semantic information of NLQ available to the parser which leads to undesirable generalization performance. In addition, without lexical-level fine-grained query understanding, linking b…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: EMNLP Industry Track 2022

  12. arXiv:2205.02170

    cs.CL cs.AI cs.LG

    Efficient Few-Shot Fine-Tuning for Opinion Summarization

    Authors: Arthur Bražinskas, Ramesh Nallapati, Mohit Bansal, Markus Dreyer

    Abstract: Abstractive summarization models are typically pre-trained on large amounts of generic texts, then fine-tuned on tens or hundreds of thousands of annotated samples. However, in opinion summarization, large annotated datasets of reviews paired with reference summaries are not available and would be expensive to create. This calls for fine-tuning methods robust to overfitting on small datasets. In a…

    Submitted 8 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: NAACL Findings 2022

  13. arXiv:2203.11239

    cs.CL

    DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization

    Authors: Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew Arnold, Bing Xiang, Dan Roth

    Abstract: Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-pre…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  14. arXiv:2109.05424

    cs.CL cs.LG

    Pairwise Supervised Contrastive Learning of Sentence Representations

    Authors: Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

    Abstract: Many recent successes in sentence representation learning have been achieved by simply fine-tuning on the Natural Language Inference (NLI) datasets with triplet loss or siamese loss. Nevertheless, they share a common weakness: sentences in a contradiction pair are not necessarily from different semantic categories. Therefore, optimizing the semantic entailment and contradiction reasoning objective…

    Submitted 29 January, 2022; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: 9 pages, EMNLP 2021

  15. arXiv:2105.04623

    cs.CL cs.AI

    Improving Factual Consistency of Abstractive Summarization via Question Answering

    Authors: Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, Bing Xiang

    Abstract: A commonly observed problem with the state-of-the art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an approach to address factual consistency in summari…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: ACL-IJCNLP 2021

  16. arXiv:2104.09500

    cs.CL cs.AI cs.LG

    Transductive Learning for Abstractive News Summarization

    Authors: Arthur Bražinskas, Mengwen Liu, Ramesh Nallapati, Sujith Ravi, Markus Dreyer

    Abstract: Pre-trained and fine-tuned news summarizers are expected to generalize to news articles unseen in the fine-tuning (training) phase. However, these articles often contain specifics, such as new events and people, a summarizer could not learn about in training. This applies to scenarios such as a news publisher training a summarizer on dated news and summarizing incoming recent news. In this work, w…

    Submitted 16 April, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

  17. arXiv:2103.12953

    cs.LG cs.CL

    Supporting Clustering with Contrastive Learning

    Authors: Dejiao Zhang, Feng Nan, Xiaokai Wei, Shangwen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew Arnold, Bing Xiang

    Abstract: Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To t…

    Submitted 28 May, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: NAACL 2021

  18. arXiv:2102.09130

    cs.CL cs.AI

    Entity-level Factual Consistency of Abstractive Text Summarization

    Authors: Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, Bing Xiang

    Abstract: A key challenge for abstractive summarization is ensuring factual consistency of the generated summary with respect to the original document. For example, state-of-the-art models trained on existing datasets exhibit entity hallucination, generating names of entities that are not present in the source document. We propose a set of new metrics to quantify the entity-level factual consistency of gene…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: EACL 2021

  19. arXiv:2011.13137

    cs.CL cs.AI cs.LG

    Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction

    Authors: Yifan Gao, Henghui Zhu, Patrick Ng, Cicero Nogueira dos Santos, Zhiguo Wang, Feng Nan, Dejiao Zhang, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

    Abstract: In open-domain question answering, questions are highly likely to be ambiguous because users may not know the scope of relevant topics when formulating them. Therefore, a system needs to find possible interpretations of the question, and predict one or multiple plausible answers. When multiple plausible answers are found, the system should rewrite the question for each answer to resolve the ambigu…

    Submitted 30 May, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

    Comments: ACL 2021 main conference, 14 pages, 7 figures. Code will be released at https://github.com/amzn/refuel-open-domain-qa

  20. arXiv:2010.06028

    cs.CL

    End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

    Authors: Siamak Shakeri, Cicero Nogueira dos Santos, Henry Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

    Abstract: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  21. arXiv:2010.03073

    cs.CL cs.IR

    Beyond [CLS] through Ranking by Generation

    Authors: Cicero Nogueira dos Santos, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang

    Abstract: Generative models for Information Retrieval, where ranking of documents is viewed as the task of generating a query from a document's language model, were very successful in various IR tasks in the past. However, with the advent of modern deep neural networks, attention has shifted to discriminative ranking functions that model the semantic similarity of documents and queries instead. Recently, de…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  22. arXiv:2009.10270

    cs.IR

    Embedding-based Zero-shot Retrieval through Query Generation

    Authors: Davis Liang, Peng Xu, Siamak Shakeri, Cicero Nogueira dos Santos, Ramesh Nallapati, Zhiheng Huang, Bing Xiang

    Abstract: Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However, term-based matching algorithms often miss relevant passages that have no lexical overlap with the query and cannot be finetuned to downstream datasets. In this…

    Submitted 21 September, 2020; originally announced September 2020.

  23. arXiv:2007.09186

    cs.IR

    AWS CORD-19 Search: A Neural Search Engine for COVID-19 Literature

    Authors: Parminder Bhatia, Lan Liu, Kristjan Arumae, Nima Pourdamghani, Suyog Deshpande, Ben Snively, Mona Mona, Colby Wise, George Price, Shyam Ramaswamy, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang, Taha Kass-Hout

    Abstract: Coronavirus disease (COVID-19) has been declared as a pandemic by WHO with thousands of cases being reported each day. Numerous scientific articles are being published on the disease raising the need for a service which can organize, and query them in a reliable fashion. To support this cause we present AWS CORD-19 Search (ACS), a public, COVID-19 specific, neural search engine that is powered by…

    Submitted 7 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

  24. arXiv:2004.11892

    cs.CL

    Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

    Authors: Alexander R. Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

    Abstract: Question Answering (QA) is in increasing demand as the amount of information available online and the desire for quick access to this content grows. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on scarce, and costly to obtain, large-scale human-labeled data. We propose an unsupervised approach to traini…

    Submitted 24 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  25. arXiv:1911.10666

    cs.CL

    Who did They Respond to? Conversation Structure Modeling using Masked Hierarchical Transformer

    Authors: Henghui Zhu, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

    Abstract: Conversation structure is useful for both understanding the nature of conversation dynamics and for providing features for many downstream applications such as summarization of conversations. In this work, we define the problem of conversation structure modeling as identifying the parent utterance(s) to which each utterance in the conversation responds to. Previous work usually took a pair of utte…

    Submitted 24 November, 2019; originally announced November 2019.

    Comments: AAAI 2020

  26. arXiv:1910.07973

    cs.CL cs.LG

    Universal Text Representation from BERT: An Empirical Study

    Authors: Xiaofei Ma, Zhiguo Wang, Patrick Ng, Ramesh Nallapati, Bing Xiang

    Abstract: We present a systematic investigation of layer-wise BERT activations for general-purpose text representations to understand what linguistic information they capture and how transferable they are across different tasks. Sentence-level embeddings are evaluated against two state-of-the-art models on downstream and probing tasks from SentEval, while passage-level embeddings are evaluated on four quest…

    Submitted 23 October, 2019; v1 submitted 17 October, 2019; originally announced October 2019.

  27. arXiv:1909.07746

    cs.LG cs.CL cs.IR

    Multi Sense Embeddings from Topic Models

    Authors: Shobhit Jain, Sravan Babu Bodapati, Ramesh Nallapati, Anima Anandkumar

    Abstract: Distributed word embeddings have yielded state-of-the-art performance in many NLP tasks, mainly due to their success in capturing useful semantic information. These representations assign only a single vector to each word whereas a large number of words are polysemous (i.e., have multiple meanings). In this work, we approach this critical problem in lexical semantics, namely that of representing v…

    Submitted 3 February, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

    Comments: Accepted at ACL supported conference for Natural Language & Speech Processing. https://www.aclweb.org/anthology/W19-74, Year: 2019

  28. arXiv:1908.08167

    cs.CL cs.AI

    Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering

    Authors: Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, Bing Xiang

    Abstract: BERT model has been successfully applied to open-domain QA tasks. However, previous work trains BERT by viewing passages corresponding to the same question as independent training instances, which may cause incomparable scores for answers from different passages. To tackle this issue, we propose a multi-passage BERT model to globally normalize answer scores across all passages of the same question…

    Submitted 1 October, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: To appear in EMNLP 2019

  29. arXiv:1907.12374

    cs.IR cs.AI cs.LG

    Topic Modeling with Wasserstein Autoencoders

    Authors: Feng Nan, Ran Ding, Ramesh Nallapati, Bing Xiang

    Abstract: We propose a novel neural topic model in the Wasserstein autoencoders (WAE) framework. Unlike existing variational autoencoder based models, we directly enforce Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We discover that MMD performs…

    Submitted 6 December, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

    Comments: In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 6345-6381)

  30. arXiv:1905.05910

    cs.IR cs.CL

    Passage Ranking with Weak Supervision

    Authors: Peng Xu, Xiaofei Ma, Ramesh Nallapati, Bing Xiang

    Abstract: In this paper, we propose a weak supervision framework for neural ranking tasks based on the data programming paradigm (Ratner et al., 2016), which enables us to leverage multiple weak supervision signals from different sources. Empirically, we consider two sources of weak supervision signals, unsupervised ranking functions and semantic feature similarities. We train a BERT-based passage-ran…

    Submitted 4 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: 6 pages, 1 figure

    Journal ref: ICLR 2019 LLD workshop

  31. arXiv:1903.08550

    cs.CV cs.LG

    OCGAN: One-class Novelty Detection Using GANs with Constrained Latent Representations

    Authors: Pramuditha Perera, Ramesh Nallapati, Bing Xiang

    Abstract: We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. Our solution is based on learning latent representations of in-class examples using a denoising auto-encoder network. The key contribution of our work is our proposal to explicitly…

    Submitted 20 March, 2019; originally announced March 2019.

    Comments: CVPR 2019 Accepted Paper

  32. arXiv:1809.02687

    cs.CL cs.LG

    Coherence-Aware Neural Topic Modeling

    Authors: Ran Ding, Ramesh Nallapati, Bing Xiang

    Abstract: Topic models are evaluated based on their ability to describe documents well (i.e. low perplexity) and to produce topics that carry coherent semantic meaning. In topic modeling so far, perplexity is a direct optimization target. However, topic coherence, owing to its challenging computation, is not optimized for and is only evaluated after training. In this work, under a neural variational inferen…

    Submitted 7 September, 2018; originally announced September 2018.

    Comments: Accepted at EMNLP 2018

  33. arXiv:1708.00308

    cs.CL cs.LG stat.ML

    SenGen: Sentence Generating Neural Variational Topic Model

    Authors: Ramesh Nallapati, Igor Melnyk, Abhishek Kumar, Bowen Zhou

    Abstract: We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder that is conditioned on the topic of the sentence. We argue that this novel formalism will help us not only visualize and model the topical discourse structure in a document better, but also potentially lead to more interpretable t…

    Submitted 1 August, 2017; originally announced August 2017.

  34. arXiv:1611.04244

    cs.CL

    Classify or Select: Neural Architectures for Extractive Document Summarization

    Authors: Ramesh Nallapati, Bowen Zhou, Mingbo Ma

    Abstract: We present two novel and contrasting Recurrent Neural Network (RNN) based architectures for extractive summarization of documents. The Classifier based architecture sequentially accepts or rejects each sentence in the original document order for its membership in the final summary. The Selector architecture, on the other hand, is free to pick one sentence at a time in any arbitrary order to piece…

    Submitted 13 November, 2016; originally announced November 2016.

    Comments: arXiv admin note: text overlap with arXiv:1611.04230

  35. arXiv:1611.04230

    cs.CL

    SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

    Authors: Ramesh Nallapati, Feifei Zhai, Bowen Zhou

    Abstract: We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novel…

    Submitted 13 November, 2016; originally announced November 2016.

    Comments: Published at AAAI 2017, The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-2017)

  36. arXiv:1603.08148

    cs.CL cs.LG cs.NE

    Pointing the Unknown Words

    Authors: Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, Yoshua Bengio

    Abstract: The problem of rare and unknown words is an important issue that can potentially influence the performance of many NLP systems, including both the traditional count-based and the deep learning models. We propose a novel way to deal with the rare and unseen words for the neural network models using attention. Our model uses two softmax layers in order to predict the next word in conditional languag…

    Submitted 21 August, 2016; v1 submitted 26 March, 2016; originally announced March 2016.

    Comments: ACL 2016 Oral Paper

  37. arXiv:1602.06023

    cs.CL

    Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

    Authors: Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang

    Abstract: In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-…

    Submitted 26 August, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

    Journal ref: The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2016