Showing 1–17 of 17 results for author: Sachan, S

Searching in archive cs.
  1. arXiv:2408.01875

    cs.CL

    Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval

    Authors: Yanfei Chen, Jinsung Yoon, Devendra Singh Sachan, Qingze Wang, Vincent Cohen-Addad, Mohammadhossein Bateni, Chen-Yu Lee, Tomas Pfister

    Abstract: Recent advances in large language models (LLMs) have enabled autonomous agents with complex reasoning and task-fulfillment capabilities using a wide range of tools. However, effectively identifying the most relevant tools for a given task becomes a key bottleneck as the toolset size grows, hindering reliable tool utilization. To address this, we introduce Re-Invoke, an unsupervised tool retrieval…

    Submitted 3 August, 2024; originally announced August 2024.

  2. arXiv:2406.13121

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  3. arXiv:2206.10658

    cs.CL cs.IR

    Questions Are All You Need to Train a Dense Passage Retriever

    Authors: Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer

    Abstract: We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires a…

    Submitted 2 April, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to TACL, pre MIT Press publication version

  4. arXiv:2205.07870

    cs.LG cs.AI

    Unsupervised Driving Behavior Analysis using Representation Learning and Exploiting Group-based Training

    Authors: Soma Bandyopadhyay, Anish Datta, Shruti Sachan, Arpan Pal

    Abstract: Driving behavior monitoring plays a crucial role in managing road safety and decreasing the risk of traffic accidents. Driving behavior is affected by multiple factors like vehicle characteristics, types of roads, traffic, but, most importantly, the pattern of driving of individuals. Current work performs a robust driving pattern analysis by capturing variations in driving patterns. It forms consi…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 7 figures, 8 pages, 7 tables; accepted and presented at the AAAI 2022 AI for Transportation Workshop (prefinal version)

  5. arXiv:2204.07496

    cs.CL cs.IR

    Improving Passage Retrieval with Zero-Shot Question Generation

    Authors: Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer

    Abstract: We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or…

    Submitted 2 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: EMNLP 2022 camera-ready version. Code is available at: https://github.com/DevSinghSachan/unsupervised-passage-reranking

  6. arXiv:2106.05346

    cs.CL cs.AI cs.IR

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

    Authors: Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama

    Abstract: We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectat…

    Submitted 4 December, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready version

  7. arXiv:2101.00408

    cs.CL cs.AI

    End-to-End Training of Neural Retrievers for Open-Domain Question Answering

    Authors: Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro

    Abstract: Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse…

    Submitted 1 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021

  8. arXiv:2010.11374

    cs.CL cs.LG

    Stronger Transformers for Neural Multi-Hop Question Generation

    Authors: Devendra Singh Sachan, Lingfei Wu, Mrinmaya Sachan, William Hamilton

    Abstract: Prior work on automated question generation has almost exclusively focused on generating simple questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation, where answering the questions requires reasoning over multiple documents. In this work, we introduce a series of…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Code will be made available

  9. Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

    Authors: Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

    Abstract: In this paper, we study bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. Several prior works have suggested that either complex pretraining schemes using unsupervised methods such as language modeling (Dai and Le 2015; Miyato, Dai, and Goodfellow 2016) or complicated models (Johnson and Zhang 2017) are necessary to achieve a high…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: Published at AAAI 2019

  10. arXiv:2008.09084

    cs.CL

    Do Syntax Trees Help Pre-trained Transformers Extract Information?

    Authors: Devendra Singh Sachan, Yuhao Zhang, Peng Qi, William Hamilton

    Abstract: Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g., BERT) remains unclear, especially given recent studies highlighting how these models implicitly encode syntax. In this work, we systematically study the utility…

    Submitted 26 January, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: EACL 2021. Code available at: https://github.com/DevSinghSachan/syntax-augmented-bert

  11. arXiv:1809.00794

    cs.CL cs.AI cs.LG

    Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation

    Authors: Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wangrong Zhu, Devendra Singh Sachan, Eric P. Xing

    Abstract: We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks that transform any inputs into natural language, such as machine translation, summarization, dialog, content manipulation, and so forth. With the design goals of modularity, versatility, and extensibility in mind, Texar extracts common patterns underlying the diverse tasks and methodologies, creates…

    Submitted 3 July, 2019; v1 submitted 4 September, 2018; originally announced September 2018.

    Comments: ACL 2019 demo, expanded version

  12. arXiv:1809.00252

    cs.CL cs.LG

    Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

    Authors: Devendra Singh Sachan, Graham Neubig

    Abstract: In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able…

    Submitted 13 September, 2018; v1 submitted 1 September, 2018; originally announced September 2018.

    Comments: Third Conference on Machine Translation (WMT 2018)

  13. arXiv:1804.06323

    cs.CL

    When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?

    Authors: Ye Qi, Devendra Singh Sachan, Matthieu Felix, Sarguna Janani Padmanabhan, Graham Neubig

    Abstract: The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained. Pre-trained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from paucity of data. However, their utility for NMT has not been extensively explored. In thi…

    Submitted 18 April, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: NAACL 2018

  14. arXiv:1803.00188

    cs.CL

    XNMT: The eXtensible Neural Machine Translation Toolkit

    Authors: Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, Liming Wang

    Abstract: This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of m…

    Submitted 28 February, 2018; originally announced March 2018.

    Comments: To be presented at AMTA 2018 Open Source Software Showcase

  15. arXiv:1801.06261

    cs.CL

    Investigating the Working of Text Classifiers

    Authors: Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

    Abstract: Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively utilize the constituent expressions. Almost all of the reported work train large networks using discriminative approaches, which come with a caveat of no prope…

    Submitted 5 August, 2018; v1 submitted 18 January, 2018; originally announced January 2018.

    Comments: Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: Technical Papers (COLING 2018), NIPS 2017 Workshop on Deep Learning: Bridging Theory and Practice

  16. arXiv:1711.07908

    cs.CL

    Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition

    Authors: Devendra Singh Sachan, Pengtao Xie, Mrinmaya Sachan, Eric P Xing

    Abstract: Biomedical named entity recognition (NER) is a fundamental task in text mining of medical documents and has many applications. Deep learning based approaches to this task have been gaining increasing attention in recent years as their parameters can be learned end-to-end without the need for hand-engineered features. However, these approaches rely on high-quality labeled data, which is expensive t…

    Submitted 14 August, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

    Comments: Machine Learning for Healthcare (MLHC) 2018; 12 pages, updated author affiliations

  17. arXiv:1508.00189

    cs.CL cs.IR

    Class Vectors: Embedding representation of Document Classes

    Authors: Devendra Singh Sachan, Shailesh Kumar

    Abstract: Distributed representations of words and paragraphs as semantic embeddings in high dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we propose "Class Vectors" - a framework for learning a vector per class in the same embedding space as the word and paragraph embeddings. Similarity between these class…

    Submitted 2 August, 2015; originally announced August 2015.