Showing 1–11 of 11 results for author: Sreedhar, M N

  1. arXiv:2406.15214

    cs.CL

    Unsupervised Extraction of Dialogue Policies from Conversations

    Authors: Makesh Narsimhan Sreedhar, Traian Rebedea, Christopher Parisien

    Abstract: Dialogue policies play a crucial role in developing task-oriented dialogue systems, yet their development and maintenance are challenging and typically require substantial effort from experts in dialogue modeling. While in many situations, large amounts of conversational data are available for the task at hand, people lack an effective solution able to extract dialogue policies from this data. In…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.11704

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…

    Submitted 6 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.08673

    cs.CL cs.AI cs.LG

    HelpSteer2: Open-source dataset for training top-performing reward models

    Authors: Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev

    Abstract: High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer need to be updated to remain effective for reward modeling. Methods…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2404.03820

    cs.CL

    CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

    Authors: Makesh Narsimhan Sreedhar, Traian Rebedea, Shaona Ghosh, Jiaqi Zeng, Christopher Parisien

    Abstract: Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning. There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversations - a critical aspect for deploying chatbots to production. We introduce the CantTalkAboutThis dataset to help language models remain focused on t…

    Submitted 21 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  5. arXiv:2311.09661

    cs.CL

    Evolving Domain Adaptation of Pretrained Language Models for Text Classification

    Authors: Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T. Rogers, Junjie Hu

    Abstract: Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self…

    Submitted 16 November, 2023; originally announced November 2023.

  6. arXiv:2311.09528

    cs.CL cs.AI cs.LG

    HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

    Authors: Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev

    Abstract: Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful responses only due to their length). To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various as…

    Submitted 15 November, 2023; originally announced November 2023.

  7. arXiv:2310.05344

    cs.CL cs.AI cs.LG

    SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

    Authors: Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev

    Abstract: Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) stages. However, RLHF faces inherent limitations stemming from a complex training setup and its tendency to align the model with implicit values that e…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  8. arXiv:2211.05596

    cs.CL cs.AI

    Prompt Learning for Domain Adaptation in Task-Oriented Dialogue

    Authors: Makesh Narsimhan Sreedhar, Christopher Parisien

    Abstract: Conversation designers continue to face significant obstacles when creating production quality task-oriented dialogue systems. The complexity and cost involved in schema development and data collection is often a major barrier for such designers, limiting their ability to create natural, user-friendly experiences. We frame the classification of user intent as the generation of a canonical form, a…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: Accepted for publication at SereTOD Workshop - EMNLP 2022

  9. arXiv:2205.11490

    cs.CL cs.AI

    Local Byte Fusion for Neural Machine Translation

    Authors: Makesh Narsimhan Sreedhar, Xiangpeng Wan, Yu Cheng, Junjie Hu

    Abstract: Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid and tokenizers built on one corpus do not adapt well to other parallel corpora. It has also been observed that in multilingual corpora, subword tokenization schemes over-segment low-resource languages leading to a drop in translation performance. A simple alternative to subword to…

    Submitted 28 June, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted at ACL 2023 - Main Conference

  10. arXiv:2010.07261

    cs.CL cs.AI cs.LG

    Learning Improvised Chatbots from Adversarial Modifications of Natural Language Feedback

    Authors: Makesh Narsimhan Sreedhar, Kun Ni, Siva Reddy

    Abstract: The ubiquitous nature of chatbots and their interaction with users generate an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking natural language feedback when a user is dissatisfied with its response and uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences hindering…

    Submitted 14 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at Findings of EMNLP 2020

  11. arXiv:2004.00161

    cs.CV cs.LG eess.IV

    Towards Lifelong Self-Supervision For Unpaired Image-to-Image Translation

    Authors: Victor Schmidt, Makesh Narsimhan Sreedhar, Mostafa ElAraby, Irina Rish

    Abstract: Unpaired Image-to-Image Translation (I2IT) tasks often suffer from lack of data, a problem which self-supervised learning (SSL) has recently been very popular and successful at tackling. Leveraging auxiliary tasks such as rotation prediction or generative colorization, SSL can produce better and more robust representations in a low data regime. Training such tasks along an I2IT task is however com…

    Submitted 31 March, 2020; originally announced April 2020.