
Showing 1–26 of 26 results for author: Dinan, E

Searching in archive cs.
  1. arXiv:2407.21783

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2304.02034

    cs.LG cs.CL hep-th stat.ML

    Effective Theory of Transformers at Initialization

    Authors: Emily Dinan, Sho Yaida, Susan Zhang

    Abstract: We perform an effective-theory analysis of forward-backward signal propagation in wide and deep Transformers, i.e., residual neural networks with multi-head self-attention blocks and multilayer perceptron blocks. This analysis suggests particular width scalings of initialization and training hyperparameters for these models. We then take up such suggestions, training Vision and Language Transforme…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 64 pages, 5 figures

  3. arXiv:2212.08195

    cs.CL

    Improving Chess Commentaries by Combining Language Models with Symbolic Reasoning Engines

    Authors: Andrew Lee, David Wu, Emily Dinan, Mike Lewis

    Abstract: Despite many recent advancements in language modeling, state-of-the-art language models lack grounding in the real world and struggle with tasks involving complex reasoning. Meanwhile, advances in the symbolic reasoning capabilities of AI have led to systems that outperform humans in games like chess and Go (Silver et al., 2018). Chess commentary provides an interesting domain for bridging these t…

    Submitted 15 December, 2022; originally announced December 2022.

  4. arXiv:2211.12615

    cs.CL cs.AI

    AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

    Authors: Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis

    Abstract: Existing approaches built separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message…

    Submitted 22 November, 2022; originally announced November 2022.

  5. arXiv:2210.15893

    cs.CL cs.AI

    When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

    Authors: Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, Jing Xu

    Abstract: Deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. However, humans may not always provide explicit signals when the chatbot makes mistakes during interactions. In this work, we propose Juicer, a framework to make use of both binary and free-form textual human feedback. It works by: (i) extending sparse binary feedback by training a satisfact…

    Submitted 28 October, 2022; originally announced October 2022.

  6. arXiv:2107.03451

    cs.CL cs.AI

    Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

    Authors: Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

    Abstract: Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans. However, these models are often trained on large datasets from the internet, and as a result, may learn undesirable behaviors from this data, such as toxic or otherwise harmful language. Researchers must thus wrestle with the issue of how and whe…

    Submitted 23 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

  7. arXiv:2012.14983

    cs.CL cs.AI cs.LG

    Reducing conversational agents' overconfidence through linguistic calibration

    Authors: Sabrina J. Mielke, Arthur Szlam, Emily Dinan, Y-Lan Boureau

    Abstract: While improving neural dialogue agents' factual accuracy is the object of much research, another important aspect of communication, less studied in the setting of neural dialogue, is transparency about ignorance. In this work, we analyze to what extent state-of-the-art chit-chat models are linguistically calibrated in the sense that their verbalized expression of doubt (or confidence) matches the…

    Submitted 26 June, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Accepted in TACL, to be presented at NAACL 2022

  8. arXiv:2010.07079

    cs.CL cs.AI

    Recipes for Safety in Open-domain Chatbots

    Authors: Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan

    Abstract: Models trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which include offensive or otherwise toxic behavior and unwanted biases. We investigate a variety of methods to mitigate these issues in the context of open-domain generative dialogue models. We introduce a new human-and-model-in-the-loop framework for both training safer models and for…

    Submitted 4 August, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

  9. arXiv:2009.10855

    cs.CL

    Controlling Style in Generated Dialogue

    Authors: Eric Michael Smith, Diana Gonzalez-Rico, Emily Dinan, Y-Lan Boureau

    Abstract: Open-domain conversation models have become good at generating natural-sounding dialogue, using very large architectures with billions of trainable parameters. The vast training data required to train these architectures aggregates many different styles, tones, and qualities. Using that data to train a single model makes it difficult to use the model as a consistent conversational agent, e.g. with…

    Submitted 22 September, 2020; originally announced September 2020.

  10. arXiv:2008.08076

    cs.AI cs.CL

    Deploying Lifelong Open-Domain Dialogue Learning

    Authors: Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston

    Abstract: Much of NLP research has focused on crowdsourced static datasets and the supervised learning paradigm of training once and then evaluating test performance. As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (S…

    Submitted 19 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  11. arXiv:2006.12442

    cs.CL cs.AI

    Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

    Authors: Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

    Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of cont…

    Submitted 13 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

  12. arXiv:2005.00614

    cs.CL

    Multi-Dimensional Gender Bias Classification

    Authors: Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

    Abstract: Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to,…

    Submitted 1 May, 2020; originally announced May 2020.

  13. arXiv:2004.13637

    cs.CL cs.AI

    Recipes for building an open-domain chatbot

    Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

    Abstract: Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a…

    Submitted 30 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  14. arXiv:2002.02878

    cs.AI cs.CL stat.ML

    I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

    Authors: Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam

    Abstract: Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the div…

    Submitted 10 February, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  15. arXiv:1911.09194

    cs.AI cs.CL cs.LG

    Generating Interactive Worlds with Text

    Authors: Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston

    Abstract: Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT. We introdu…

    Submitted 4 December, 2019; v1 submitted 20 November, 2019; originally announced November 2019.

  16. arXiv:1911.03914

    cs.CL

    Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles

    Authors: Eric Michael Smith, Diana Gonzalez-Rico, Emily Dinan, Y-Lan Boureau

    Abstract: Text style transfer is usually performed using attributes that can take a handful of discrete values (e.g., positive to negative reviews). In this work, we introduce an architecture that can leverage pre-trained consistent continuous distributed style representations and use them to transfer to an attribute unseen during training, without requiring any re-tuning of the style transfer model. We dem…

    Submitted 10 November, 2019; originally announced November 2019.

  17. arXiv:1911.03842

    cs.CL

    Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation

    Authors: Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, Jason Weston

    Abstract: Models often easily learn biases present in the training data, and their predictions directly reflect this bias. We analyze gender bias in dialogue data, and examine how this bias is actually amplified in subsequent generative chit-chat dialogue models. We measure gender bias in six existing dialogue datasets, and focus on the most biased one, the multi-player text-based fantasy adventure dataset…

    Submitted 16 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

  18. arXiv:1911.03768

    cs.CL cs.AI

    The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

    Authors: Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau, Jason Weston

    Abstract: We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producin…

    Submitted 28 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Comments: ACL 2020

  19. arXiv:1910.14599

    cs.CL cs.LG

    Adversarial NLI: A New Benchmark for Natural Language Understanding

    Authors: Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela

    Abstract: We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art mode…

    Submitted 6 May, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: ACL 2020

  20. arXiv:1908.06083

    cs.CL

    Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

    Authors: Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston

    Abstract: The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016), and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of hum…

    Submitted 17 August, 2019; originally announced August 2019.

  21. arXiv:1908.04319

    cs.LG cs.CL stat.ML

    Neural Text Generation with Unlikelihood Training

    Authors: Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

    Abstract: Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some post-hoc fixes have been proposed, in particular top-$k$ and nucleus sampling, they do not address the fact that the token-level probabilities predicted by the mode…

    Submitted 26 September, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Sean Welleck and Ilia Kulikov contributed equally

  22. arXiv:1903.03094

    cs.CL cs.AI

    Learning to Speak and Act in a Fantasy Text Adventure Game

    Authors: Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston

    Abstract: We introduce a large scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to usin…

    Submitted 7 March, 2019; originally announced March 2019.

  23. arXiv:1902.00098

    cs.AI cs.CL cs.HC

    The Second Conversational Intelligence Challenge (ConvAI2)

    Authors: Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston

    Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics lik…

    Submitted 31 January, 2019; originally announced February 2019.

  24. arXiv:1811.01241

    cs.CL

    Wizard of Wikipedia: Knowledge-Powered Conversational agents

    Authors: Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

    Abstract: In open-domain dialogue intelligent agents should exhibit the use of knowledge, however there are few convincing demonstrations of this to date. The most popular sequence to sequence models typically "generate and hope" generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of kno…

    Submitted 21 February, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

  25. arXiv:1808.04776

    cs.CL

    Retrieve and Refine: Improved Sequence Generation Models For Dialogue

    Authors: Jason Weston, Emily Dinan, Alexander H. Miller

    Abstract: Sequence generation models for dialogue are known to have several problems: they tend to produce short, generic sentences that are uninformative and unengaging. Retrieval models on the other hand can surface interesting responses, but are restricted to the given retrieval set leading to erroneous replies that cannot be tuned to the specific context. In this work we develop a model that combines th…

    Submitted 6 September, 2018; v1 submitted 14 August, 2018; originally announced August 2018.

  26. arXiv:1801.07243

    cs.AI cs.CL

    Personalizing Dialogue Agents: I have a dog, do you have pets too?

    Authors: Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

    Abstract: Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking…

    Submitted 25 September, 2018; v1 submitted 22 January, 2018; originally announced January 2018.