Showing 1–4 of 4 results for author: Brunk, C

  1. arXiv:2107.04512  [pdf, other]

    cs.CL cs.LG

    Using Machine Translation to Localize Task Oriented NLG Output

    Authors: Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano

    Abstract: One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages. This paper explores doing this by applying machine translation to the English output. Using machine translation is very scalable, as it can work with any English output and can handle dynamic text, but otherwise the problem is a poor fit. The…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 12 pages, 10 figures

  2. arXiv:2008.13533  [pdf, other]

    cs.CL cs.LG stat.ML

    Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

    Authors: Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

    Abstract: Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low quality con…

    Submitted 17 August, 2020; originally announced August 2020.

  3. arXiv:2004.06201  [pdf, ps, other]

    cs.CL cs.IR cs.LG

    Reverse Engineering Configurations of Neural Text Generation Models

    Authors: Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

    Abstract: This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models. The study of artifacts that emerge in machine generated text as a result of modeling choices is a nascent research area. Previously, the extent and degree to which these artifacts surface in generated text has not been well studied. In the spirit of better understanding generative te…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  4. arXiv:1512.00103  [pdf, other]

    cs.CL

    Multilingual Language Processing From Bytes

    Authors: Dan Gillick, Cliff Brunk, Oriol Vinyals, Amarnag Subramanya

    Abstract: We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads text as bytes and outputs span annotations of the form [start, length, label] where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on Unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to t…

    Submitted 2 April, 2016; v1 submitted 30 November, 2015; originally announced December 2015.