Zum Hauptinhalt springen

Showing 1–6 of 6 results for author: Chan, J S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2304.03279  [pdf, other

    cs.LG cs.AI cs.CL cs.CY

    Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

    Authors: Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

    Abstract: Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI,… ▽ More

    Submitted 12 June, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: ICML 2023 Oral (camera-ready); 31 pages, 5 figures

  2. arXiv:2303.16755  [pdf, other

    cs.CL cs.AI cs.LG

    Training Language Models with Language Feedback at Scale

    Authors: Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

    Abstract: Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we i… ▽ More

    Submitted 22 February, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Published in TMLR: https://openreview.net/forum?id=xo3hI5MwvU

  3. arXiv:2303.16749  [pdf, other

    cs.SE cs.AI cs.CL cs.LG

    Improving Code Generation by Training with Natural Language Feedback

    Authors: Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

    Abstract: The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedbac… ▽ More

    Submitted 22 February, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Published in (and superceded by) TMLR: https://openreview.net/forum?id=xo3hI5MwvU

  4. arXiv:2210.10039  [pdf, other

    cs.CV cs.CY cs.LG

    How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios

    Authors: Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks

    Abstract: In recent years, deep neural networks have demonstrated increasingly strong abilities to recognize objects and activities in videos. However, as video understanding becomes widely used in real-world applications, a key consideration is developing human-centric systems that understand not only the content of the video but also how it would affect the wellbeing and emotional state of viewers. To fac… ▽ More

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022; datasets available at https://github.com/hendrycks/emodiversity/

  5. arXiv:2208.01009  [pdf, other

    cs.CL cs.AI cs.LG

    Few-shot Adaptation Works with UnpredicTable Data

    Authors: Jun Shern Chan, Michael Pieler, Jonathan Jao, Jérémy Scheurer, Ethan Perez

    Abstract: Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables - orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset leads to improved FSL performance on Natural Language Proce… ▽ More

    Submitted 7 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: Code at https://github.com/JunShern/few-shot-adaptation

  6. arXiv:2204.14146  [pdf, other

    cs.CL cs.AI cs.LG

    Training Language Models with Language Feedback

    Authors: Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

    Abstract: Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human e… ▽ More

    Submitted 17 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: The First Workshop on Learning with Natural Language Supervision at ACL 2022