
Showing 1–10 of 10 results for author: Metcalf, K

Searching in archive cs.
  1. arXiv:2407.12824

    cs.CL cs.AI

    Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models

    Authors: Xavier Suau, Pieter Delobelle, Katherine Metcalf, Armand Joulin, Nicholas Apostoloff, Luca Zappella, Pau Rodríguez

    Abstract: An important issue with Large Language Models (LLMs) is their undesired ability to generate toxic language. In this work, we show that the neurons responsible for toxicity can be determined by their power to discriminate toxic sentences, and that toxic language can be mitigated by reducing their activation levels proportionally to this power. We propose AUROC adaptation (AurA), an intervention tha…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ICML 2024, 8 pages + appendix

  2. arXiv:2404.08828

    cs.LG cs.AI cs.HC

    Hindsight PRIORs for Reward Learning from Human Preferences

    Authors: Mudit Verma, Katherine Metcalf

    Abstract: Preference based Reinforcement Learning (PbRL) removes the need to hand specify a reward function by learning a reward from preference feedback over policy behaviors. Current approaches to PbRL do not address the credit assignment problem inherent in determining which parts of a behavior most contributed to a preference, which result in data intensive approaches and subpar reward functions. We add…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: International Conference on Learning Representations, 2024

  3. arXiv:2402.17975

    cs.AI cs.LG

    Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

    Authors: Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald

    Abstract: Preference-based reinforcement learning (PbRL) aligns a robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) learning a dynamics-aware state-action representation (z^{sa}) via a self-supervi…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: CoRL 2023. arXiv admin note: substantial text overlap with arXiv:2211.06527

  4. arXiv:2310.17722

    cs.LG cs.AI cs.CL

    Large Language Models as Generalizable Policies for Embodied Tasks

    Authors: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev

    Abstract: We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and…

    Submitted 16 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  5. arXiv:2211.06527

    cs.LG cs.AI cs.HC

    Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

    Authors: Katherine Metcalf, Miguel Sarabia, Barry-John Theobald

    Abstract: Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling them from human preference feedback, but they remain impractical due to the burdensome number of labels required from the human, even for relatively simple tasks. In this work, we demonstrate that encoding environment dynamics in the reward function (REED) dramatically red…

    Submitted 11 November, 2022; originally announced November 2022.

  6. arXiv:2210.09151

    cs.LG cs.AI cs.CL

    Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

    Authors: Mudit Verma, Katherine Metcalf

    Abstract: Specifying rewards for reinforcement learned (RL) agents is challenging. Preference-based RL (PbRL) mitigates these challenges by inferring a reward from feedback over sets of trajectories. However, the effectiveness of PbRL is limited by the amount of feedback needed to reliably recover the structure of the target reward. We present the PRIor Over Rewards (PRIOR) framework, which incorporates pri…

    Submitted 19 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

  7. arXiv:2203.10117

    cs.SD cs.CV cs.GR eess.AS

    On the role of Lip Articulation in Visual Speech Perception

    Authors: Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald

    Abstract: Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understandin…

    Submitted 10 November, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Submitted to ICASSP 2023

  8. arXiv:2202.09472

    cs.LG

    FedEmbed: Personalized Private Federated Learning

    Authors: Andrew Silva, Katherine Metcalf, Nicholas Apostoloff, Barry-John Theobald

    Abstract: Federated learning enables the deployment of machine learning to problems for which centralized data collection is impractical. Adding differential privacy guarantees bounds on privacy while data are contributed to a global model. Adding personalization to federated learning introduces new challenges as we must account for preferences of individual users, where a data sample could have conflicting…

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: 15 pages

    MSC Class: 68T99 ACM Class: I.2.0

  9. arXiv:1904.01664

    cs.HC cs.AI cs.CL

    Mirroring to Build Trust in Digital Assistants

    Authors: Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff

    Abstract: We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end we conducted a user study where subjects interacted with a digital assistant that responded in a way t…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: Preprint

  10. arXiv:1812.04145

    cs.LG cs.MA stat.ML

    Learning Sharing Behaviors with Arbitrary Numbers of Agents

    Authors: Katherine Metcalf, Barry-John Theobald, Nicholas Apostoloff

    Abstract: We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the individual behavior for each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group to render the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: 14 pages, 9 figures, 3 tables, International Conference on Autonomous Agents and Multiagent Systems (AAMAS), machine learning, Reinforcement learning