Showing 1–2 of 2 results for author: Fluri, L

Searching in archive cs.
  1. arXiv:2406.15753  [pdf, other]

    cs.LG cs.AI stat.ML

    The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

    Authors: Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Skalse

    Abstract: In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main so…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 58 pages, 1 figure

  2. arXiv:2306.09983  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Evaluating Superhuman Models with Consistency Checks

    Authors: Lukas Fluri, Daniel Paleka, Florian Tramèr

    Abstract: If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impos…

    Submitted 19 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: 42 pages, 18 figures. Code and data are available at https://github.com/ethz-spylab/superhuman-ai-consistency