-
The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video
Authors:
Michelle R. Greene,
Benjamin J. Balas,
Mark D. Lescroart,
Paul R. MacNeilage,
Jennifer A. Hart,
Kamran Binaee,
Peter A. Hausamann,
Ronald Mezile,
Bharath Shankar,
Christian B. Sinnott,
Kaylie Capurro,
Savannah Halow,
Hunter Howe,
Mariam Josyula,
Annie Li,
Abraham Mieses,
Amina Mohamed,
Ilya Nudnou,
Ezra Parkhill,
Peter Riley,
Brett Schmidt,
Matthew W. Shinkle,
Wentao Si,
Brian Szekely,
Joaquin M. Torres
, et al. (1 additional authors not shown)
Abstract:
We introduce the Visual Experience Dataset (VEDB), a compilation of over 240 hours of egocentric video combined with gaze- and head-tracking data that offers an unprecedented view of the visual world as experienced by human observers. The dataset consists of 717 sessions, recorded by 58 observers ranging from 6-49 years old. This paper outlines the data collection, processing, and labeling protoco…
▽ More
We introduce the Visual Experience Dataset (VEDB), a compilation of over 240 hours of egocentric video combined with gaze- and head-tracking data that offers an unprecedented view of the visual world as experienced by human observers. The dataset consists of 717 sessions, recorded by 58 observers ranging from 6-49 years old. This paper outlines the data collection, processing, and labeling protocols undertaken to ensure a representative sample and discusses the potential sources of error or bias within the dataset. The VEDB's potential applications are vast, including improving gaze tracking methodologies, assessing spatiotemporal image statistics, and refining deep neural networks for scene and activity recognition. The VEDB is accessible through established open science platforms and is intended to be a living dataset with plans for expansion and community contributions. It is released with an emphasis on ethical considerations, such as participant privacy and the mitigation of potential biases. By providing a dataset grounded in real-world experiences and accompanied by extensive metadata and supporting code, the authors invite the research community to utilize and contribute to the VEDB, facilitating a richer understanding of visual perception and behavior in naturalistic settings.
△ Less
Submitted 13 August, 2024; v1 submitted 15 February, 2024;
originally announced April 2024.
-
Defining and Characterizing Reward Hacking
Authors:
Joar Skalse,
Nikolaus H. R. Howe,
Dmitrii Krasheninnikov,
David Krueger
Abstract:
We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhacka…
▽ More
We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or overlooking fine-grained distinctions between roughly equivalent outcomes, but we show this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.
△ Less
Submitted 26 September, 2022;
originally announced September 2022.
-
Myriad: a real-world testbed to bridge trajectory optimization and deep learning
Authors:
Nikolaus H. R. Howe,
Simon Dufort-Labbé,
Nitarshan Rajkumar,
Pierre-Luc Bacon
Abstract:
We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments. The primary contributions of Myriad are threefold. First, Myriad provides machine learning practitioners access to trajectory optimization techniques for application within a typical automatic differentiation workflow. Second, Myriad presents many real-world optimal control problems, rangin…
▽ More
We present Myriad, a testbed written in JAX for learning and planning in real-world continuous environments. The primary contributions of Myriad are threefold. First, Myriad provides machine learning practitioners access to trajectory optimization techniques for application within a typical automatic differentiation workflow. Second, Myriad presents many real-world optimal control problems, ranging from biology to medicine to engineering, for use by the machine learning community. Formulated in continuous space and time, these environments retain some of the complexity of real-world systems often abstracted away by standard benchmarks. As such, Myriad strives to serve as a stepping stone towards application of modern machine learning techniques for impactful real-world tasks. Finally, we use the Myriad repository to showcase a novel approach for learning and control tasks. Trained in a fully end-to-end fashion, our model leverages an implicit planning module over neural ordinary differential equations, enabling simultaneous learning and planning with complex environment dynamics.
△ Less
Submitted 26 January, 2023; v1 submitted 21 February, 2022;
originally announced February 2022.