-
Melody predominates over harmony in the evolution of musical scales across 96 countries
Authors:
John M McBride,
Elizabeth Phillips,
Patrick E Savage,
Steven Brown,
Tsvi Tlusty
Abstract:
The standard theory of musical scales since antiquity has been based on harmony, rather than melody. Some recent analyses support either view, and we lack a comparative test on cross-cultural data. We address this longstanding problem through a rigorous, computational comparison of the main theories against 1,314 scales from 96 countries. There is near-universal support for melodic theories, which…
▽ More
The standard theory of musical scales since antiquity has been based on harmony, rather than melody. Some recent analyses support either view, and we lack a comparative test on cross-cultural data. We address this longstanding problem through a rigorous, computational comparison of the main theories against 1,314 scales from 96 countries. There is near-universal support for melodic theories, which predict step-sizes of 1-3 semitones. Harmony accounts for the prevalence of some simple-integer-ratio intervals, particularly for music-theoretic scales from Eurasian societies, which may explain their dominance amongst Western scholars. However, harmony poorly predicts scales measured from ethnographic recordings, particularly outside of Eurasia. Overall, we show that the historical emphasis on harmony is misguided and that melody is the primary determinant of the world's musical scales.
△ Less
Submitted 22 August, 2024;
originally announced August 2024.
-
Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations
Authors:
Connor Mattson,
Anurag Aribandi,
Daniel S. Brown
Abstract:
We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot…
▽ More
We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot a policy via human video demonstrations or teaching a robot to imitate a policy from another robot with a different embodiment. However, prior work has only focused on cases where near-optimal demonstrations are available, which is often difficult to ensure. By contrast, we study the setting of cross-embodiment reward learning from mixed-quality demonstrations. We demonstrate that prior work struggles to learn generalizable reward representations when learning from mixed-quality data. We then analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning. Our results give insight into how different representation learning techniques lead to qualitatively different reward shaping behaviors and the importance of human feedback when learning from mixed-quality, mixed-embodiment data.
△ Less
Submitted 10 August, 2024;
originally announced August 2024.
-
NamedCurves: Learned Image Enhancement via Color Naming
Authors:
David Serrano-Lozano,
Luis Herranz,
Michael S. Brown,
Javier Vazquez-Corral
Abstract:
A popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blu…
▽ More
A popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blue" of the sky or the "green" of trees. Inspired by this approach to color manipulation, we propose NamedCurves, a learning-based image enhancement technique that separates the image into a small set of named colors. Our method learns to globally adjust the image for each specific named color via tone curves and then combines the images using an attention-based fusion mechanism to mimic spatial editing. We demonstrate the effectiveness of our method against several competing methods on the well-known Adobe 5K dataset and the PPR10K dataset, showing notable improvements.
△ Less
Submitted 13 July, 2024;
originally announced July 2024.
-
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Authors:
Teun van der Weij,
Felix Hofstätter,
Ollie Jaffe,
Samuel F. Brown,
Francis Rhys Ward
Abstract:
Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging $\unicode{x2013}$ which we define as "strategic underper…
▽ More
Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging $\unicode{x2013}$ which we define as "strategic underperformance on an evaluation". In this paper we assess sandbagging capabilities in contemporary language models (LMs). We prompt frontier LMs, like GPT-4 and Claude 3 Opus, to selectively underperform on dangerous capability evaluations, while maintaining performance on general (harmless) capability evaluations. Moreover, we find that models can be fine-tuned, on a synthetic dataset, to hide specific capabilities unless given a password. This behaviour generalizes to high-quality, held-out benchmarks such as WMDP. In addition, we show that both frontier and smaller models can be prompted, or password-locked, to target specific scores on a capability evaluation. Even more, we found that a capable password-locked model (Llama 3 70b) is reasonably able to emulate a less capable model (Llama 2 7b). Overall, our results suggest that capability evaluations are vulnerable to sandbagging. This vulnerability decreases the trustworthiness of evaluations, and thereby undermines important safety decisions regarding the development and deployment of advanced AI systems.
△ Less
Submitted 14 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
SCI 3.0: A Web-based Schema Curation Interface for Graphical Event Representations
Authors:
Reece Suchocki,
Mary Martin,
Martha Palmer,
Susan Brown
Abstract:
To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to ou…
▽ More
To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to our approach is the Schema Curation Interface 3.0 (SCI 3.0), a web application that facilitates real-time editing of event schema properties within a generated graph e.g., adding, removing, or editing sub-events, entities, and relations directly through an interface.
△ Less
Submitted 16 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz
, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…
▽ More
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
△ Less
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
Authors:
Seliem El-Sayed,
Canfer Akbulut,
Amanda McCroskery,
Geoff Keeling,
Zachary Kenton,
Zaria Jalan,
Nahema Marchal,
Arianna Manzini,
Toby Shevlane,
Shannon Vallor,
Daniel Susser,
Matija Franklin,
Sophie Bridgers,
Harry Law,
Matthew Rahtz,
Murray Shanahan,
Michael Henry Tessler,
Arthur Douillard,
Tom Everitt,
Sasha Brown
Abstract:
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, high…
▽ More
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate against process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery
Authors:
Zohre Karimi,
Shing-Hei Ho,
Bao Thach,
Alan Kuntz,
Daniel S. Brown
Abstract:
Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This pap…
▽ More
Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: https://sites.google.com/view/lfdinelectrocautery
△ Less
Submitted 15 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks
Authors:
Jordan Thompson,
Brian Y. Cho,
Daniel S. Brown,
Alan Kuntz
Abstract:
Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a…
▽ More
Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a mixture density network to output a Gaussian mixture model representation of the robot geometry given the current tendon displacements. This model computes a probability distribution that is more representative of the true distribution of geometries at a given configuration than a model that outputs a single geometry, while also reducing the computation time. We demonstrate one use of this model through a trajectory optimization method that explicitly reasons about the workspace uncertainty to minimize the probability of collision.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Exploring Correlation Patterns in the Ethereum Validator Network
Authors:
Simon Brown,
Leonardo Bautista-Gomez
Abstract:
There have been several studies into measuring the level of decentralization in Ethereum through applying various indices to indicate the relative dominance of entities in different domains in the ecosystem. However, these indices do not capture any correlation between those different entities, that could potentially make them the subject of external coercion, or covert collusion. We propose an in…
▽ More
There have been several studies into measuring the level of decentralization in Ethereum through applying various indices to indicate the relative dominance of entities in different domains in the ecosystem. However, these indices do not capture any correlation between those different entities, that could potentially make them the subject of external coercion, or covert collusion. We propose an index that measures the relative dominance of entities based on the application of correlation factors. We posit that this approach produces a more nuanced and accurate index of decentralization.
△ Less
Submitted 22 March, 2024;
originally announced April 2024.
-
Evaluating Frontier Models for Dangerous Capabilities
Authors:
Mary Phuong,
Matthew Aitchison,
Elliot Catt,
Sarah Cogan,
Alexandre Kaskasoli,
Victoria Krakovna,
David Lindner,
Matthew Rahtz,
Yannis Assael,
Sarah Hodkinson,
Heidi Howard,
Tom Lieberum,
Ramana Kumar,
Maria Abi Raad,
Albert Webson,
Lewis Ho,
Sharon Lin,
Sebastian Farquhar,
Marcus Hutter,
Gregoire Deletang,
Anian Ruoss,
Seliem El-Sayed,
Sasha Brown,
Anca Dragan,
Rohin Shah
, et al. (2 additional authors not shown)
Abstract:
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous…
▽ More
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
△ Less
Submitted 5 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models
Authors:
Dimitris Papadimitriou,
Daniel S. Brown
Abstract:
It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian…
▽ More
It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
A Framework for Assurance Audits of Algorithmic Systems
Authors:
Khoa Lam,
Benjamin Lange,
Borhane Blili-Hamelin,
Jovana Davidovic,
Shea Brown,
Ali Hasan
Abstract:
An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operatio…
▽ More
An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operationalizable compliance and assurance external audit framework. We model elements of this approach after financial auditing practices, and argue that AI audits should similarly provide assurance to their stakeholders about AI organizations' ability to govern their algorithms in ways that mitigate harms and uphold human values. We discuss the necessary conditions for the criterion audit and provide a procedural blueprint for performing an audit engagement in practice. We illustrate how this framework can be adapted to current regulations by deriving the criteria on which bias audits can be performed for in-scope hiring algorithms, as required by the recently effective New York City Local Law 144 of 2021. We conclude by offering a critical discussion on the benefits, inherent limitations, and implementation challenges of applying practices of the more mature financial auditing industry to AI auditing where robust guardrails against quality assurance issues are only starting to emerge. Our discussion -- informed by experiences in performing these audits in practice -- highlights the critical role that an audit ecosystem plays in ensuring the effectiveness of audits.
△ Less
Submitted 28 May, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
RESIN-EDITOR: A Schema-guided Hierarchical Event Graph Visualizer and Editor
Authors:
Khanh Duy Nguyen,
Zixuan Zhang,
Reece Suchocki,
Sha Li,
Martha Palmer,
Susan Brown,
Jiawei Han,
Heng Ji
Abstract:
In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, com…
▽ More
In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, comprehensive source tracing, and interactive user editing, which is more powerful and versatile than existing Information Extraction (IE) visualization tools. In our evaluation of RESIN-EDITOR, we demonstrate ways in which our tool is effective in understanding complex events and enhancing system performance. The source code, a video demonstration, and a live website for RESIN-EDITOR have been made publicly available.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Creating a Discipline-specific Commons for Infectious Disease Epidemiology
Authors:
Michael M. Wagner,
William Hogan,
John Levander,
Adam Darr,
Matt Diller,
Max Sibilla,
Alexander T. Loiacono. Terence Sperringer, Jr.,
Shawn T. Brown
Abstract:
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potenti…
▽ More
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots
Authors:
Connor Mattson,
Jeremy C. Clark,
Daniel S. Brown
Abstract:
We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this…
▽ More
We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Quantifying Assistive Robustness Via the Natural-Adversarial Frontier
Authors:
Jerry Zhi-Yang He,
Zackory Erickson,
Daniel S. Brown,
Anca D. Dragan
Abstract:
Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to…
▽ More
Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to human motions that are unlikely to occur during natural interactions with people. A robot policy might fail under small adversarial perturbations but work under large natural perturbations. We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance. We introduce RIGID, a method for constructing this frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like (as measured by a discriminator). On an Assistive Gym task, we use RIGID to analyze the performance of standard collaborative Reinforcement Learning, as well as the performance of existing methods meant to increase robustness. We also compare the frontier RIGID identifies with the failures identified in expert adversarial interaction, and with naturally-occurring failures during user interaction. Overall, we find evidence that RIGID can provide a meaningful measure of robustness predictive of deployment performance, and uncover failure cases in human-robot interaction that are difficult to find manually. https://ood-human.github.io.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Indirect Swarm Control: Characterization and Analysis of Emergent Swarm Behaviors
Authors:
Ricardo Vega,
Connor Mattson,
Daniel S. Brown,
Cameron Nowzari
Abstract:
Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively changes what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for ch…
▽ More
Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively changes what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for characterizing the conditions that lead to different macrostates and how to predict/analyze their macroscopic properties, allowing us to indirectly engineer the same behaviors from the bottom up by tuning their environmental conditions rather than local interaction rules. We then apply this framework to a simple system of binary sensing and acting agents as an example to see if a re-framing of this swarms problem can help us push the state of the art forward. By first creating some working definitions of macrostates in a particular swarm system, we show how agent-based modeling may be combined with control theory to enable a generalized understanding of controllable emergent processes without needing to simulate everything. Whereas phase diagrams can generally only be created through Monte Carlo simulations or sweeping through ranges of parameters in a simulator, we develop closed-form functions that can immediately produce them revealing an infinite set of swarm parameter combinations that can lead to a specifically chosen self-organized behavior. While the exact methods are still under development, we believe simply laying out a potential path towards solutions that have evaded our traditional methods using a novel method is worth considering. Our results are characterized through both simulations and real experiments on ground robots.
△ Less
Submitted 28 March, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Examining Autoexposure for Challenging Scenes
Authors:
SaiKiran Tedla,
Beixuan Yang,
Michael S. Brown
Abstract:
Autoexposure (AE) is a critical step applied by camera systems to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with constant illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especia…
▽ More
Autoexposure (AE) is a critical step applied by camera systems to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with constant illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especially those with time-varying lighting, is the lack of suitable image datasets. To address this issue, we have captured a new 4D exposure dataset that provides a large solution space (i.e., shutter speed range from (1/500 to 15 seconds) over a temporal sequence with moving objects, bright lights, and varying lighting. In addition, we have designed a software platform to allow AE algorithms to be used in a plug-and-play manner with the dataset. Our dataset and associate platform enable repeatable evaluation of different AE algorithms and provide a much-needed starting point to develop better AE methods. We examine several existing AE strategies using our dataset and show that most users prefer a simple saliency method for challenging lighting conditions.
△ Less
Submitted 8 September, 2023;
originally announced September 2023.
-
Contextual Reliability: When Different Features Matter in Different Contexts
Authors:
Gaurav Ghosal,
Amrith Setlur,
Daniel S. Brown,
Anca D. Dragan,
Aditi Raghunathan
Abstract:
Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we canno…
▽ More
Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.
-
Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?
Authors:
Akansha Kalra,
Daniel S. Brown
Abstract:
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that while expressive are difficult to interpret and often require running the whole costly process of RL before we can even decipher if the…
▽ More
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that while expressive are difficult to interpret and often require running the whole costly process of RL before we can even decipher if these frameworks are actually aligned with human preferences. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We also provide experimental evidence that not only shows that reward DDTs can often achieve competitive RL performance when compared with larger capacity deep neural network reward functions but also demonstrates the diagnostic utility of our framework in checking alignment of learned reward functions. We also observe that the choice between soft and hard (argmax) output of reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance, while also wanting simpler, more interpretable rewards. Videos and code, are available at: https://sites.google.com/view/ddt-rlhf
△ Less
Submitted 24 June, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement
Authors:
Marcos V. Conde,
Javier Vazquez-Corral,
Michael S. Brown,
Radu Timofte
Abstract:
3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet n…
▽ More
3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet not so memory-efficient, as storing multiple 3D LUTs is required. For this reason and other implementation limitations, their use on mobile devices is less popular. In this work, we propose a Neural Implicit LUT (NILUT), an implicitly defined continuous 3D color transformation parameterized by a neural network. We show that NILUTs are capable of accurately emulating real 3D LUTs. Moreover, a NILUT can be extended to incorporate multiple styles into a single network with the ability to blend styles implicitly. Our novel approach is memory-efficient, controllable and can complement previous methods, including learned ISPs. Code, models and dataset available at: https://github.com/mv-lab/nilut
△ Less
Submitted 24 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
brainlife.io: A decentralized and open source cloud platform to support neuroscience research
Authors:
Soichi Hayashi,
Bradley A. Caron,
Anibal Sólon Heinsfeld,
Sophia Vinci-Booher,
Brent McPherson,
Daniel N. Bullock,
Giulia Bertò,
Guiomar Niso,
Sandra Hanekamp,
Daniel Levitas,
Kimberly Ray,
Anne MacKenzie,
Lindsey Kitchell,
Josiah K. Leong,
Filipi Nascimento-Silva,
Serge Koudoro,
Hanna Willis,
Jasleen K. Jolly,
Derek Pisner,
Taylor R. Zuidema,
Jan W. Kurzawski,
Kyriaki Mikellidou,
Aurore Bussalb,
Christopher Rorden,
Conner Victory
, et al. (39 additional authors not shown)
Abstract:
Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR (Findable, Accessible, Interoperabile, and Reusable) data analysis to portions of the worldwide research community. brainlife.io was developed to red…
▽ More
Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR (Findable, Accessible, Interoperabile, and Reusable) data analysis to portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research.
△ Less
Submitted 11 August, 2023; v1 submitted 3 June, 2023;
originally announced June 2023.
-
Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms
Authors:
Connor Mattson,
Daniel S. Brown
Abstract:
Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand…
▽ More
Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand what behaviors are interesting or even possible. Our proposed approach adapts to user preferences by learning a similarity space over swarm collective behaviors using self-supervised learning and human-in-the-loop queries. We combine our learned similarity metric with novelty search and clustering to explore and categorize the space of possible swarm behaviors. We also propose several general-purpose heuristics that improve the efficiency of our novelty search by prioritizing robot controllers that are likely to lead to interesting emergent behaviors. We test our approach in simulation on two robot capability models and show that our methods consistently discover a richer set of emergent behaviors than prior work. Code, videos, and datasets are available at https://sites.google.com/view/evolving-novel-swarms.
△ Less
Submitted 16 July, 2023; v1 submitted 25 April, 2023;
originally announced May 2023.
-
Learning Semantic Role Labeling from Compatible Label Sequences
Authors:
Tao Li,
Ghazaleh Kazeminejad,
Susan W. Brown,
Martha Palmer,
Vivek Srikumar
Abstract:
Semantic role labeling (SRL) has multiple disjoint label sets, e.g., VerbNet and PropBank. Creating these datasets is challenging, therefore a natural question is how to use each one to help the other. Prior work has shown that cross-task interaction helps, but only explored multitask learning so far. A common issue with multi-task setup is that argument sequences are still separately decoded, run…
▽ More
Semantic role labeling (SRL) has multiple disjoint label sets, e.g., VerbNet and PropBank. Creating these datasets is challenging, therefore a natural question is how to use each one to help the other. Prior work has shown that cross-task interaction helps, but only explored multitask learning so far. A common issue with multi-task setup is that argument sequences are still separately decoded, running the risk of generating structurally inconsistent label sequences (as per lexicons like Semlink). In this paper, we eliminate such issue with a framework that jointly models VerbNet and PropBank labels as one sequence. In this setup, we show that enforcing Semlink constraints during decoding constantly improves the overall F1. With special input constructions, our joint model infers VerbNet arguments from given PropBank arguments with over 99 F1. For learning, we propose a constrained marginal model that learns with knowledge defined in Semlink to further benefit from the large amounts of PropBank-only data. On the joint benchmark based on CoNLL05, our models achieve state-of-the-art F1's, outperforming the prior best in-domain model by 3.5 (VerbNet) and 0.8 (PropBank). For out-of-domain generalization, our models surpass the prior best by 3.4 (VerbNet) and 0.2 (PropBank).
△ Less
Submitted 19 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI
Authors:
Louise Axon,
Dimitrios Panagiotakopoulos,
Samuel Ayo,
Carolina Sanchez-Hernandez,
Yan Zong,
Simon Brown,
Lei Zhang,
Michael Goldsmith,
Sadie Creese,
Weisi Guo
Abstract:
Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure…
▽ More
Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure data from diverse stakeholders. There is an urgent need to develop a secure network that has trustworthy data chains and works with the requirements generated by UTM. Here, we review existing research in 3 key interconnected areas: (1) blockchain development for secure data transfer between competing aviation stakeholders, (2) self-learning networking architectures that distribute consensus to achieve secure air traffic control, (3) explainable AI to build trust with human stakeholders and backpropagate requirements for blockchain and network optimisation. When connected together, this new digital ecosystem blueprint is tailored for safety critical UTM sectors. We motivate the readers with a case study, where a federated learning UTM uses real air traffic and weather data is secured and explained to human operators. This emerging area still requires significant research and development by the community to ensure it can enable future autonomous air mobility.
△ Less
Submitted 27 April, 2023;
originally announced April 2023.
-
Network Analysis as a Tool for Shaping Conservation and Development Policy: A Case Study of Timber Market Optimization in India
Authors:
Xiou Ge,
Sarah E. Brown,
Pushpendra Rana,
Lav R. Varshney,
Daniel C. Miller
Abstract:
The incorporation of trees on farms can help to improve livelihoods and build resilience among small-holder farmers in developing countries. On-farm trees can help gen- erate additional income from commercial tree harvest as well as contribute significant environmental benefits and ecosystem services to increase resiliency. Long-term benefits from tree-based livelihoods, however, depend on sustain…
▽ More
The incorporation of trees on farms can help to improve livelihoods and build resilience among small-holder farmers in developing countries. On-farm trees can help gen- erate additional income from commercial tree harvest as well as contribute significant environmental benefits and ecosystem services to increase resiliency. Long-term benefits from tree-based livelihoods, however, depend on sustainable timber harvesting. In this paper, we discuss the potential for network analysis as a tool to inform conservation and development decision-making. Specifically, we formulate the commercial tree market between farmers and traders as a transportation problem and optimize the transactions. We create a model of the commercial tree market in the Bilaspur district of Himachal Pradesh, India based on a detailed dataset of market interactions between farmers and timber traders, using the existing road network of this region. Using this model, we perform a maximum-flow-minimum-cost optimization for tree commodity flow. We compare the results of our optimized model with actual flow within the network, and we find a high potential to increase efficiency of market transactions within this region, noting a significant reduction to the minimum- cost flow value for our optimized model compared to the flow cost for actual transactions. We propose that using this network flow optimization model to strategically distribute permits can reduce costs associated with market transactions. Our results suggest that this direct policy action would be beneficial to the region. Finally, we suggest that cost savings could be used to establish tree planting programs to support a long-term sustainable tree market. Shaping policies to address these market inefficiencies in developing regions could help support and elevate tree-based livelihoods for farmers, traders, and industries.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
GamutMLP: A Lightweight MLP for Color Loss Recovery
Authors:
Hoang M. Le,
Brian Price,
Scott Cohen,
Michael S. Brown
Abstract:
Cameras and image-editing software often process images in the wide-gamut ProPhoto color space, encompassing 90% of all visible colors. However, when images are encoded for sharing, this color-rich representation is transformed and clipped to fit within the small-gamut standard RGB (sRGB) color space, representing only 30% of visible colors. Recovering the lost color information is challenging due…
▽ More
Cameras and image-editing software often process images in the wide-gamut ProPhoto color space, encompassing 90% of all visible colors. However, when images are encoded for sharing, this color-rich representation is transformed and clipped to fit within the small-gamut standard RGB (sRGB) color space, representing only 30% of visible colors. Recovering the lost color information is challenging due to the clipping procedure. Inspired by neural implicit representations for 2D images, we propose a method that optimizes a lightweight multi-layer-perceptron (MLP) model during the gamut reduction step to predict the clipped values. GamutMLP takes approximately 2 seconds to optimize and requires only 23 KB of storage. The small memory footprint allows our GamutMLP model to be saved as metadata in the sRGB image -- the model can be extracted when needed to restore wide-gamut color values. We demonstrate the effectiveness of our approach for color recovery and compare it with alternative strategies, including pre-trained DNN-based gamut expansion networks and other implicit neural representation methods. As part of this effort, we introduce a new color gamut dataset of 2200 wide-gamut/small-gamut images for training and testing. Our code and dataset can be found on the project website: https://gamut-mlp.github.io.
△ Less
Submitted 23 April, 2023;
originally announced April 2023.
-
Human-in-the-Loop Schema Induction
Authors:
Tianyi Zhang,
Isaac Tham,
Zhaoyi Hou,
Jiaxuan Ren,
Liyang Zhou,
Hainiu Xu,
Li Zhang,
Lara J. Martin,
Rotem Dror,
Sha Li,
Heng Ji,
Martha Palmer,
Susan Brown,
Reece Suchocki,
Chris Callison-Burch
Abstract:
Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic el…
▽ More
Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual edit of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces efforts of human curation thanks to our interactive interface.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models
Authors:
Yi Liu,
Gaurav Datta,
Ellen Novoseller,
Daniel S. Brown
Abstract:
Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we stu…
▽ More
Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction. Our paper provides empirical evidence that learned dynamics models enable robots to learn customized policies based on user preferences in ways that are safer and more sample efficient than prior preference learning approaches. Supplementary materials and code are available at https://sites.google.com/berkeley.edu/mop-rl.
△ Less
Submitted 9 February, 2024; v1 submitted 11 January, 2023;
originally announced January 2023.
-
Benchmarks and Algorithms for Offline Preference-Based Reward Learning
Authors:
Daniel Shin,
Anca D. Dragan,
Daniel S. Brown
Abstract:
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization…
▽ More
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization via offline RL, our observation is that it can be a surprisingly rich source of information for preference learning as well. We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning, learns a distribution over reward functions, and optimizes a corresponding policy via offline RL. Crucially, our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps. To test our approach, we first evaluate existing offline RL benchmarks for their suitability for offline reward learning. Surprisingly, for many offline RL domains, we find that simply using a trivial reward function results good policy performance, making these domains ill-suited for evaluating learned rewards. To address this, we identify a subset of existing offline RL benchmarks that are well suited for offline reward learning and also propose new offline apprenticeship learning benchmarks which allow for more open-ended behaviors. When evaluated on this curated set of domains, our empirical results suggest that combining offline RL with learned human preferences can enable an agent to learn to perform novel tasks that were not explicitly shown in the offline data.
△ Less
Submitted 3 January, 2023;
originally announced January 2023.
-
SIRL: Similarity-based Implicit Representation Learning
Authors:
Andreea Bobu,
Yi Liu,
Rohin Shah,
Daniel S. Brown,
Anca D. Dragan
Abstract:
When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that co…
▽ More
When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
△ Less
Submitted 17 March, 2023; v1 submitted 2 January, 2023;
originally announced January 2023.
-
Learning Representations that Enable Generalization in Assistive Tasks
Authors:
Jerry Zhi-Yang He,
Aditi Raghunathan,
Daniel S. Brown,
Zackory Erickson,
Anca D. Dragan
Abstract:
Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse ''population'' of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Su…
▽ More
Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse ''population'' of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. https://adaptive-caregiver.github.io.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
SimpleMind adds thinking to deep neural networks
Authors:
Youngwon Choi,
M. Wasil Wahi-Anwar,
Matthew S. Brown
Abstract:
Deep neural networks (DNNs) detect patterns in data and have shown versatility and strong performance in many computer vision applications. However, DNNs alone are susceptible to obvious mistakes that violate simple, common sense concepts and are limited in their ability to use explicit knowledge to guide their search and decision making. While overall DNN performance metrics may be good, these ob…
▽ More
Deep neural networks (DNNs) detect patterns in data and have shown versatility and strong performance in many computer vision applications. However, DNNs alone are susceptible to obvious mistakes that violate simple, common sense concepts and are limited in their ability to use explicit knowledge to guide their search and decision making. While overall DNN performance metrics may be good, these obvious errors, coupled with a lack of explainability, have prevented widespread adoption for crucial tasks such as medical image analysis. The purpose of this paper is to introduce SimpleMind, an open-source software framework for Cognitive AI focused on medical image understanding. It allows creation of a knowledge base that describes expected characteristics and relationships between image objects in an intuitive human-readable form. The SimpleMind framework brings thinking to DNNs by: (1) providing methods for reasoning with the knowledge base about image content, such as spatial inferencing and conditional reasoning to check DNN outputs; (2) applying process knowledge, in the form of general-purpose software agents, that are chained together to accomplish image preprocessing, DNN prediction, and result post-processing, and (3) performing automatic co-optimization of all knowledge base parameters to adapt agents to specific problems. SimpleMind enables reasoning on multiple detected objects to ensure consistency, providing cross checking between DNN outputs. This machine reasoning improves the reliability and trustworthiness of DNNs through an interpretable model and explainable decisions. Example applications are provided that demonstrate how SimpleMind supports and improves deep neural networks by embedding them within a Cognitive AI framework.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning
Authors:
Tu Trinh,
Haoyu Chen,
Daniel S. Brown
Abstract:
We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high…
▽ More
We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high-confidence bounds on their performance and use these bounds to determine when they have a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the human's unobserved reward function, and (2) percent improvement over a baseline policy. We demonstrate how to formulate high-confidence bounds on both of these metrics. We evaluate our approach in simulation for both discrete and continuous state-space domains and illustrate the feasibility of developing a robotic system that can accurately evaluate demonstration sufficiency. We also show that the robot can utilize active learning in asking for demonstrations from specific states which results in fewer demos needed for the robot to still maintain high confidence in its policy. Finally, via a user study, we show that our approach successfully enables robots to perform at users' desired performance levels, without needing too many or perfectly optimal demonstrations.
△ Less
Submitted 2 January, 2024; v1 submitted 28 November, 2022;
originally announced November 2022.
-
MIMT: Multi-Illuminant Color Constancy via Multi-Task Local Surface and Light Color Learning
Authors:
Shuwei Li,
Jikai Wang,
Michael S. Brown,
Robby T. Tan
Abstract:
The assumption of a uniform light color distribution is no longer applicable in scenes that have multiple light colors. Most color constancy methods are designed to deal with a single light color, and thus are erroneous when applied to multiple light colors. The spatial variability in multiple light colors causes the color constancy problem to be more challenging and requires the extraction of loc…
▽ More
The assumption of a uniform light color distribution is no longer applicable in scenes that have multiple light colors. Most color constancy methods are designed to deal with a single light color, and thus are erroneous when applied to multiple light colors. The spatial variability in multiple light colors causes the color constancy problem to be more challenging and requires the extraction of local surface/light information. Motivated by this, we introduce a multi-task learning method to discount multiple light colors in a single input image. To have better cues of the local surface/light colors under multiple light color conditions, we design a novel multi-task learning framework. Our framework includes auxiliary tasks of achromatic-pixel detection and surface-color similarity prediction, providing better cues for local light and surface colors, respectively. Moreover, to ensure that our model maintains the constancy of surface colors regardless of the variations of light colors, a novel local surface color feature preservation scheme is developed. We demonstrate that our model achieves 47.1% improvement (from 4.69 mean angular error to 2.48) compared to a state-of-the-art multi-illuminant color constancy method on a multi-illuminant dataset (LSMI).
△ Less
Submitted 22 August, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Inpainting in discrete Sobolev spaces: structural information for uncertainty reduction
Authors:
Marco Seracini,
Stephen R. Brown
Abstract:
In this article, using an exemplar-based approach, we investigate the inpainting problem, introducing a new mathematical functional, whose minimization determines the quality of the reconstructions. The new functional expression takes into account of fnite differences terms, in a similar fashion to what happens in the theoretical Sobolev spaces. Moreover, we introduce a new priority index to deter…
▽ More
In this article, using an exemplar-based approach, we investigate the inpainting problem, introducing a new mathematical functional, whose minimization determines the quality of the reconstructions. The new functional expression takes into account of fnite differences terms, in a similar fashion to what happens in the theoretical Sobolev spaces. Moreover, we introduce a new priority index to determine the scanning order of the points to inpaint, prioritizing the uncertainty reduction in the choice. The achieved results highlight important theoretical-connected aspects of the inpainting by patch procedure.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
Authors:
Albert Wilcox,
Ashwin Balakrishna,
Jules Dedieu,
Wyame Benslimane,
Daniel S. Brown,
Ken Goldberg
Abstract:
Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is using demonstrations to provide initial signal about regions of the state space with high rewards. However, p…
▽ More
Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is using demonstrations to provide initial signal about regions of the state space with high rewards. However, prior RL from demonstrations algorithms introduce significant complexity and many hyperparameters, making them hard to implement and tune. We introduce Monte Carlo Augmented Actor Critic (MCAC), a parameter free modification to standard actor-critic algorithms which initializes the replay buffer with demonstrations and computes a modified $Q$-value by taking the maximum of the standard temporal distance (TD) target and a Monte Carlo estimate of the reward-to-go. This encourages exploration in the neighborhood of high-performing trajectories by encouraging high $Q$-values in corresponding regions of the state space. Experiments across $5$ continuous control domains suggest that MCAC can be used to significantly increase learning efficiency across $6$ commonly used RL and RL-from-demonstrations algorithms. See https://sites.google.com/view/mcac-rl for code and supplementary material.
△ Less
Submitted 20 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Bayesian Neural Networks for Geothermal Resource Assessment: Prediction with Uncertainty
Authors:
Stephen Brown,
William L. Rodi,
Marco Seracini,
Chen Gu,
Michael Fehler,
James Faulds,
Connor M. Smith,
Sven Treitel
Abstract:
We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) an…
▽ More
We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) and negative training sites (known drill sites with unsuitable geothermal conditions) and use these to constrain and optimize artificial neural networks for this classification task. The main objective is to predict the geothermal resource potential at unknown sites within a large geographic area where the defining features are known. These predictions could be used to target promising areas for further detailed investigations. We describe the evolution of our work from defining a specific neural network architecture to training and optimization trials. Upon analysis we expose the inevitable problems of model variability and resulting prediction uncertainty. Finally, to address these problems we apply the concept of Bayesian neural networks, a heuristic approach to regularization in network training, and make use of the practical interpretation of the formal uncertainty measures they provide.
△ Less
Submitted 25 October, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Don't CWEAT It: Toward CWE Analysis Techniques in Early Stages of Hardware Design
Authors:
Baleegh Ahmad,
Wei-Kai Liu,
Luca Collini,
Hammond Pearce,
Jason M. Fung,
Jonathan Valamehr,
Mohammad Bidmeshki,
Piotr Sapiecha,
Steve Brown,
Krishnendu Chakrabarty,
Ramesh Karri,
Benjamin Tan
Abstract:
To help prevent hardware security vulnerabilities from propagating to later design stages where fixes are costly, it is crucial to identify security concerns as early as possible, such as in RTL designs. In this work, we investigate the practical implications and feasibility of producing a set of security-specific scanners that operate on Verilog source files. The scanners indicate parts of code t…
▽ More
To help prevent hardware security vulnerabilities from propagating to later design stages where fixes are costly, it is crucial to identify security concerns as early as possible, such as in RTL designs. In this work, we investigate the practical implications and feasibility of producing a set of security-specific scanners that operate on Verilog source files. The scanners indicate parts of code that might contain one of a set of MITRE's common weakness enumerations (CWEs). We explore the CWE database to characterize the scope and attributes of the CWEs and identify those that are amenable to static analysis. We prototype scanners and evaluate them on 11 open source designs - 4 system-on-chips (SoC) and 7 processor cores - and explore the nature of identified weaknesses. Our analysis reported 53 potential weaknesses in the OpenPiton SoC used in Hack@DAC-21, 11 of which we confirmed as security concerns.
△ Less
Submitted 2 September, 2022;
originally announced September 2022.
-
The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types
Authors:
Gaurav R. Ghosal,
Matthew Zurek,
Daniel S. Brown,
Anca D. Dragan
Abstract:
When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, o…
▽ More
When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, or quality, of human feedback. However, in many settings, giving one type of feedback (e.g. a demonstration) may be much more difficult than a different type of feedback (e.g. answering a comparison query). Thus, we expect to see more or less noise depending on the type of human feedback. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in both simulated experiments and in a user study with real human feedback. We find that overestimating human rationality can have dire effects on reward learning accuracy and regret. We also find that fitting the rationality coefficient to human data enables better reward learning, even when the human deviates significantly from the noisy-rational choice model due to systematic biases. Further, we find that the rationality level affects the informativeness of each feedback type: surprisingly, demonstrations are not always the most informative -- when the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Ultimately, our results emphasize the importance and advantage of paying attention to the assumed human-rationality level, especially when agents actively learn from multiple types of human feedback.
△ Less
Submitted 9 March, 2023; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Reservoir Computing with 3D Nanowire Networks
Authors:
R. K. Daniels,
J. B. Mallinson,
Z. E. Heywood,
P. J. Bones,
M. D. Arnold,
S. A. Brown
Abstract:
Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the…
▽ More
Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the nanowires as 1D objects lying in a perfectly 2D plane - the effect of stacking on RC performance has not yet been established. Here we use detailed simulations to compare the performance of perfectly 2D and quasi-3D (stacked) networks of nanowires in two tasks: memory capacity and nonlinear transformation. We also show that our model of the junctions between nanowires is general enough to describe a wide range of memristive networks, and consider the impact of physically realistic electrode configurations on performance. We show that the various networks and configurations have a strikingly similar performance in RC tasks, which is surprising given their radically different topologies. Our results show that networks with an experimentally achievable number of electrodes perform close to the upper bounds achievable when using the information from every wire. However, we also show important differences, in particular that the quasi-3D networks are more resilient to changes in the input parameters, generalizing better to noisy training data. Since previous literature suggests that topology plays an important role in computing performance, these results may have important implications for future applications of nanowire networks in neuromorphic computing.
△ Less
Submitted 6 July, 2022;
originally announced July 2022.
-
Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies
Authors:
Satvik Sharma,
Ellen Novoseller,
Vainavi Viswanath,
Zaynah Javed,
Rishi Parikh,
Ryan Hoque,
Ashwin Balakrishna,
Daniel S. Brown,
Ken Goldberg
Abstract:
Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behav…
▽ More
Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behaviors on physical hardware. On the other hand, excessive training in simulation can cause policies to overfit to the visual appearance and dynamics of the simulator. In this work, we study strategies to automatically determine when policies trained in simulation can be reliably transferred to a physical robot. We specifically study these ideas in the context of robotic fabric manipulation, in which successful sim2real transfer is especially challenging due to the difficulties of precisely modeling the dynamics and visual appearance of fabric. Results in a fabric smoothing task suggest that our switching criteria correlate well with performance in real. In particular, our confidence-based switching criteria achieve average final fabric coverage of 87.2-93.7% within 55-60% of the total training budget. See https://tinyurl.com/lsc-case for code and supplemental materials.
△ Less
Submitted 2 July, 2022;
originally announced July 2022.
-
Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
Authors:
Abhijith Punnappurath,
Abdullah Abuolaim,
Abdelrahman Abdelhamed,
Alex Levinshtein,
Michael S. Brown
Abstract:
Many flagship smartphone cameras now use a dedicated neural image signal processor (ISP) to render noisy raw sensor images to the final processed output. Training nightmode ISP networks relies on large-scale datasets of image pairs with: (1) a noisy raw image captured with a short exposure and a high ISO gain; and (2) a ground truth low-noise raw image captured with a long exposure and low ISO tha…
▽ More
Many flagship smartphone cameras now use a dedicated neural image signal processor (ISP) to render noisy raw sensor images to the final processed output. Training nightmode ISP networks relies on large-scale datasets of image pairs with: (1) a noisy raw image captured with a short exposure and a high ISO gain; and (2) a ground truth low-noise raw image captured with a long exposure and low ISO that has been rendered through the ISP. Capturing such image pairs is tedious and time-consuming, requiring careful setup to ensure alignment between the image pairs. In addition, ground truth images are often prone to motion blur due to the long exposure. To address this problem, we propose a method that synthesizes nighttime images from daytime images. Daytime images are easy to capture, exhibit low-noise (even on smartphone cameras) and rarely suffer from motion blur. We outline a processing framework to convert daytime raw images to have the appearance of realistic nighttime raw images with different levels of noise. Our procedure allows us to easily produce aligned noisy and clean nighttime image pairs. We show the effectiveness of our synthesis framework by training neural ISPs for nightmode rendering. Furthermore, we demonstrate that using our synthetic nighttime images together with small amounts of real data (e.g., 5% to 10%) yields performance almost on par with training exclusively on real nighttime images. Our dataset and code are available at https://github.com/SamsungLabs/day-to-night.
△ Less
Submitted 6 June, 2022;
originally announced June 2022.
-
Learning sRGB-to-Raw-RGB De-rendering with Content-Aware Metadata
Authors:
Seonghyeon Nam,
Abhijith Punnappurath,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
Most camera images are rendered and saved in the standard RGB (sRGB) format by the camera's hardware. Due to the in-camera photo-finishing routines, nonlinear sRGB images are undesirable for computer vision tasks that assume a direct relationship between pixel values and scene radiance. For such applications, linear raw-RGB sensor images are preferred. Saving images in their raw-RGB format is stil…
▽ More
Most camera images are rendered and saved in the standard RGB (sRGB) format by the camera's hardware. Due to the in-camera photo-finishing routines, nonlinear sRGB images are undesirable for computer vision tasks that assume a direct relationship between pixel values and scene radiance. For such applications, linear raw-RGB sensor images are preferred. Saving images in their raw-RGB format is still uncommon due to the large storage requirement and lack of support by many imaging applications. Several "raw reconstruction" methods have been proposed that utilize specialized metadata sampled from the raw-RGB image at capture time and embedded in the sRGB image. This metadata is used to parameterize a mapping function to de-render the sRGB image back to its original raw-RGB format when needed. Existing raw reconstruction methods rely on simple sampling strategies and global mapping to perform the de-rendering. This paper shows how to improve the de-rendering results by jointly learning sampling and reconstruction. Our experiments show that our learned sampling can adapt to the image content to produce better raw reconstructions than existing methods. We also describe an online fine-tuning strategy for the reconstruction network to improve results further.
△ Less
Submitted 3 June, 2022;
originally announced June 2022.
-
Noise2NoiseFlow: Realistic Camera Noise Modeling without Clean Images
Authors:
Ali Maleky,
Shayan Kousha,
Michael S. Brown,
Marcus A. Brubaker
Abstract:
Image noise modeling is a long-standing problem with many applications in computer vision. Early attempts that propose simple models, such as signal-independent additive white Gaussian noise or the heteroscedastic Gaussian noise model (a.k.a., camera noise level function) are not sufficient to learn the complex behavior of the camera sensor noise. Recently, more complex learning-based models have…
▽ More
Image noise modeling is a long-standing problem with many applications in computer vision. Early attempts that propose simple models, such as signal-independent additive white Gaussian noise or the heteroscedastic Gaussian noise model (a.k.a., camera noise level function) are not sufficient to learn the complex behavior of the camera sensor noise. Recently, more complex learning-based models have been proposed that yield better results in noise synthesis and downstream tasks, such as denoising. However, their dependence on supervised data (i.e., paired clean images) is a limiting factor given the challenges in producing ground-truth images. This paper proposes a framework for training a noise model and a denoiser simultaneously while relying only on pairs of noisy images rather than noisy/clean paired image data. We apply this framework to the training of the Noise Flow architecture. The noise synthesis and density estimation results show that our framework outperforms previous signal-processing-based noise models and is on par with its supervised counterpart. The trained denoiser is also shown to significantly improve upon both supervised and weakly supervised baseline denoising approaches. The results indicate that the joint training of a denoiser and a noise model yields significant improvements in the denoiser.
△ Less
Submitted 2 June, 2022;
originally announced June 2022.
-
Modeling sRGB Camera Noise with Normalizing Flows
Authors:
Shayan Kousha,
Ali Maleky,
Michael S. Brown,
Marcus A. Brubaker
Abstract:
Noise modeling and reduction are fundamental tasks in low-level computer vision. They are particularly important for smartphone cameras relying on small sensors that exhibit visually noticeable noise. There has recently been renewed interest in using data-driven approaches to improve camera noise models via neural networks. These data-driven approaches target noise present in the raw-sensor image…
▽ More
Noise modeling and reduction are fundamental tasks in low-level computer vision. They are particularly important for smartphone cameras relying on small sensors that exhibit visually noticeable noise. There has recently been renewed interest in using data-driven approaches to improve camera noise models via neural networks. These data-driven approaches target noise present in the raw-sensor image before it has been processed by the camera's image signal processor (ISP). Modeling noise in the RAW-rgb domain is useful for improving and testing the in-camera denoising algorithm; however, there are situations where the camera's ISP does not apply denoising or additional denoising is desired when the RAW-rgb domain image is no longer available. In such cases, the sensor noise propagates through the ISP to the final rendered image encoded in standard RGB (sRGB). The nonlinear steps on the ISP culminate in a significantly more complex noise distribution in the sRGB domain and existing raw-domain noise models are unable to capture the sRGB noise distribution. We propose a new sRGB-domain noise model based on normalizing flows that is capable of learning the complex noise distribution found in sRGB images under various ISO levels. Our normalizing flows-based approach outperforms other models by a large margin in noise modeling and synthesis tasks. We also show that image denoisers trained on noisy images synthesized with our noise model outperforms those trained with noise from baselines models.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
The Forgotten Margins of AI Ethics
Authors:
Abeba Birhane,
Elayne Ruane,
Thomas Laurent,
Matthew S. Brown,
Johnathan Flowers,
Anthony Ventresque,
Christopher L. Dancy
Abstract:
How has recent AI Ethics literature addressed topics such as fairness and justice in the context of continued social and structural power asymmetries? We trace both the historical roots and current landmark work that have been shaping the field and categorize these works under three broad umbrellas: (i) those grounded in Western canonical philosophy, (ii) mathematical and statistical methods, and…
▽ More
How has recent AI Ethics literature addressed topics such as fairness and justice in the context of continued social and structural power asymmetries? We trace both the historical roots and current landmark work that have been shaping the field and categorize these works under three broad umbrellas: (i) those grounded in Western canonical philosophy, (ii) mathematical and statistical methods, and (iii) those emerging from critical data/algorithm/information studies. We also survey the field and explore emerging trends by examining the rapidly growing body of literature that falls under the broad umbrella of AI Ethics. To that end, we read and annotated peer-reviewed papers published over the past four years in two premier conferences: FAccT and AIES. We organize the literature based on an annotation scheme we developed according to three main dimensions: whether the paper deals with concrete applications, use-cases, and/or people's lived experience; to what extent it addresses harmed, threatened, or otherwise marginalized groups; and if so, whether it explicitly names such groups. We note that although the goals of the majority of FAccT and AIES papers were often commendable, their consideration of the negative impacts of AI on traditionally marginalized groups remained shallow. Taken together, our conceptual analysis and the data from annotated papers indicate that the field would benefit from an increased focus on ethical analysis grounded in concrete use-cases, people's experiences, and applications as well as from approaches that are sensitive to structural and historical power asymmetries.
△ Less
Submitted 9 May, 2022;
originally announced May 2022.
-
Causal Confusion and Reward Misidentification in Preference-Based Reward Learning
Authors:
Jeremy Tien,
Jerry Zhi-Yang He,
Zackory Erickson,
Anca D. Dragan,
Daniel S. Brown
Abstract:
Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification w…
▽ More
Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences. In particular, we perform a series of sensitivity and ablation analyses on several benchmark domains where rewards learned from preferences achieve minimal test error but fail to generalize to out-of-distribution states -- resulting in poor policy performance when optimized. We find that the presence of non-causal distractor features, noise in the stated preferences, and partial state observability can all exacerbate reward misidentification. We also identify a set of methods with which to interpret misidentified learned rewards. In general, we observe that optimizing misidentified rewards drives the policy off the reward's training distribution, resulting in high predicted (learned) rewards but low true rewards. These findings illuminate the susceptibility of preference learning to reward misidentification and causal confusion -- failure to consider even one of many factors can result in unexpected, undesirable behavior.
△ Less
Submitted 18 March, 2023; v1 submitted 13 April, 2022;
originally announced April 2022.