Search | arXiv e-print repository

Melody predominates over harmony in the evolution of musical scales across 96 countries

Authors: John M McBride, Elizabeth Phillips, Patrick E Savage, Steven Brown, Tsvi Tlusty

Abstract: The standard theory of musical scales since antiquity has been based on harmony, rather than melody. Some recent analyses support either view, and we lack a comparative test on cross-cultural data. We address this longstanding problem through a rigorous, computational comparison of the main theories against 1,314 scales from 96 countries. There is near-universal support for melodic theories, which… ▽ More The standard theory of musical scales since antiquity has been based on harmony, rather than melody. Some recent analyses support either view, and we lack a comparative test on cross-cultural data. We address this longstanding problem through a rigorous, computational comparison of the main theories against 1,314 scales from 96 countries. There is near-universal support for melodic theories, which predict step-sizes of 1-3 semitones. Harmony accounts for the prevalence of some simple-integer-ratio intervals, particularly for music-theoretic scales from Eurasian societies, which may explain their dominance amongst Western scholars. However, harmony poorly predicts scales measured from ethnographic recordings, particularly outside of Eurasia. Overall, we show that the historical emphasis on harmony is misguided and that melody is the primary determinant of the world's musical scales. △ Less

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.05610 [pdf, other]

Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations

Authors: Connor Mattson, Anurag Aribandi, Daniel S. Brown

Abstract: We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot… ▽ More We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., different action space, dynamics, size, shape, etc.). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot a policy via human video demonstrations or teaching a robot to imitate a policy from another robot with a different embodiment. However, prior work has only focused on cases where near-optimal demonstrations are available, which is often difficult to ensure. By contrast, we study the setting of cross-embodiment reward learning from mixed-quality demonstrations. We demonstrate that prior work struggles to learn generalizable reward representations when learning from mixed-quality data. We then analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning. Our results give insight into how different representation learning techniques lead to qualitatively different reward shaping behaviors and the importance of human feedback when learning from mixed-quality, mixed-embodiment data. △ Less

Submitted 10 August, 2024; originally announced August 2024.

Comments: First Two Authors Share Equal Contribution. 19 Pages, 4 Figures

arXiv:2407.09892 [pdf, other]

NamedCurves: Learned Image Enhancement via Color Naming

Authors: David Serrano-Lozano, Luis Herranz, Michael S. Brown, Javier Vazquez-Corral

Abstract: A popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blu… ▽ More A popular method for enhancing images involves learning the style of a professional photo editor using pairs of training images comprised of the original input with the editor-enhanced version. When manipulating images, many editing tools offer a feature that allows the user to manipulate a limited selection of familiar colors. Editing by color name allows easy adjustment of elements like the "blue" of the sky or the "green" of trees. Inspired by this approach to color manipulation, we propose NamedCurves, a learning-based image enhancement technique that separates the image into a small set of named colors. Our method learns to globally adjust the image for each specific named color via tone curves and then combines the images using an attention-based fusion mechanism to mimic spatial editing. We demonstrate the effectiveness of our method against several competing methods on the well-known Adobe 5K dataset and the PPR10K dataset, showing notable improvements. △ Less

Submitted 13 July, 2024; originally announced July 2024.

Comments: European Conference on Computer Vision ECCV 2024

arXiv:2406.07358 [pdf, other]

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Authors: Teun van der Weij, Felix Hofstätter, Ollie Jaffe, Samuel F. Brown, Francis Rhys Ward

Abstract: Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging $\unicode{x2013}$ which we define as "strategic underper… ▽ More Trustworthy capability evaluations are crucial for ensuring the safety of AI systems, and are becoming a key component of AI regulation. However, the developers of an AI system, or the AI system itself, may have incentives for evaluations to understate the AI's actual capability. These conflicting interests lead to the problem of sandbagging $\unicode{x2013}$ which we define as "strategic underperformance on an evaluation". In this paper we assess sandbagging capabilities in contemporary language models (LMs). We prompt frontier LMs, like GPT-4 and Claude 3 Opus, to selectively underperform on dangerous capability evaluations, while maintaining performance on general (harmless) capability evaluations. Moreover, we find that models can be fine-tuned, on a synthetic dataset, to hide specific capabilities unless given a password. This behaviour generalizes to high-quality, held-out benchmarks such as WMDP. In addition, we show that both frontier and smaller models can be prompted, or password-locked, to target specific scores on a capability evaluation. Even more, we found that a capable password-locked model (Llama 3 70b) is reasonably able to emulate a less capable model (Llama 2 7b). Overall, our results suggest that capability evaluations are vulnerable to sandbagging. This vulnerability decreases the trustworthiness of evaluations, and thereby undermines important safety decisions regarding the development and deployment of advanced AI systems. △ Less

Submitted 14 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.09733 [pdf, other]

SCI 3.0: A Web-based Schema Curation Interface for Graphical Event Representations

Authors: Reece Suchocki, Mary Martin, Martha Palmer, Susan Brown

Abstract: To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to ou… ▽ More To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to our approach is the Schema Curation Interface 3.0 (SCI 3.0), a web application that facilitates real-time editing of event schema properties within a generated graph e.g., adding, removing, or editing sub-events, entities, and relations directly through an interface. △ Less

Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.16244 [pdf, other]

The Ethics of Advanced AI Assistants

Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz , et al. (32 additional authors not shown)

Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro… ▽ More This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders. △ Less

Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15058 [pdf, other]

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Authors: Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt, Sasha Brown

Abstract: Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, high… ▽ More Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate against process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.07185 [pdf, other]

Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

Authors: Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown

Abstract: Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This pap… ▽ More Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: https://sites.google.com/view/lfdinelectrocautery △ Less

Submitted 15 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: In proceedings of the International Symposium on Medical Robotics (ISMR) 2024. Equal contribution from two first authors

arXiv:2404.04241 [pdf, other]

Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks

Authors: Jordan Thompson, Brian Y. Cho, Daniel S. Brown, Alan Kuntz

Abstract: Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a… ▽ More Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a mixture density network to output a Gaussian mixture model representation of the robot geometry given the current tendon displacements. This model computes a probability distribution that is more representative of the true distribution of geometries at a given configuration than a model that outputs a single geometry, while also reducing the computation time. We demonstrate one use of this model through a trajectory optimization method that explicitly reasons about the workspace uncertainty to minimize the probability of collision. △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02164 [pdf, other]

Exploring Correlation Patterns in the Ethereum Validator Network

Authors: Simon Brown, Leonardo Bautista-Gomez

Abstract: There have been several studies into measuring the level of decentralization in Ethereum through applying various indices to indicate the relative dominance of entities in different domains in the ecosystem. However, these indices do not capture any correlation between those different entities, that could potentially make them the subject of external coercion, or covert collusion. We propose an in… ▽ More There have been several studies into measuring the level of decentralization in Ethereum through applying various indices to indicate the relative dominance of entities in different domains in the ecosystem. However, these indices do not capture any correlation between those different entities, that could potentially make them the subject of external coercion, or covert collusion. We propose an index that measures the relative dominance of entities based on the application of correlation factors. We posit that this approach produces a more nuanced and accurate index of decentralization. △ Less

Submitted 22 March, 2024; originally announced April 2024.

Comments: 11 pages, 7 figures, 3 tables

arXiv:2403.13793 [pdf, other]

Evaluating Frontier Models for Dangerous Capabilities

Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah , et al. (2 additional authors not shown)

Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous… ▽ More To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models. △ Less

Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.02431 [pdf, other]

Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models

Authors: Dimitris Papadimitriou, Daniel S. Brown

Abstract: It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian… ▽ More It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.14908 [pdf, ps, other]

doi 10.1145/3630106.3658957

A Framework for Assurance Audits of Algorithmic Systems

Authors: Khoa Lam, Benjamin Lange, Borhane Blili-Hamelin, Jovana Davidovic, Shea Brown, Ali Hasan

Abstract: An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operatio… ▽ More An increasing number of regulations propose AI audits as a mechanism for achieving transparency and accountability for artificial intelligence (AI) systems. Despite some converging norms around various forms of AI auditing, auditing for the purpose of compliance and assurance currently lacks agreed-upon practices, procedures, taxonomies, and standards. We propose the criterion audit as an operationalizable compliance and assurance external audit framework. We model elements of this approach after financial auditing practices, and argue that AI audits should similarly provide assurance to their stakeholders about AI organizations' ability to govern their algorithms in ways that mitigate harms and uphold human values. We discuss the necessary conditions for the criterion audit and provide a procedural blueprint for performing an audit engagement in practice. We illustrate how this framework can be adapted to current regulations by deriving the criteria on which bias audits can be performed for in-scope hiring algorithms, as required by the recently effective New York City Local Law 144 of 2021. We conclude by offering a critical discussion on the benefits, inherent limitations, and implementation challenges of applying practices of the more mature financial auditing industry to AI auditing where robust guardrails against quality assurance issues are only starting to emerge. Our discussion -- informed by experiences in performing these audits in practice -- highlights the critical role that an audit ecosystem plays in ensuring the effectiveness of audits. △ Less

Submitted 28 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Journal ref: The 2024 ACM Conference on Fairness, Accountability, and Transparency

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.03093 [pdf, other]

RESIN-EDITOR: A Schema-guided Hierarchical Event Graph Visualizer and Editor

Authors: Khanh Duy Nguyen, Zixuan Zhang, Reece Suchocki, Sha Li, Martha Palmer, Susan Brown, Jiawei Han, Heng Ji

Abstract: In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, com… ▽ More In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events. Our RESIN-EDITOR system allows users to render and freely edit hierarchical event graphs extracted from multimedia and multi-document news clusters with guidance from human-curated event schemas. RESIN-EDITOR's unique features include hierarchical graph visualization, comprehensive source tracing, and interactive user editing, which is more powerful and versatile than existing Information Extraction (IE) visualization tools. In our evaluation of RESIN-EDITOR, we demonstrate ways in which our tool is effective in understanding complex events and enhancing system performance. The source code, a video demonstration, and a live website for RESIN-EDITOR have been made publicly available. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: The first two authors contribute equally to this paper

arXiv:2311.06989 [pdf]

Creating a Discipline-specific Commons for Infectious Disease Epidemiology

Authors: Michael M. Wagner, William Hogan, John Levander, Adam Darr, Matt Diller, Max Sibilla, Alexander T. Loiacono. Terence Sperringer, Jr., Shawn T. Brown

Abstract: Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potenti… ▽ More Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: 12 pages, 6 figures

arXiv:2310.16941 [pdf, other]

Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

Authors: Connor Mattson, Jeremy C. Clark, Daniel S. Brown

Abstract: We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this… ▽ More We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 9 figures, To be published in Proceedings IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS 2023)

arXiv:2310.10610 [pdf, other]

Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

Authors: Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan

Abstract: Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to… ▽ More Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to human motions that are unlikely to occur during natural interactions with people. A robot policy might fail under small adversarial perturbations but work under large natural perturbations. We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance. We introduce RIGID, a method for constructing this frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like (as measured by a discriminator). On an Assistive Gym task, we use RIGID to analyze the performance of standard collaborative Reinforcement Learning, as well as the performance of existing methods meant to increase robustness. We also compare the frontier RIGID identifies with the failures identified in expert adversarial interaction, and with naturally-occurring failures during user interaction. Overall, we find evidence that RIGID can provide a meaningful measure of robustness predictive of deployment performance, and uncover failure cases in human-robot interaction that are difficult to find manually. https://ood-human.github.io. △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2309.11408 [pdf, other]

Indirect Swarm Control: Characterization and Analysis of Emergent Swarm Behaviors

Authors: Ricardo Vega, Connor Mattson, Daniel S. Brown, Cameron Nowzari

Abstract: Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively changes what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for ch… ▽ More Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively changes what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for characterizing the conditions that lead to different macrostates and how to predict/analyze their macroscopic properties, allowing us to indirectly engineer the same behaviors from the bottom up by tuning their environmental conditions rather than local interaction rules. We then apply this framework to a simple system of binary sensing and acting agents as an example to see if a re-framing of this swarms problem can help us push the state of the art forward. By first creating some working definitions of macrostates in a particular swarm system, we show how agent-based modeling may be combined with control theory to enable a generalized understanding of controllable emergent processes without needing to simulate everything. Whereas phase diagrams can generally only be created through Monte Carlo simulations or sweeping through ranges of parameters in a simulator, we develop closed-form functions that can immediately produce them revealing an infinite set of swarm parameter combinations that can lead to a specifically chosen self-organized behavior. While the exact methods are still under development, we believe simply laying out a potential path towards solutions that have evaded our traditional methods using a novel method is worth considering. Our results are characterized through both simulations and real experiments on ground robots. △ Less

Submitted 28 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: 8 pages, 13 figures, submitted to IROS 2024 conference

arXiv:2309.04542 [pdf, other]

Examining Autoexposure for Challenging Scenes

Authors: SaiKiran Tedla, Beixuan Yang, Michael S. Brown

Abstract: Autoexposure (AE) is a critical step applied by camera systems to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with constant illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especia… ▽ More Autoexposure (AE) is a critical step applied by camera systems to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with constant illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especially those with time-varying lighting, is the lack of suitable image datasets. To address this issue, we have captured a new 4D exposure dataset that provides a large solution space (i.e., shutter speed range from (1/500 to 15 seconds) over a temporal sequence with moving objects, bright lights, and varying lighting. In addition, we have designed a software platform to allow AE algorithms to be used in a plug-and-play manner with the dataset. Our dataset and associate platform enable repeatable evaluation of different AE algorithms and provide a much-needed starting point to develop better AE methods. We examine several existing AE strategies using our dataset and show that most users prefer a simple saliency method for challenging lighting conditions. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: ICCV 2023

arXiv:2307.10026 [pdf, other]

Contextual Reliability: When Different Features Matter in Different Contexts

Authors: Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan

Abstract: Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we canno… ▽ More Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability. △ Less

Submitted 19 July, 2023; originally announced July 2023.

Comments: ICML 2023 Camera Ready Version

arXiv:2306.13004 [pdf, other]

Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?

Authors: Akansha Kalra, Daniel S. Brown

Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that while expressive are difficult to interpret and often require running the whole costly process of RL before we can even decipher if the… ▽ More Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that while expressive are difficult to interpret and often require running the whole costly process of RL before we can even decipher if these frameworks are actually aligned with human preferences. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We also provide experimental evidence that not only shows that reward DDTs can often achieve competitive RL performance when compared with larger capacity deep neural network reward functions but also demonstrates the diagnostic utility of our framework in checking alignment of learned reward functions. We also observe that the choice between soft and hard (argmax) output of reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance, while also wanting simpler, more interpretable rewards. Videos and code, are available at: https://sites.google.com/view/ddt-rlhf △ Less

Submitted 24 June, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: Accepted at RLC 2024

arXiv:2306.11920 [pdf, other]

NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement

Authors: Marcos V. Conde, Javier Vazquez-Corral, Michael S. Brown, Radu Timofte

Abstract: 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet n… ▽ More 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet not so memory-efficient, as storing multiple 3D LUTs is required. For this reason and other implementation limitations, their use on mobile devices is less popular. In this work, we propose a Neural Implicit LUT (NILUT), an implicitly defined continuous 3D color transformation parameterized by a neural network. We show that NILUTs are capable of accurately emulating real 3D LUTs. Moreover, a NILUT can be extended to incorporate multiple styles into a single network with the ability to blend styles implicitly. Our novel approach is memory-efficient, controllable and can complement previous methods, including learned ISPs. Code, models and dataset available at: https://github.com/mv-lab/nilut △ Less

Submitted 24 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: AAAI 2024 - The 38th Annual AAAI Conference on Artificial Intelligence

arXiv:2306.02183 [pdf]

doi 10.1038/s41592-024-02237-2

brainlife.io: A decentralized and open source cloud platform to support neuroscience research

Authors: Soichi Hayashi, Bradley A. Caron, Anibal Sólon Heinsfeld, Sophia Vinci-Booher, Brent McPherson, Daniel N. Bullock, Giulia Bertò, Guiomar Niso, Sandra Hanekamp, Daniel Levitas, Kimberly Ray, Anne MacKenzie, Lindsey Kitchell, Josiah K. Leong, Filipi Nascimento-Silva, Serge Koudoro, Hanna Willis, Jasleen K. Jolly, Derek Pisner, Taylor R. Zuidema, Jan W. Kurzawski, Kyriaki Mikellidou, Aurore Bussalb, Christopher Rorden, Conner Victory , et al. (39 additional authors not shown)

Abstract: Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR (Findable, Accessible, Interoperabile, and Reusable) data analysis to portions of the worldwide research community. brainlife.io was developed to red… ▽ More Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR (Findable, Accessible, Interoperabile, and Reusable) data analysis to portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research. △ Less

Submitted 11 August, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

arXiv:2305.16148 [pdf, other]

doi 10.1145/3583131.3590443

Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms

Authors: Connor Mattson, Daniel S. Brown

Abstract: Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand… ▽ More Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand what behaviors are interesting or even possible. Our proposed approach adapts to user preferences by learning a similarity space over swarm collective behaviors using self-supervised learning and human-in-the-loop queries. We combine our learned similarity metric with novelty search and clustering to explore and categorize the space of possible swarm behaviors. We also propose several general-purpose heuristics that improve the efficiency of our novelty search by prioritizing robot controllers that are likely to lead to interesting emergent behaviors. We test our approach in simulation on two robot capability models and show that our methods consistently discover a richer set of emergent behaviors than prior work. Code, videos, and datasets are available at https://sites.google.com/view/evolving-novel-swarms. △ Less

Submitted 16 July, 2023; v1 submitted 25 April, 2023; originally announced May 2023.

Comments: 13 pages, 10 figures, To be published in Proceedings Genetic and Evolutionary Computation Conference (GECCO 2023)

arXiv:2305.14600 [pdf, other]

Learning Semantic Role Labeling from Compatible Label Sequences

Authors: Tao Li, Ghazaleh Kazeminejad, Susan W. Brown, Martha Palmer, Vivek Srikumar

Abstract: Semantic role labeling (SRL) has multiple disjoint label sets, e.g., VerbNet and PropBank. Creating these datasets is challenging, therefore a natural question is how to use each one to help the other. Prior work has shown that cross-task interaction helps, but only explored multitask learning so far. A common issue with multi-task setup is that argument sequences are still separately decoded, run… ▽ More Semantic role labeling (SRL) has multiple disjoint label sets, e.g., VerbNet and PropBank. Creating these datasets is challenging, therefore a natural question is how to use each one to help the other. Prior work has shown that cross-task interaction helps, but only explored multitask learning so far. A common issue with multi-task setup is that argument sequences are still separately decoded, running the risk of generating structurally inconsistent label sequences (as per lexicons like Semlink). In this paper, we eliminate such issue with a framework that jointly models VerbNet and PropBank labels as one sequence. In this setup, we show that enforcing Semlink constraints during decoding constantly improves the overall F1. With special input constructions, our joint model infers VerbNet arguments from given PropBank arguments with over 99 F1. For learning, we propose a constrained marginal model that learns with knowledge defined in Semlink to further benefit from the large amounts of PropBank-only data. On the joint benchmark based on CoNLL05, our models achieve state-of-the-art F1's, outperforming the prior best in-domain model by 3.5 (VerbNet) and 0.8 (PropBank). For out-of-domain generalization, our models surpass the prior best by 3.4 (VerbNet) and 0.2 (PropBank). △ Less

Submitted 19 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at Findings of EMNLP 2023

arXiv:2304.14095 [pdf, other]

Securing Autonomous Air Traffic Management: Blockchain Networks Driven by Explainable AI

Authors: Louise Axon, Dimitrios Panagiotakopoulos, Samuel Ayo, Carolina Sanchez-Hernandez, Yan Zong, Simon Brown, Lei Zhang, Michael Goldsmith, Sadie Creese, Weisi Guo

Abstract: Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure… ▽ More Air Traffic Management data systems today are inefficient and not scalable to enable future unmanned systems. Current data is fragmented, siloed, and not easily accessible. There is data conflict, misuse, and eroding levels of trust in provenance and accuracy. With increased autonomy in aviation, Artificially Intelligent (AI) enabled unmanned traffic management (UTM) will be more reliant on secure data from diverse stakeholders. There is an urgent need to develop a secure network that has trustworthy data chains and works with the requirements generated by UTM. Here, we review existing research in 3 key interconnected areas: (1) blockchain development for secure data transfer between competing aviation stakeholders, (2) self-learning networking architectures that distribute consensus to achieve secure air traffic control, (3) explainable AI to build trust with human stakeholders and backpropagate requirements for blockchain and network optimisation. When connected together, this new digital ecosystem blueprint is tailored for safety critical UTM sectors. We motivate the readers with a case study, where a federated learning UTM uses real air traffic and weather data is secured and explained to human operators. This emerging area still requires significant research and development by the community to ensure it can enable future autonomous air mobility. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Comments: under review in IEEE

arXiv:2304.13907 [pdf, other]

Network Analysis as a Tool for Shaping Conservation and Development Policy: A Case Study of Timber Market Optimization in India

Authors: Xiou Ge, Sarah E. Brown, Pushpendra Rana, Lav R. Varshney, Daniel C. Miller

Abstract: The incorporation of trees on farms can help to improve livelihoods and build resilience among small-holder farmers in developing countries. On-farm trees can help gen- erate additional income from commercial tree harvest as well as contribute significant environmental benefits and ecosystem services to increase resiliency. Long-term benefits from tree-based livelihoods, however, depend on sustain… ▽ More The incorporation of trees on farms can help to improve livelihoods and build resilience among small-holder farmers in developing countries. On-farm trees can help gen- erate additional income from commercial tree harvest as well as contribute significant environmental benefits and ecosystem services to increase resiliency. Long-term benefits from tree-based livelihoods, however, depend on sustainable timber harvesting. In this paper, we discuss the potential for network analysis as a tool to inform conservation and development decision-making. Specifically, we formulate the commercial tree market between farmers and traders as a transportation problem and optimize the transactions. We create a model of the commercial tree market in the Bilaspur district of Himachal Pradesh, India based on a detailed dataset of market interactions between farmers and timber traders, using the existing road network of this region. Using this model, we perform a maximum-flow-minimum-cost optimization for tree commodity flow. We compare the results of our optimized model with actual flow within the network, and we find a high potential to increase efficiency of market transactions within this region, noting a significant reduction to the minimum- cost flow value for our optimized model compared to the flow cost for actual transactions. We propose that using this network flow optimization model to strategically distribute permits can reduce costs associated with market transactions. Our results suggest that this direct policy action would be beneficial to the region. Finally, we suggest that cost savings could be used to establish tree planting programs to support a long-term sustainable tree market. Shaping policies to address these market inefficiencies in developing regions could help support and elevate tree-based livelihoods for farmers, traders, and industries. △ Less

Submitted 26 April, 2023; originally announced April 2023.

Comments: Paper accepted to proceedings of the 5th Data for Good Exchange (D4GX)

arXiv:2304.11743 [pdf, other]

GamutMLP: A Lightweight MLP for Color Loss Recovery

Authors: Hoang M. Le, Brian Price, Scott Cohen, Michael S. Brown

Abstract: Cameras and image-editing software often process images in the wide-gamut ProPhoto color space, encompassing 90% of all visible colors. However, when images are encoded for sharing, this color-rich representation is transformed and clipped to fit within the small-gamut standard RGB (sRGB) color space, representing only 30% of visible colors. Recovering the lost color information is challenging due… ▽ More Cameras and image-editing software often process images in the wide-gamut ProPhoto color space, encompassing 90% of all visible colors. However, when images are encoded for sharing, this color-rich representation is transformed and clipped to fit within the small-gamut standard RGB (sRGB) color space, representing only 30% of visible colors. Recovering the lost color information is challenging due to the clipping procedure. Inspired by neural implicit representations for 2D images, we propose a method that optimizes a lightweight multi-layer-perceptron (MLP) model during the gamut reduction step to predict the clipped values. GamutMLP takes approximately 2 seconds to optimize and requires only 23 KB of storage. The small memory footprint allows our GamutMLP model to be saved as metadata in the sRGB image -- the model can be extracted when needed to restore wide-gamut color values. We demonstrate the effectiveness of our approach for color recovery and compare it with alternative strategies, including pre-trained DNN-based gamut expansion networks and other implicit neural representation methods. As part of this effort, we introduce a new color gamut dataset of 2200 wide-gamut/small-gamut images for training and testing. Our code and dataset can be found on the project website: https://gamut-mlp.github.io. △ Less

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2302.13048 [pdf, other]

doi 10.18653/v1/2023.acl-demo.1

Human-in-the-Loop Schema Induction

Authors: Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, Hainiu Xu, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch

Abstract: Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic el… ▽ More Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual edit of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces efforts of human curation thanks to our interactive interface. △ Less

Submitted 25 February, 2023; originally announced February 2023.

Comments: 10 pages, ACL2023 demo track

arXiv:2301.04741 [pdf, other]

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

Authors: Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

Abstract: Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we stu… ▽ More Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction. Our paper provides empirical evidence that learned dynamics models enable robots to learn customized policies based on user preferences in ways that are safer and more sample efficient than prior preference learning approaches. Supplementary materials and code are available at https://sites.google.com/berkeley.edu/mop-rl. △ Less

Submitted 9 February, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: In proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)

arXiv:2301.01392 [pdf, other]

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

Authors: Daniel Shin, Anca D. Dragan, Daniel S. Brown

Abstract: Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization… ▽ More Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization via offline RL, our observation is that it can be a surprisingly rich source of information for preference learning as well. We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning, learns a distribution over reward functions, and optimizes a corresponding policy via offline RL. Crucially, our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps. To test our approach, we first evaluate existing offline RL benchmarks for their suitability for offline reward learning. Surprisingly, for many offline RL domains, we find that simply using a trivial reward function results good policy performance, making these domains ill-suited for evaluating learned rewards. To address this, we identify a subset of existing offline RL benchmarks that are well suited for offline reward learning and also propose new offline apprenticeship learning benchmarks which allow for more open-ended behaviors. When evaluated on this curated set of domains, our empirical results suggest that combining offline RL with learned human preferences can enable an agent to learn to perform novel tasks that were not explicitly shown in the offline data. △ Less

Submitted 3 January, 2023; originally announced January 2023.

Comments: Transactions on Machine Learning Research. arXiv admin note: text overlap with arXiv:2107.09251

arXiv:2301.00810 [pdf, other]

doi 10.1145/3568162.3576989

SIRL: Similarity-based Implicit Representation Learning

Authors: Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan

Abstract: When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that co… ▽ More When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives. △ Less

Submitted 17 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: 12 pages, 6 figures, HRI 2023

arXiv:2212.03175 [pdf, other]

Learning Representations that Enable Generalization in Assistive Tasks

Authors: Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan

Abstract: Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse ''population'' of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Su… ▽ More Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse ''population'' of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. https://adaptive-caregiver.github.io. △ Less

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2212.00951 [pdf, other]

SimpleMind adds thinking to deep neural networks

Authors: Youngwon Choi, M. Wasil Wahi-Anwar, Matthew S. Brown

Abstract: Deep neural networks (DNNs) detect patterns in data and have shown versatility and strong performance in many computer vision applications. However, DNNs alone are susceptible to obvious mistakes that violate simple, common sense concepts and are limited in their ability to use explicit knowledge to guide their search and decision making. While overall DNN performance metrics may be good, these ob… ▽ More Deep neural networks (DNNs) detect patterns in data and have shown versatility and strong performance in many computer vision applications. However, DNNs alone are susceptible to obvious mistakes that violate simple, common sense concepts and are limited in their ability to use explicit knowledge to guide their search and decision making. While overall DNN performance metrics may be good, these obvious errors, coupled with a lack of explainability, have prevented widespread adoption for crucial tasks such as medical image analysis. The purpose of this paper is to introduce SimpleMind, an open-source software framework for Cognitive AI focused on medical image understanding. It allows creation of a knowledge base that describes expected characteristics and relationships between image objects in an intuitive human-readable form. The SimpleMind framework brings thinking to DNNs by: (1) providing methods for reasoning with the knowledge base about image content, such as spatial inferencing and conditional reasoning to check DNN outputs; (2) applying process knowledge, in the form of general-purpose software agents, that are chained together to accomplish image preprocessing, DNN prediction, and result post-processing, and (3) performing automatic co-optimization of all knowledge base parameters to adapt agents to specific problems. SimpleMind enables reasoning on multiple detected objects to ensure consistency, providing cross checking between DNN outputs. This machine reasoning improves the reliability and trustworthiness of DNNs through an interpretable model and explainable decisions. Example applications are provided that demonstrate how SimpleMind supports and improves deep neural networks by embedding them within a Cognitive AI framework. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.15542 [pdf]

doi 10.1145/3610977.3634984

Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning

Authors: Tu Trinh, Haoyu Chen, Daniel S. Brown

Abstract: We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high… ▽ More We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high-confidence bounds on their performance and use these bounds to determine when they have a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the human's unobserved reward function, and (2) percent improvement over a baseline policy. We demonstrate how to formulate high-confidence bounds on both of these metrics. We evaluate our approach in simulation for both discrete and continuous state-space domains and illustrate the feasibility of developing a robotic system that can accurately evaluate demonstration sufficiency. We also show that the robot can utilize active learning in asking for demonstrations from specific states which results in fewer demos needed for the robot to still maintain high confidence in its policy. Finally, via a user study, we show that our approach successfully enables robots to perform at users' desired performance levels, without needing too many or perfectly optimal demonstrations. △ Less

Submitted 2 January, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Prior version appears in proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)". Current version appears in proceedings of HRI '24, March 11-14, 2024, Boulder, CO, USA

arXiv:2211.08772 [pdf, other]

MIMT: Multi-Illuminant Color Constancy via Multi-Task Local Surface and Light Color Learning

Authors: Shuwei Li, Jikai Wang, Michael S. Brown, Robby T. Tan

Abstract: The assumption of a uniform light color distribution is no longer applicable in scenes that have multiple light colors. Most color constancy methods are designed to deal with a single light color, and thus are erroneous when applied to multiple light colors. The spatial variability in multiple light colors causes the color constancy problem to be more challenging and requires the extraction of loc… ▽ More The assumption of a uniform light color distribution is no longer applicable in scenes that have multiple light colors. Most color constancy methods are designed to deal with a single light color, and thus are erroneous when applied to multiple light colors. The spatial variability in multiple light colors causes the color constancy problem to be more challenging and requires the extraction of local surface/light information. Motivated by this, we introduce a multi-task learning method to discount multiple light colors in a single input image. To have better cues of the local surface/light colors under multiple light color conditions, we design a novel multi-task learning framework. Our framework includes auxiliary tasks of achromatic-pixel detection and surface-color similarity prediction, providing better cues for local light and surface colors, respectively. Moreover, to ensure that our model maintains the constancy of surface colors regardless of the variations of light colors, a novel local surface color feature preservation scheme is developed. We demonstrate that our model achieves 47.1% improvement (from 4.69 mean angular error to 2.48) compared to a state-of-the-art multi-illuminant color constancy method on a multi-illuminant dataset (LSMI). △ Less

Submitted 22 August, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 8 pages, 6 figures

arXiv:2211.03711 [pdf, other]

Inpainting in discrete Sobolev spaces: structural information for uncertainty reduction

Authors: Marco Seracini, Stephen R. Brown

Abstract: In this article, using an exemplar-based approach, we investigate the inpainting problem, introducing a new mathematical functional, whose minimization determines the quality of the reconstructions. The new functional expression takes into account of fnite differences terms, in a similar fashion to what happens in the theoretical Sobolev spaces. Moreover, we introduce a new priority index to deter… ▽ More In this article, using an exemplar-based approach, we investigate the inpainting problem, introducing a new mathematical functional, whose minimization determines the quality of the reconstructions. The new functional expression takes into account of fnite differences terms, in a similar fashion to what happens in the theoretical Sobolev spaces. Moreover, we introduce a new priority index to determine the scanning order of the points to inpaint, prioritizing the uncertainty reduction in the choice. The achieved results highlight important theoretical-connected aspects of the inpainting by patch procedure. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: 45 pages, 45 figures

MSC Class: 68U10 ACM Class: I.4; I.4.0; I.5

arXiv:2210.07432 [pdf, other]

Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

Authors: Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, Wyame Benslimane, Daniel S. Brown, Ken Goldberg

Abstract: Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is using demonstrations to provide initial signal about regions of the state space with high rewards. However, p… ▽ More Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is using demonstrations to provide initial signal about regions of the state space with high rewards. However, prior RL from demonstrations algorithms introduce significant complexity and many hyperparameters, making them hard to implement and tune. We introduce Monte Carlo Augmented Actor Critic (MCAC), a parameter free modification to standard actor-critic algorithms which initializes the replay buffer with demonstrations and computes a modified $Q$-value by taking the maximum of the standard temporal distance (TD) target and a Monte Carlo estimate of the reward-to-go. This encourages exploration in the neighborhood of high-performing trajectories by encouraging high $Q$-values in corresponding regions of the state space. Experiments across $5$ continuous control domains suggest that MCAC can be used to significantly increase learning efficiency across $6$ commonly used RL and RL-from-demonstrations algorithms. See https://sites.google.com/view/mcac-rl for code and supplementary material. △ Less

Submitted 20 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: To be published in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). 19 pages. 11 figures

arXiv:2209.15543 [pdf, other]

Bayesian Neural Networks for Geothermal Resource Assessment: Prediction with Uncertainty

Authors: Stephen Brown, William L. Rodi, Marco Seracini, Chen Gu, Michael Fehler, James Faulds, Connor M. Smith, Sven Treitel

Abstract: We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) an… ▽ More We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) and negative training sites (known drill sites with unsuitable geothermal conditions) and use these to constrain and optimize artificial neural networks for this classification task. The main objective is to predict the geothermal resource potential at unknown sites within a large geographic area where the defining features are known. These predictions could be used to target promising areas for further detailed investigations. We describe the evolution of our work from defining a specific neural network architecture to training and optimization trials. Upon analysis we expose the inevitable problems of model variability and resulting prediction uncertainty. Finally, to address these problems we apply the concept of Bayesian neural networks, a heuristic approach to regularization in network training, and make use of the practical interpretation of the formal uncertainty measures they provide. △ Less

Submitted 25 October, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 27 pages, 12 figures

arXiv:2209.01291 [pdf, other]

doi 10.1145/3508352.3549369

Don't CWEAT It: Toward CWE Analysis Techniques in Early Stages of Hardware Design

Authors: Baleegh Ahmad, Wei-Kai Liu, Luca Collini, Hammond Pearce, Jason M. Fung, Jonathan Valamehr, Mohammad Bidmeshki, Piotr Sapiecha, Steve Brown, Krishnendu Chakrabarty, Ramesh Karri, Benjamin Tan

Abstract: To help prevent hardware security vulnerabilities from propagating to later design stages where fixes are costly, it is crucial to identify security concerns as early as possible, such as in RTL designs. In this work, we investigate the practical implications and feasibility of producing a set of security-specific scanners that operate on Verilog source files. The scanners indicate parts of code t… ▽ More To help prevent hardware security vulnerabilities from propagating to later design stages where fixes are costly, it is crucial to identify security concerns as early as possible, such as in RTL designs. In this work, we investigate the practical implications and feasibility of producing a set of security-specific scanners that operate on Verilog source files. The scanners indicate parts of code that might contain one of a set of MITRE's common weakness enumerations (CWEs). We explore the CWE database to characterize the scope and attributes of the CWEs and identify those that are amenable to static analysis. We prototype scanners and evaluate them on 11 open source designs - 4 system-on-chips (SoC) and 7 processor cores - and explore the nature of identified weaknesses. Our analysis reported 53 potential weaknesses in the OpenPiton SoC used in Hack@DAC-21, 11 of which we confirmed as security concerns. △ Less

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.10687 [pdf, other]

The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types

Authors: Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan

Abstract: When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, o… ▽ More When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, or quality, of human feedback. However, in many settings, giving one type of feedback (e.g. a demonstration) may be much more difficult than a different type of feedback (e.g. answering a comparison query). Thus, we expect to see more or less noise depending on the type of human feedback. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in both simulated experiments and in a user study with real human feedback. We find that overestimating human rationality can have dire effects on reward learning accuracy and regret. We also find that fitting the rationality coefficient to human data enables better reward learning, even when the human deviates significantly from the noisy-rational choice model due to systematic biases. Further, we find that the rationality level affects the informativeness of each feedback type: surprisingly, demonstrations are not always the most informative -- when the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Ultimately, our results emphasize the importance and advantage of paying attention to the assumed human-rationality level, especially when agents actively learn from multiple types of human feedback. △ Less

Submitted 9 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: Published at AAAI 2023; 10 pages, 5 figures plus appendices

arXiv:2207.03070 [pdf, other]

Reservoir Computing with 3D Nanowire Networks

Authors: R. K. Daniels, J. B. Mallinson, Z. E. Heywood, P. J. Bones, M. D. Arnold, S. A. Brown

Abstract: Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the… ▽ More Networks of nanowires are currently being explored for a range of applications in brain-like (or neuromorphic) computing, and especially in reservoir computing (RC). Fabrication of real-world computing devices requires that the nanowires are deposited sequentially, leading to stacking of the wires on top of each other. However, most simulations of computational tasks using these systems treat the nanowires as 1D objects lying in a perfectly 2D plane - the effect of stacking on RC performance has not yet been established. Here we use detailed simulations to compare the performance of perfectly 2D and quasi-3D (stacked) networks of nanowires in two tasks: memory capacity and nonlinear transformation. We also show that our model of the junctions between nanowires is general enough to describe a wide range of memristive networks, and consider the impact of physically realistic electrode configurations on performance. We show that the various networks and configurations have a strikingly similar performance in RC tasks, which is surprising given their radically different topologies. Our results show that networks with an experimentally achievable number of electrodes perform close to the upper bounds achievable when using the information from every wire. However, we also show important differences, in particular that the quasi-3D networks are more resilient to changes in the input parameters, generalizing better to noisy training data. Since previous literature suggests that topology plays an important role in computing performance, these results may have important implications for future applications of nanowire networks in neuromorphic computing. △ Less

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2207.00911 [pdf, other]

Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies

Authors: Satvik Sharma, Ellen Novoseller, Vainavi Viswanath, Zaynah Javed, Rishi Parikh, Ryan Hoque, Ashwin Balakrishna, Daniel S. Brown, Ken Goldberg

Abstract: Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behav… ▽ More Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behaviors on physical hardware. On the other hand, excessive training in simulation can cause policies to overfit to the visual appearance and dynamics of the simulator. In this work, we study strategies to automatically determine when policies trained in simulation can be reliably transferred to a physical robot. We specifically study these ideas in the context of robotic fabric manipulation, in which successful sim2real transfer is especially challenging due to the difficulties of precisely modeling the dynamics and visual appearance of fabric. Results in a fabric smoothing task suggest that our switching criteria correlate well with performance in real. In particular, our confidence-based switching criteria achieve average final fabric coverage of 87.2-93.7% within 55-60% of the total training budget. See https://tinyurl.com/lsc-case for code and supplemental materials. △ Less

Submitted 2 July, 2022; originally announced July 2022.

Comments: CASE 2022. The first two authors contributed equally. 9 pages; 5 figures; 1 table

arXiv:2206.02715 [pdf, other]

Day-to-Night Image Synthesis for Training Nighttime Neural ISPs

Authors: Abhijith Punnappurath, Abdullah Abuolaim, Abdelrahman Abdelhamed, Alex Levinshtein, Michael S. Brown

Abstract: Many flagship smartphone cameras now use a dedicated neural image signal processor (ISP) to render noisy raw sensor images to the final processed output. Training nightmode ISP networks relies on large-scale datasets of image pairs with: (1) a noisy raw image captured with a short exposure and a high ISO gain; and (2) a ground truth low-noise raw image captured with a long exposure and low ISO tha… ▽ More Many flagship smartphone cameras now use a dedicated neural image signal processor (ISP) to render noisy raw sensor images to the final processed output. Training nightmode ISP networks relies on large-scale datasets of image pairs with: (1) a noisy raw image captured with a short exposure and a high ISO gain; and (2) a ground truth low-noise raw image captured with a long exposure and low ISO that has been rendered through the ISP. Capturing such image pairs is tedious and time-consuming, requiring careful setup to ensure alignment between the image pairs. In addition, ground truth images are often prone to motion blur due to the long exposure. To address this problem, we propose a method that synthesizes nighttime images from daytime images. Daytime images are easy to capture, exhibit low-noise (even on smartphone cameras) and rarely suffer from motion blur. We outline a processing framework to convert daytime raw images to have the appearance of realistic nighttime raw images with different levels of noise. Our procedure allows us to easily produce aligned noisy and clean nighttime image pairs. We show the effectiveness of our synthesis framework by training neural ISPs for nightmode rendering. Furthermore, we demonstrate that using our synthetic nighttime images together with small amounts of real data (e.g., 5% to 10%) yields performance almost on par with training exclusively on real nighttime images. Our dataset and code are available at https://github.com/SamsungLabs/day-to-night. △ Less

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2206.01813 [pdf, other]

Learning sRGB-to-Raw-RGB De-rendering with Content-Aware Metadata

Authors: Seonghyeon Nam, Abhijith Punnappurath, Marcus A. Brubaker, Michael S. Brown

Abstract: Most camera images are rendered and saved in the standard RGB (sRGB) format by the camera's hardware. Due to the in-camera photo-finishing routines, nonlinear sRGB images are undesirable for computer vision tasks that assume a direct relationship between pixel values and scene radiance. For such applications, linear raw-RGB sensor images are preferred. Saving images in their raw-RGB format is stil… ▽ More Most camera images are rendered and saved in the standard RGB (sRGB) format by the camera's hardware. Due to the in-camera photo-finishing routines, nonlinear sRGB images are undesirable for computer vision tasks that assume a direct relationship between pixel values and scene radiance. For such applications, linear raw-RGB sensor images are preferred. Saving images in their raw-RGB format is still uncommon due to the large storage requirement and lack of support by many imaging applications. Several "raw reconstruction" methods have been proposed that utilize specialized metadata sampled from the raw-RGB image at capture time and embedded in the sRGB image. This metadata is used to parameterize a mapping function to de-render the sRGB image back to its original raw-RGB format when needed. Existing raw reconstruction methods rely on simple sampling strategies and global mapping to perform the de-rendering. This paper shows how to improve the de-rendering results by jointly learning sampling and reconstruction. Our experiments show that our learned sampling can adapt to the image content to produce better raw reconstructions than existing methods. We also describe an online fine-tuning strategy for the reconstruction network to improve results further. △ Less

Submitted 3 June, 2022; originally announced June 2022.

Comments: CVPR 2022 (GitHub: https://github.com/SamsungLabs/content-aware-metadata)

arXiv:2206.01103 [pdf, other]

Noise2NoiseFlow: Realistic Camera Noise Modeling without Clean Images

Authors: Ali Maleky, Shayan Kousha, Michael S. Brown, Marcus A. Brubaker

Abstract: Image noise modeling is a long-standing problem with many applications in computer vision. Early attempts that propose simple models, such as signal-independent additive white Gaussian noise or the heteroscedastic Gaussian noise model (a.k.a., camera noise level function) are not sufficient to learn the complex behavior of the camera sensor noise. Recently, more complex learning-based models have… ▽ More Image noise modeling is a long-standing problem with many applications in computer vision. Early attempts that propose simple models, such as signal-independent additive white Gaussian noise or the heteroscedastic Gaussian noise model (a.k.a., camera noise level function) are not sufficient to learn the complex behavior of the camera sensor noise. Recently, more complex learning-based models have been proposed that yield better results in noise synthesis and downstream tasks, such as denoising. However, their dependence on supervised data (i.e., paired clean images) is a limiting factor given the challenges in producing ground-truth images. This paper proposes a framework for training a noise model and a denoiser simultaneously while relying only on pairs of noisy images rather than noisy/clean paired image data. We apply this framework to the training of the Noise Flow architecture. The noise synthesis and density estimation results show that our framework outperforms previous signal-processing-based noise models and is on par with its supervised counterpart. The trained denoiser is also shown to significantly improve upon both supervised and weakly supervised baseline denoising approaches. The results indicate that the joint training of a denoiser and a noise model yields significant improvements in the denoiser. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: CVPR 2022

arXiv:2206.00812 [pdf, other]

Modeling sRGB Camera Noise with Normalizing Flows

Authors: Shayan Kousha, Ali Maleky, Michael S. Brown, Marcus A. Brubaker

Abstract: Noise modeling and reduction are fundamental tasks in low-level computer vision. They are particularly important for smartphone cameras relying on small sensors that exhibit visually noticeable noise. There has recently been renewed interest in using data-driven approaches to improve camera noise models via neural networks. These data-driven approaches target noise present in the raw-sensor image… ▽ More Noise modeling and reduction are fundamental tasks in low-level computer vision. They are particularly important for smartphone cameras relying on small sensors that exhibit visually noticeable noise. There has recently been renewed interest in using data-driven approaches to improve camera noise models via neural networks. These data-driven approaches target noise present in the raw-sensor image before it has been processed by the camera's image signal processor (ISP). Modeling noise in the RAW-rgb domain is useful for improving and testing the in-camera denoising algorithm; however, there are situations where the camera's ISP does not apply denoising or additional denoising is desired when the RAW-rgb domain image is no longer available. In such cases, the sensor noise propagates through the ISP to the final rendered image encoded in standard RGB (sRGB). The nonlinear steps on the ISP culminate in a significantly more complex noise distribution in the sRGB domain and existing raw-domain noise models are unable to capture the sRGB noise distribution. We propose a new sRGB-domain noise model based on normalizing flows that is capable of learning the complex noise distribution found in sRGB images under various ISO levels. Our normalizing flows-based approach outperforms other models by a large margin in noise modeling and synthesis tasks. We also show that image denoisers trained on noisy images synthesized with our noise model outperforms those trained with noise from baselines models. △ Less

Submitted 1 June, 2022; originally announced June 2022.

Comments: CVPR 2022

arXiv:2205.04221 [pdf, other]

doi 10.1145/3531146.3533157

The Forgotten Margins of AI Ethics

Authors: Abeba Birhane, Elayne Ruane, Thomas Laurent, Matthew S. Brown, Johnathan Flowers, Anthony Ventresque, Christopher L. Dancy

Abstract: How has recent AI Ethics literature addressed topics such as fairness and justice in the context of continued social and structural power asymmetries? We trace both the historical roots and current landmark work that have been shaping the field and categorize these works under three broad umbrellas: (i) those grounded in Western canonical philosophy, (ii) mathematical and statistical methods, and… ▽ More How has recent AI Ethics literature addressed topics such as fairness and justice in the context of continued social and structural power asymmetries? We trace both the historical roots and current landmark work that have been shaping the field and categorize these works under three broad umbrellas: (i) those grounded in Western canonical philosophy, (ii) mathematical and statistical methods, and (iii) those emerging from critical data/algorithm/information studies. We also survey the field and explore emerging trends by examining the rapidly growing body of literature that falls under the broad umbrella of AI Ethics. To that end, we read and annotated peer-reviewed papers published over the past four years in two premier conferences: FAccT and AIES. We organize the literature based on an annotation scheme we developed according to three main dimensions: whether the paper deals with concrete applications, use-cases, and/or people's lived experience; to what extent it addresses harmed, threatened, or otherwise marginalized groups; and if so, whether it explicitly names such groups. We note that although the goals of the majority of FAccT and AIES papers were often commendable, their consideration of the negative impacts of AI on traditionally marginalized groups remained shallow. Taken together, our conceptual analysis and the data from annotated papers indicate that the field would benefit from an increased focus on ethical analysis grounded in concrete use-cases, people's experiences, and applications as well as from approaches that are sensitive to structural and historical power asymmetries. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: To appear in the FAccT 2022 proceedings

arXiv:2204.06601 [pdf, other]

Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

Authors: Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

Abstract: Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification w… ▽ More Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences. In particular, we perform a series of sensitivity and ablation analyses on several benchmark domains where rewards learned from preferences achieve minimal test error but fail to generalize to out-of-distribution states -- resulting in poor policy performance when optimized. We find that the presence of non-causal distractor features, noise in the stated preferences, and partial state observability can all exacerbate reward misidentification. We also identify a set of methods with which to interpret misidentified learned rewards. In general, we observe that optimizing misidentified rewards drives the policy off the reward's training distribution, resulting in high predicted (learned) rewards but low true rewards. These findings illuminate the susceptibility of preference learning to reward misidentification and causal confusion -- failure to consider even one of many factors can result in unexpected, undesirable behavior. △ Less

Submitted 18 March, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: In the proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023). https://iclr.cc/virtual/2023/poster/10822

Showing 1–50 of 114 results for author: Brown, S