Search | arXiv e-print repository

Learning Run-time Safety Monitors for Machine Learning Components

Authors: Ozan Vardal, Richard Hawkins, Colin Paterson, Chiara Picardi, Daniel Omeiza, Lars Kunze, Ibrahim Habli

Abstract: For machine learning components used as part of autonomous systems (AS) in carrying out critical tasks it is crucial that assurance of the models can be maintained in the face of post-deployment changes (such as changes in the operating environment of the system). A critical part of this is to be able to monitor when the performance of the model at runtime (as a result of changes) poses a safety r… ▽ More For machine learning components used as part of autonomous systems (AS) in carrying out critical tasks it is crucial that assurance of the models can be maintained in the face of post-deployment changes (such as changes in the operating environment of the system). A critical part of this is to be able to monitor when the performance of the model at runtime (as a result of changes) poses a safety risk to the system. This is a particularly difficult challenge when ground truth is unavailable at runtime. In this paper we introduce a process for creating safety monitors for ML components through the use of degraded datasets and machine learning. The safety monitor that is created is deployed to the AS in parallel to the ML component to provide a prediction of the safety risk associated with the model output. We demonstrate the viability of our approach through some initial experiments using publicly available speed sign datasets. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2312.04917 [pdf]

doi 10.1007/978-3-031-49266-2_10

Operationalizing Assurance Cases for Data Scientists: A Showcase of Concepts and Tooling in the Context of Test Data Quality for Machine Learning

Authors: Lisa Jöckel, Michael Kläs, Janek Groß, Pascal Gerber, Markus Scholz, Jonathan Eberle, Marc Teschner, Daniel Seifert, Richard Hawkins, John Molloy, Jens Ottnad

Abstract: Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way. In the context of quality assurance for Machine Learning (ML)-based software components, ACs are also being discussed and appear promising. Tools for operationalizing ACs do exist, yet mainly focus on supporting safety engineers on the system level. However, assuring the quality of… ▽ More Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way. In the context of quality assurance for Machine Learning (ML)-based software components, ACs are also being discussed and appear promising. Tools for operationalizing ACs do exist, yet mainly focus on supporting safety engineers on the system level. However, assuring the quality of an ML component within the system is commonly the responsibility of data scientists, who are usually less familiar with these tools. To address this gap, we propose a framework to support the operationalization of ACs for ML components based on technologies that data scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to make the process of creating ML-related evidence in ACs more effective. Results from the application of the framework, documented through notebooks, can be integrated into existing AC tools. We illustrate the application of the framework on an example excerpt concerned with the quality of the test data. △ Less

Submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted for publication at International Conference on Product-Focused Software Process Improvement (Profes 2023), https://conf.researchr.org/home/profes-2023

arXiv:2311.09665 [pdf, other]

The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents

Authors: Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human gro… ▽ More Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompt and lack of details in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence. △ Less

Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09618 [pdf, other]

Simulating Opinion Dynamics with Networks of LLM-based Agents

Authors: Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings re… ▽ More Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs. △ Less

Submitted 31 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.11614 [pdf, other]

Learning a Hierarchical Planner from Humans in Multiple Generations

Authors: Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Josh Tenenbaum, Armando Solar-Lezama

Abstract: A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning allows the machine to acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn how to perform complex tasks. However, as programs often take their execution contexts for… ▽ More A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning allows the machine to acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn how to perform complex tasks. However, as programs often take their execution contexts for granted, they are brittle when the contexts change, making it difficult to adapt complex programs to new contexts. We present natural programming, a library learning system that combines programmatic learning with a hierarchical planner. Natural programming maintains a library of decompositions, consisting of a goal, a linguistic description of how this goal decompose into sub-goals, and a concrete instance of its decomposition into sub-goals. A user teaches the system via curriculum building, by identifying a challenging yet not impossible goal along with linguistic hints on how this goal may be decomposed into sub-goals. The system solves for the goal via hierarchical planning, using the linguistic hints to guide its probability distribution in proposing the right plans. The system learns from this interaction by adding newly found decompositions in the successful search into its library. Simulated studies and a human experiment (n=360) on a controlled environment demonstrate that natural programming can robustly compose programs learned from different users and contexts, adapting faster and solving more complex tasks when compared to programmatic baselines. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: First two authors contributed equally

arXiv:2306.03882 [pdf, other]

Causal interventions expose implicit situation models for commonsense language understanding

Authors: Takateru Yamakoshi, James L. McClelland, Adele E. Goldberg, Robert D. Hawkins

Abstract: Accounts of human language processing have long appealed to implicit ``situation models'' that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun. We identify a relatively sma… ▽ More Accounts of human language processing have long appealed to implicit ``situation models'' that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun. We identify a relatively small circuit of attention heads that are responsible for propagating information from the context word that guides which of the candidate noun phrases the pronoun ultimately attends to. We then compare how this circuit behaves in a closely matched ``syntactic'' control where the situation model is not strictly necessary. These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution. △ Less

Submitted 7 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: Findings of ACL

arXiv:2305.07151 [pdf, other]

Overinformative Question Answering by Humans and Machines

Authors: Polina Tsvilodub, Michael Franke, Robert D. Hawkins, Noah D. Goodman

Abstract: When faced with a polar question, speakers often provide overinformative answers going beyond a simple "yes" or "no". But what principles guide the selection of additional information? In this paper, we provide experimental evidence from two studies suggesting that overinformativeness in human answering is driven by considerations of relevance to the questioner's goals which they flexibly adjust g… ▽ More When faced with a polar question, speakers often provide overinformative answers going beyond a simple "yes" or "no". But what principles guide the selection of additional information? In this paper, we provide experimental evidence from two studies suggesting that overinformativeness in human answering is driven by considerations of relevance to the questioner's goals which they flexibly adjust given the functional context in which the question is uttered. We take these human results as a strong benchmark for investigating question-answering performance in state-of-the-art neural language models, conducting an extensive evaluation on items from human experiments. We find that most models fail to adjust their answering behavior in a human-like way and tend to include irrelevant information. We show that GPT-3 is highly sensitive to the form of the prompt and only achieves human-like answer patterns when guided by an example and cognitively-motivated explanation. △ Less

Submitted 11 May, 2023; originally announced May 2023.

Comments: 7 pages, 2 figures, to appear in the Proceedings of the 45th Annual Conference of the Cognitive Science Society (2023)

arXiv:2305.06539 [pdf, other]

Semantic uncertainty guides the extension of conventions to new referents

Authors: Ron Eliav, Anya Ji, Yoav Artzi, Robert D. Hawkins

Abstract: A long tradition of studies in psycholinguistics has examined the formation and generalization of ad hoc conventions in reference games, showing how newly acquired conventions for a given target transfer to new referential contexts. However, another axis of generalization remains understudied: how do conventions formed for one target transfer to completely distinct targets, when specific lexical c… ▽ More A long tradition of studies in psycholinguistics has examined the formation and generalization of ad hoc conventions in reference games, showing how newly acquired conventions for a given target transfer to new referential contexts. However, another axis of generalization remains understudied: how do conventions formed for one target transfer to completely distinct targets, when specific lexical choices are unlikely to repeat? This paper presents two dyadic studies (N = 240) that address this axis of generalization, focusing on the role of nameability -- the a priori likelihood that two individuals will share the same label. We leverage the recently-released KiloGram dataset, a collection of abstract tangram images that is orders of magnitude larger than previously available, exhibiting high diversity of properties like nameability. Our first study asks how nameability shapes convention formation, while the second asks how new conventions generalize to entirely new targets of reference. Our results raise new questions about how ad hoc conventions extend beyond target-specific re-use of specific lexical choices. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: Proceedings of the 45th Annual Conference of the Cognitive Science Society

arXiv:2212.00869 [pdf, other]

Flexible social inference facilitates targeted social learning when rewards are not observable

Authors: Robert D. Hawkins, Andrew M. Berdahl, Alex "Sandy" Pentland, Joshua B. Tenenbaum, Noah D. Goodman, P. M. Krafft

Abstract: Groups coordinate more effectively when individuals are able to learn from others' successes. But acquiring such knowledge is not always easy, especially in real-world environments where success is hidden from public view. We suggest that social inference capacities may help bridge this gap, allowing individuals to update their beliefs about others' underlying knowledge and success from observable… ▽ More Groups coordinate more effectively when individuals are able to learn from others' successes. But acquiring such knowledge is not always easy, especially in real-world environments where success is hidden from public view. We suggest that social inference capacities may help bridge this gap, allowing individuals to update their beliefs about others' underlying knowledge and success from observable trajectories of behavior. We compared our social inference model against simpler heuristics in three studies of human behavior in a collective sensing task. In Experiment 1, we found that average performance improves as a function of group size at a rate greater than predicted by non-inferential models. Experiment 2 introduced artificial agents to evaluate how individuals selectively rely on social information. Experiment 3 generalized these findings to a more complex reward landscape. Taken together, our findings provide insight into the relationship between individual social cognition and the flexibility of collective behavior. △ Less

Submitted 5 August, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Nature Human Behaviour

arXiv:2211.16492 [pdf, other]

Abstract Visual Reasoning with Tangram Shapes

Authors: Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi

Abstract: We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to i… ▽ More We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to include segmentation maps and part labels. We use this resource to evaluate the abstract visual reasoning capacities of recent multi-modal models. We observe that pre-trained weights demonstrate limited abstract reasoning, which dramatically improves with fine-tuning. We also observe that explicitly describing parts aids abstract reasoning for both humans and models, especially when jointly encoding the linguistic and visual inputs. KiloGram is available at https://lil.nlp.cornell.edu/kilogram . △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: EMNLP 2022 long paper

arXiv:2211.04530 [pdf, other]

Creating a Safety Assurance Case for an ML Satellite-Based Wildfire Detection and Alert System

Authors: Richard Hawkins, Chiara Picardi, Lucy Donnell, Murray Ireland

Abstract: Wildfires are a common problem in many areas of the world with often catastrophic consequences. A number of systems have been created to provide early warnings of wildfires, including those that use satellite data to detect fires. The increased availability of small satellites, such as CubeSats, allows the wildfire detection response time to be reduced by deploying constellations of multiple satel… ▽ More Wildfires are a common problem in many areas of the world with often catastrophic consequences. A number of systems have been created to provide early warnings of wildfires, including those that use satellite data to detect fires. The increased availability of small satellites, such as CubeSats, allows the wildfire detection response time to be reduced by deploying constellations of multiple satellites over regions of interest. By using machine learned components on-board the satellites, constraints which limit the amount of data that can be processed and sent back to ground stations can be overcome. There are hazards associated with wildfire alert systems, such as failing to detect the presence of a wildfire, or detecting a wildfire in the incorrect location. It is therefore necessary to be able to create a safety assurance case for the wildfire alert ML component that demonstrates it is sufficiently safe for use. This paper describes in detail how a safety assurance case for an ML wildfire alert system is created. This represents the first fully developed safety case for an ML component containing explicit argument and evidence as to the safety of the machine learning. △ Less

Submitted 8 November, 2022; originally announced November 2022.

arXiv:2208.00853 [pdf, other]

Guidance on the Safety Assurance of Autonomous Systems in Complex Environments (SACE)

Authors: Richard Hawkins, Matt Osborne, Mike Parsons, Mark Nicholson, John McDermid, Ibrahim Habli

Abstract: Autonomous systems (AS) are systems that have the capability to take decisions free from direct human control. AS are increasingly being considered for adoption for applications where their behaviour may cause harm, such as when used for autonomous driving, medical applications or in domestic environments. For such applications, being able to ensure and demonstrate (assure) the safety of the opera… ▽ More Autonomous systems (AS) are systems that have the capability to take decisions free from direct human control. AS are increasingly being considered for adoption for applications where their behaviour may cause harm, such as when used for autonomous driving, medical applications or in domestic environments. For such applications, being able to ensure and demonstrate (assure) the safety of the operation of the AS is crucial for their adoption. This can be particularly challenging where AS operate in complex and changing real-world environments. Establishing justified confidence in the safety of AS requires the creation of a compelling safety case. This document introduces a methodology for the Safety Assurance of Autonomous Systems in Complex Environments (SACE). SACE comprises a set of safety case patterns and a process for (1) systematically integrating safety assurance into the development of the AS and (2) for generating the evidence base for explicitly justifying the acceptable safety of the AS. △ Less

Submitted 1 August, 2022; originally announced August 2022.

ACM Class: D.2.0

arXiv:2206.07870 [pdf, other]

How to talk so AI will learn: Instructions, descriptions, and autonomy

Authors: Theodore R Sumers, Robert D Hawkins, Mark K Ho, Thomas L Griffiths, Dylan Hadfield-Menell

Abstract: From the earliest years of our lives, humans use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Yet today, we lack computational models explaining such language use. To address this challenge, we formalize learning from language in a contextual bandit setting and ask how a human might… ▽ More From the earliest years of our lives, humans use language to express our beliefs and desires. Being able to talk to artificial agents about our preferences would thus fulfill a central goal of value alignment. Yet today, we lack computational models explaining such language use. To address this challenge, we formalize learning from language in a contextual bandit setting and ask how a human might communicate preferences over behaviors. We study two distinct types of language: $\textit{instructions}$, which provide information about the desired policy, and $\textit{descriptions}$, which provide information about the reward function. We show that the agent's degree of autonomy determines which form of language is optimal: instructions are better in low-autonomy settings, but descriptions are better when the agent will need to act independently. We then define a pragmatic listener agent that robustly infers the speaker's reward function by reasoning about $\textit{how}$ the speaker expresses themselves. We validate our models with a behavioral experiment, demonstrating that (1) our speaker model predicts human behavior, and (2) our pragmatic listener successfully recovers humans' reward functions. Finally, we show that this form of social learning can integrate with and reduce regret in traditional reinforcement learning. We hope these insights facilitate a shift from developing agents that $\textit{obey}$ language to agents that $\textit{learn}$ from it. △ Less

Submitted 10 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: 10 pages, 5 figures. Published as a conference paper at NeurIPS 2022

arXiv:2205.11558 [pdf, other]

Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

Authors: Sreejan Kumar, Carlos G. Correa, Ishita Dasgupta, Raja Marjieh, Michael Y. Hu, Robert D. Hawkins, Nathaniel D. Daw, Jonathan D. Cohen, Karthik Narasimhan, Thomas L. Griffiths

Abstract: Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs… ▽ More Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs induced to generate such tasks guides them toward more human-like inductive biases. Human-generated language descriptions and program induction models that add new learned primitives both contain abstract concepts that can compress description length. Co-training on these representations result in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without learned primitives), suggesting that the abstraction supported by these representations is key. △ Less

Submitted 5 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), winner of Outstanding Paper Award

arXiv:2205.05666 [pdf, other]

Identifying concept libraries from language about object structure

Authors: Catherine Wong, William P. McCarthy, Gabriel Grand, Yoni Friedman, Joshua B. Tenenbaum, Jacob Andreas, Robert D. Hawkins, Judith E. Fan

Abstract: Our understanding of the visual world goes beyond naming objects, encompassing our ability to parse objects into meaningful parts, attributes, and relations. In this work, we leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use and the principles leading these parts to be favored over others. We formalize our problem as sear… ▽ More Our understanding of the visual world goes beyond naming objects, encompassing our ability to parse objects into meaningful parts, attributes, and relations. In this work, we leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use and the principles leading these parts to be favored over others. We formalize our problem as search over a space of program libraries that contain different part concepts, using tools from machine translation to evaluate how well programs expressed in each library align to human language. By combining naturalistic language at scale with structured program representations, we discover a fundamental information-theoretic tradeoff governing the part concepts people name: people favor a lexicon that allows concise descriptions of each object, while also minimizing the size of the lexicon itself. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: Appears in the conference proceedings of CogSci 2022

arXiv:2205.01749 [pdf, other]

Mixed-effects transformers for hierarchical adaptation

Authors: Julia White, Noah Goodman, Robert Hawkins

Abstract: Language use differs dramatically from context to context. To some degree, modern language models like GPT-3 are able to account for such variance by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this paper, w… ▽ More Language use differs dramatically from context to context. To some degree, modern language models like GPT-3 are able to account for such variance by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes -- lightweight modules prepended to the input -- to account for structured variation. Specifically, we show how the popular class of mixed-effects models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it efficiently adapts to novel contexts with minimal data while still effectively generalizing to unseen contexts. △ Less

Submitted 8 December, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2204.05091 [pdf, other]

Linguistic communication as (inverse) reward design

Authors: Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths, Dylan Hadfield-Menell

Abstract: Natural language is an intuitive and expressive way to communicate reward information to autonomous agents. It encompasses everything from concrete instructions to abstract descriptions of the world. Despite this, natural language is often challenging to learn from: it is difficult for machine learning methods to make appropriate inferences from such a wide range of input. This paper proposes a ge… ▽ More Natural language is an intuitive and expressive way to communicate reward information to autonomous agents. It encompasses everything from concrete instructions to abstract descriptions of the world. Despite this, natural language is often challenging to learn from: it is difficult for machine learning methods to make appropriate inferences from such a wide range of input. This paper proposes a generalization of reward design as a unifying principle to ground linguistic communication: speakers choose utterances to maximize expected rewards from the listener's future behaviors. We first extend reward design to incorporate reasoning about unknown future states in a linear bandit setting. We then define a speaker model which chooses utterances according to this objective. Simulations show that short-horizon speakers (reasoning primarily about a single, known state) tend to use instructions, while long-horizon speakers (reasoning primarily about unknown, future states) tend to describe the reward function. We then define a pragmatic listener which performs inverse reward design by jointly inferring the speaker's latent horizon and rewards. Our findings suggest that this extension of reward design to linguistic communication, including the notion of a latent speaker horizon, is a promising direction for achieving more robust alignment outcomes from natural language supervision. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: 6 pages, 3 figures. Accepted at Learning from Natural Language Supervision workshop (ACL 2022)

arXiv:2203.05830 [pdf, other]

Analysing Ultra-Wide Band Positioning for Geofencing in a Safety Assurance Context

Authors: Victoria Hodge, Richard Hawkins, James Hilder, Ibrahim Habli

Abstract: There is a desire to move towards more flexible and automated factories. To enable this, we need to assure the safety of these dynamic factories. This safety assurance must be achieved in a manner that does not unnecessarily constrain the systems and thus negate the benefits of flexibility and automation. We previously developed a modular safety assurance approach, using safety contracts, as a way… ▽ More There is a desire to move towards more flexible and automated factories. To enable this, we need to assure the safety of these dynamic factories. This safety assurance must be achieved in a manner that does not unnecessarily constrain the systems and thus negate the benefits of flexibility and automation. We previously developed a modular safety assurance approach, using safety contracts, as a way to achieve this. In this case study we show how this approach can be applied to Autonomous Guided Vehicles (AGV) operating as part of a dynamic factory and why it is necessary. We empirically evaluate commercial, indoor fog/edge localisation technology to provide geofencing for hazardous areas in a laboratory. The experiments determine how factors such as AGV speeds, tag transmission timings, control software and AGV capabilities affect the ability of the AGV to stop outside the hazardous areas. We describe how this approach could be used to create a safety case for the AGV operation. △ Less

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2202.12226 [pdf, other]

Probing BERT's priors with serial reproduction chains

Authors: Takateru Yamakoshi, Thomas L. Griffiths, Robert D. Hawkins

Abstract: Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated… ▽ More Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT's priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than other methods in a large corpus of naturalness judgments. Our findings establish a firmer theoretical foundation for bottom-up probing and highlight richer deviations from human priors. △ Less

Submitted 18 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: Findings of ACL 2022

arXiv:2112.03799 [pdf, other]

A pragmatic account of the weak evidence effect

Authors: Samuel A. Barnett, Thomas L. Griffiths, Robert D. Hawkins

Abstract: Language is not only used to transmit neutral information; we often seek to persuade by arguing in favor of a particular view. Persuasion raises a number of challenges for classical accounts of belief updating, as information cannot be taken at face value. How should listeners account for a speaker's "hidden agenda" when incorporating new information? Here, we extend recent probabilistic models of… ▽ More Language is not only used to transmit neutral information; we often seek to persuade by arguing in favor of a particular view. Persuasion raises a number of challenges for classical accounts of belief updating, as information cannot be taken at face value. How should listeners account for a speaker's "hidden agenda" when incorporating new information? Here, we extend recent probabilistic models of recursive social reasoning to allow for persuasive goals and show that our model provides a pragmatic account for why weakly favorable arguments may backfire, a phenomenon known as the weak evidence effect. Critically, this model predicts a systematic relationship between belief updates and expectations about the information source: weak evidence should only backfire when speakers are expected to act under persuasive goals and prefer the strongest evidence. We introduce a simple experimental paradigm called the Stick Contest to measure the extent to which the weak evidence effect depends on speaker expectations, and show that a pragmatic listener model accounts for the empirical data better than alternative models. Our findings suggest further avenues for rational models of social reasoning to illuminate classical decision-making phenomena. △ Less

Submitted 13 September, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: in press at Open Mind

arXiv:2110.09779 [pdf, other]

Open-domain clarification question generation without question examples

Authors: Julia White, Gabriel Poesia, Robert Hawkins, Dorsa Sadigh, Noah Goodman

Abstract: An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans. However, natural language can be ambiguous or unclear. In cases of uncertainty, humans engage in an interactive process known as repair: asking questions and seeking clarification until their uncertainty is resolved. We propose a framework for building a visually grounded question-asking… ▽ More An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans. However, natural language can be ambiguous or unclear. In cases of uncertainty, humans engage in an interactive process known as repair: asking questions and seeking clarification until their uncertainty is resolved. We propose a framework for building a visually grounded question-asking model capable of producing polar (yes-no) clarification questions to resolve misunderstandings in dialogue. Our model uses an expected information gain objective to derive informative questions from an off-the-shelf image captioner without requiring any supervised question-answer data. We demonstrate our model's ability to pose questions that improve communicative success in a goal-oriented 20 questions game with synthetic and human answerers. △ Less

Submitted 19 October, 2021; originally announced October 2021.

Comments: EMNLP 2021

arXiv:2109.13861 [pdf, other]

Visual resemblance and communicative context constrain the emergence of graphical conventions

Authors: Robert D. Hawkins, Megumi Sano, Noah D. Goodman, Judith E. Fan

Abstract: From photorealistic sketches to schematic diagrams, drawing provides a versatile medium for communicating about the visual world. How do images spanning such a broad range of appearances reliably convey meaning? Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images), or do they understand drawings based on shared but arbitrary associat… ▽ More From photorealistic sketches to schematic diagrams, drawing provides a versatile medium for communicating about the visual world. How do images spanning such a broad range of appearances reliably convey meaning? Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images), or do they understand drawings based on shared but arbitrary associations with these entities (i.e., as symbols)? In this paper, we provide evidence for a cognitive account of pictorial meaning in which both visual and social information is integrated to support effective visual communication. To evaluate this account, we used a communication task where pairs of participants used drawings to repeatedly communicate the identity of a target object among multiple distractor objects. We manipulated social cues across three experiments and a full internal replication, finding pairs of participants develop referent-specific and interaction-specific strategies for communicating more efficiently over time, going beyond what could be explained by either task practice or a pure resemblance-based account alone. Using a combination of model-based image analyses and crowdsourced sketch annotations, we further determined that drawings did not drift toward arbitrariness, as predicted by a pure convention-based account, but systematically preserved those visual features that were most distinctive of the target object. Taken together, these findings advance theories of pictorial meaning and have implications for how successful graphical conventions emerge via complex interactions between visual perception, communicative experience, and social context. △ Less

Submitted 17 September, 2021; originally announced September 2021.

Comments: 26 pages; 8 figures; submitted version of manuscript

arXiv:2107.00077 [pdf, other]

Learning to communicate about shared procedural abstractions

Authors: William P. McCarthy, Robert D. Hawkins, Haoliang Wang, Cameron Holdaway, Judith E. Fan

Abstract: Many real-world tasks require agents to coordinate their behavior to achieve shared goals. Successful collaboration requires not only adopting the same communicative conventions, but also grounding these conventions in the same task-appropriate conceptual abstractions. We investigate how humans use natural language to collaboratively solve physical assembly problems more effectively over time. Hum… ▽ More Many real-world tasks require agents to coordinate their behavior to achieve shared goals. Successful collaboration requires not only adopting the same communicative conventions, but also grounding these conventions in the same task-appropriate conceptual abstractions. We investigate how humans use natural language to collaboratively solve physical assembly problems more effectively over time. Human participants were paired up in an online environment to reconstruct scenes containing two block towers. One participant could see the target towers, and sent assembly instructions for the other participant to reconstruct. Participants provided increasingly concise instructions across repeated attempts on each pair of towers, using higher-level referring expressions that captured each scene's hierarchical structure. To explain these findings, we extend recent probabilistic models of ad-hoc convention formation with an explicit perceptual learning mechanism. These results shed light on the inductive biases that enable intelligent agents to coordinate upon shared procedural abstractions. △ Less

Submitted 30 June, 2021; originally announced July 2021.

arXiv:2106.01729 [pdf]

doi 10.1007/978-3-319-66972-4_13

DEIS: Dependability Engineering Innovation for Industrial CPS

Authors: Erik Armengaud, Georg Macher, Alexander Massoner, Sebastian Frager, Rasmus Adler, Daniel Schneider, Simone Longo, Massimiliano Melis, Riccardo Groppo, Federica Villa, Padraig OLeary, Kevin Bambury, Finnegan Anita, Marc Zeller, Kai Hoefig, Yiannis Papadopoulos, Richard Hawkins, Tim Kelly

Abstract: The open and cooperative nature of Cyber-Physical Systems (CPS) poses new challenges in assuring dependability. The DEIS project (Dependability Engineering Innovation for automotive CPS. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 732242, see http://www.deis-project.eu) addresses these challenges by developing… ▽ More The open and cooperative nature of Cyber-Physical Systems (CPS) poses new challenges in assuring dependability. The DEIS project (Dependability Engineering Innovation for automotive CPS. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 732242, see http://www.deis-project.eu) addresses these challenges by developing technologies that form a science of dependable system integration. In the core of these technologies lies the concept of a Digital Dependability Identity (DDI) of a component or system. DDIs are modular, composable, and executable in the field facilitating (a) efficient synthesis of component and system dependability information over the supply chain and (b) effective evaluation of this information in-the-field for safe and secure composition of highly distributed and autonomous CPS. The paper outlines the DDI concept and opportunities for application in four industrial use cases. △ Less

Submitted 3 June, 2021; originally announced June 2021.

arXiv:2105.11950 [pdf, other]

Extending rational models of communication from beliefs to actions

Authors: Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths

Abstract: Speakers communicate to influence their partner's beliefs and shape their actions. Belief- and action-based objectives have been explored independently in recent computational models, but it has been challenging to explicitly compare or integrate them. Indeed, we find that they are conflated in standard referential communication tasks. To distinguish these accounts, we introduce a new paradigm cal… ▽ More Speakers communicate to influence their partner's beliefs and shape their actions. Belief- and action-based objectives have been explored independently in recent computational models, but it has been challenging to explicitly compare or integrate them. Indeed, we find that they are conflated in standard referential communication tasks. To distinguish these accounts, we introduce a new paradigm called signaling bandits, generalizing classic Lewis signaling games to a multi-armed bandit setting where all targets in the context have some relative value. We develop three speaker models: a belief-oriented speaker with a purely informative objective; an action-oriented speaker with an instrumental objective; and a combined speaker which integrates the two by inducing listener beliefs that generally lead to desirable actions. We then present a series of simulations demonstrating that grounding production choices in future listener actions results in relevance effects and flexible uses of nonliteral language. More broadly, our findings suggest that language games based on richer decision problems are a promising avenue for insight into rational communication. △ Less

Submitted 25 May, 2021; originally announced May 2021.

Comments: 7 pages, 4 figures. Proceedings for the 43rd Annual Meeting of the Cognitive Science Society

arXiv:2105.06546 [pdf, other]

Shades of confusion: Lexical uncertainty modulates ad hoc coordination in an interactive communication task

Authors: Sonia K. Murthy, Thomas L. Griffiths, Robert D. Hawkins

Abstract: There is substantial variability in the expectations that communication partners bring into interactions, creating the potential for misunderstandings. To directly probe these gaps and our ability to overcome them, we propose a communication task based on color-concept associations. In Experiment 1, we establish several key properties of the mental representations of these expectations, or lexical… ▽ More There is substantial variability in the expectations that communication partners bring into interactions, creating the potential for misunderstandings. To directly probe these gaps and our ability to overcome them, we propose a communication task based on color-concept associations. In Experiment 1, we establish several key properties of the mental representations of these expectations, or lexical priors, based on recent probabilistic theories. Associations are more variable for abstract concepts, variability is represented as uncertainty within each individual, and uncertainty enables accurate predictions about whether others are likely to share the same association. In Experiment 2, we then examine the downstream consequences of these representations for communication. Accuracy is initially low when communicating about concepts with more variable associations, but rapidly increases as participants form ad hoc conventions. Together, our findings suggest that people cope with variability by maintaining well-calibrated uncertainty about their partner and appropriately adaptable representations of their own. △ Less

Submitted 26 April, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: in press at Cognition

arXiv:2104.05857 [pdf, other]

From partners to populations: A hierarchical Bayesian account of coordination and convention

Authors: Robert D. Hawkins, Michael Franke, Michael C. Frank, Adele E. Goldberg, Kenny Smith, Thomas L. Griffiths, Noah D. Goodman

Abstract: Languages are powerful solutions to coordination problems: they provide stable, shared expectations about how the words we say correspond to the beliefs and intentions in our heads. Yet language use in a variable and non-stationary social environment requires linguistic representations to be flexible: old words acquire new ad hoc or partner-specific meanings on the fly. In this paper, we introduce… ▽ More Languages are powerful solutions to coordination problems: they provide stable, shared expectations about how the words we say correspond to the beliefs and intentions in our heads. Yet language use in a variable and non-stationary social environment requires linguistic representations to be flexible: old words acquire new ad hoc or partner-specific meanings on the fly. In this paper, we introduce CHAI (Continual Hierarchical Adaptation through Inference), a hierarchical Bayesian theory of coordination and convention formation that aims to reconcile the long-standing tension between these two basic observations. We argue that the central computational problem of communication is not simply transmission, as in classical formulations, but continual learning and adaptation over multiple timescales. Partner-specific common ground quickly emerges from social inferences within dyadic interactions, while community-wide social conventions are stable priors that have been abstracted away from interactions with multiple partners. We present new empirical data alongside simulations showing how our model provides a computational foundation for several phenomena that have posed a challenge for previous accounts: (1) the convergence to more efficient referring expressions across repeated interaction with the same partner, (2) the gradual transfer of partner-specific common ground to strangers, and (3) the influence of communicative context on which conventions eventually form. △ Less

Submitted 2 December, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: In press at Psychological Review

arXiv:2102.01564 [pdf, other]

Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS)

Authors: Richard Hawkins, Colin Paterson, Chiara Picardi, Yan Jia, Radu Calinescu, Ibrahim Habli

Abstract: Machine Learning (ML) is now used in a range of systems with results that are reported to exceed, under certain conditions, human performance. Many of these systems, in domains such as healthcare , automotive and manufacturing, exhibit high degrees of autonomy and are safety critical. Establishing justified confidence in ML forms a core part of the safety case for these systems. In this document w… ▽ More Machine Learning (ML) is now used in a range of systems with results that are reported to exceed, under certain conditions, human performance. Many of these systems, in domains such as healthcare , automotive and manufacturing, exhibit high degrees of autonomy and are safety critical. Establishing justified confidence in ML forms a core part of the safety case for these systems. In this document we introduce a methodology for the Assurance of Machine Learning for use in Autonomous Systems (AMLAS). AMLAS comprises a set of safety case patterns and a process for (1) systematically integrating safety assurance into the development of ML components and (2) for generating the evidence base for explicitly justifying the acceptable safety of these components when integrated into autonomous system applications. △ Less

Submitted 2 February, 2021; originally announced February 2021.

arXiv:2010.02375 [pdf, other]

Investigating representations of verb bias in neural language models

Authors: Robert D. Hawkins, Takateru Yamakoshi, Thomas L. Griffiths, Adele E. Goldberg

Abstract: Languages typically provide more than one grammatical construction to express certain types of messages. A speaker's choice of construction is known to depend on multiple factors, including the choice of main verb -- a phenomenon known as \emph{verb bias}. Here we introduce DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternati… ▽ More Languages typically provide more than one grammatical construction to express certain types of messages. A speaker's choice of construction is known to depend on multiple factors, including the choice of main verb -- a phenomenon known as \emph{verb bias}. Here we introduce DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments. We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences. Results show that larger models perform better than smaller models, and transformer architectures (e.g. GPT-2) tend to out-perform recurrent architectures (e.g. LSTMs) even under comparable parameter and training settings. Additional analyses of internal feature representations suggest that transformers may better integrate specific lexical information with grammatical constructions. △ Less

Submitted 15 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to EMNLP

arXiv:2009.14715 [pdf, other]

Learning Rewards from Linguistic Feedback

Authors: Theodore R. Sumers, Mark K. Ho, Robert D. Hawkins, Karthik Narasimhan, Thomas L. Griffiths

Abstract: We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, using aspect-based sentiment analysis to decompose feedback into sentiment about… ▽ More We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, using aspect-based sentiment analysis to decompose feedback into sentiment about the features of a Markov decision process. We then perform an analogue of inverse reinforcement learning, regressing the sentiment on the features to infer the teacher's latent reward function. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards. We then repeat our initial experiment and pair them with human teachers. All three successfully learn from interactive human feedback. The sentiment models outperform the inference network, with the "pragmatic" model approaching human performance. Our work thus provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning. △ Less

Submitted 3 July, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

Comments: 9 pages, 4 figures. AAAI '21

arXiv:2005.08381 [pdf]

doi 10.1136/bmjhci-2020-100165

Enhancing Covid-19 Decision-Making by Creating an Assurance Case for Simulation Models

Authors: Ibrahim Habli, Rob Alexander, Richard Hawkins, Mark Sujan, John McDermid, Chiara Picardi, Tom Lawton

Abstract: Simulation models have been informing the COVID-19 policy-making process. These models, therefore, have significant influence on risk of societal harms. But how clearly are the underlying modelling assumptions and limitations communicated so that decision-makers can readily understand them? When making claims about risk in safety-critical systems, it is common practice to produce an assurance case… ▽ More Simulation models have been informing the COVID-19 policy-making process. These models, therefore, have significant influence on risk of societal harms. But how clearly are the underlying modelling assumptions and limitations communicated so that decision-makers can readily understand them? When making claims about risk in safety-critical systems, it is common practice to produce an assurance case, which is a structured argument supported by evidence with the aim to assess how confident we should be in our risk-based decisions. We argue that any COVID-19 simulation model that is used to guide critical policy decisions would benefit from being supported with such a case to explain how, and to what extent, the evidence from the simulation can be relied on to substantiate policy conclusions. This would enable a critical review of the implicit assumptions and inherent uncertainty in modelling, and would give the overall decision-making process greater transparency and accountability. △ Less

Submitted 17 May, 2020; originally announced May 2020.

Comments: 6 pages and 2 figures

Journal ref: BMJ Health & Care Informatics 2020;27:e100165

arXiv:2002.01510 [pdf, other]

Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks

Authors: Robert D. Hawkins, Noah D. Goodman, Adele E. Goldberg, Thomas L. Griffiths

Abstract: A key property of linguistic conventions is that they hold over an entire community of speakers, allowing us to communicate efficiently even with people we have never met before. At the same time, much of our language use is partner-specific: we know that words may be understood differently by different people based on our shared history. This poses a challenge for accounts of convention formation… ▽ More A key property of linguistic conventions is that they hold over an entire community of speakers, allowing us to communicate efficiently even with people we have never met before. At the same time, much of our language use is partner-specific: we know that words may be understood differently by different people based on our shared history. This poses a challenge for accounts of convention formation. Exactly how do agents make the inferential leap to community-wide expectations while maintaining partner-specific knowledge? We propose a hierarchical Bayesian model to explain how speakers and listeners solve this inductive problem. To evaluate our model's predictions, we conducted an experiment where participants played an extended natural-language communication game with different partners in a small community. We examine several measures of generalization and find key signatures of both partner-specificity and community convergence that distinguish our model from alternatives. These results suggest that partner-specificity is not only compatible with the formation of community-wide conventions, but may facilitate it when coupled with a powerful inductive mechanism. △ Less

Submitted 30 May, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

Comments: CogSci 2020

arXiv:1912.07199 [pdf, other]

Characterizing the dynamics of learning in repeated reference games

Authors: Robert D. Hawkins, Michael C. Frank, Noah D. Goodman

Abstract: The language we use over the course of conversation changes as we establish common ground and learn what our partner finds meaningful. Here we draw upon recent advances in natural language processing to provide a finer-grained characterization of the dynamics of this learning process. We release an open corpus (>15,000 utterances) of extended dyadic interactions in a classic repeated reference gam… ▽ More The language we use over the course of conversation changes as we establish common ground and learn what our partner finds meaningful. Here we draw upon recent advances in natural language processing to provide a finer-grained characterization of the dynamics of this learning process. We release an open corpus (>15,000 utterances) of extended dyadic interactions in a classic repeated reference game task where pairs of participants had to coordinate on how to refer to initially difficult-to-describe tangram stimuli. We find that different pairs discover a wide variety of idiosyncratic but efficient and stable solutions to the problem of reference. Furthermore, these conventions are shaped by the communicative context: words that are more discriminative in the initial context (i.e. that are used for one target more than others) are more likely to persist through the final repetition. Finally, we find systematic structure in how a speaker's referring expressions become more efficient over time: syntactic units drop out in clusters following positive feedback from the listener, eventually leaving short labels containing open-class parts of speech. These findings provide a higher resolution look at the quantitative dynamics of ad hoc convention formation and support further development of computational models of learning in communication. △ Less

Submitted 13 April, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

Comments: Accepted at Cognitive Science

arXiv:1911.09896 [pdf, other]

Continual adaptation for efficient machine communication

Authors: Robert D. Hawkins, Minae Kwon, Dorsa Sadigh, Noah D. Goodman

Abstract: To communicate with new partners in new contexts, humans rapidly form new linguistic conventions. Recent neural language models are able to comprehend and produce the existing conventions present in their training data, but are not able to flexibly and interactively adapt those conventions on the fly as humans do. We introduce an interactive repeated reference task as a benchmark for models of ada… ▽ More To communicate with new partners in new contexts, humans rapidly form new linguistic conventions. Recent neural language models are able to comprehend and produce the existing conventions present in their training data, but are not able to flexibly and interactively adapt those conventions on the fly as humans do. We introduce an interactive repeated reference task as a benchmark for models of adaptation in communication and propose a regularized continual learning framework that allows an artificial agent initialized with a generic language model to more accurately and efficiently communicate with a partner over time. We evaluate this framework through simulations on COCO and in real-time reference game experiments with human partners. △ Less

Submitted 13 October, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: Accepted at CoNLL

arXiv:1905.02925 [pdf, other]

ShapeGlot: Learning Language for Shape Differentiation

Authors: Panos Achlioptas, Judy Fan, Robert X. D. Hawkins, Noah D. Goodman, Leonidas J. Guibas

Abstract: In this work we explore how fine-grained differences between the shapes of common objects are expressed in language, grounded on images and 3D models of the objects. We first build a large scale, carefully controlled dataset of human utterances that each refers to a 2D rendering of a 3D CAD model so as to distinguish it from a set of shape-wise similar alternatives. Using this dataset, we develop… ▽ More In this work we explore how fine-grained differences between the shapes of common objects are expressed in language, grounded on images and 3D models of the objects. We first build a large scale, carefully controlled dataset of human utterances that each refers to a 2D rendering of a 3D CAD model so as to distinguish it from a set of shape-wise similar alternatives. Using this dataset, we develop neural language understanding (listening) and production (speaking) models that vary in their grounding (pure 3D forms via point-clouds vs. rendered 2D images), the degree of pragmatic reasoning captured (e.g. speakers that reason about a listener or not), and the neural architecture (e.g. with or without attention). We find models that perform well with both synthetic and human partners, and with held out utterances and objects. We also find that these models are amenable to zero-shot transfer learning to novel object classes (e.g. transfer from training on chairs to testing on lamps), as well as to real-world images drawn from furniture catalogs. Lesion studies indicate that the neural listeners depend heavily on part-related words and associate these words correctly with visual parts of objects (without any explicit network training on object parts), and that transfer to novel classes is most successful when known part-words are available. This work illustrates a practical approach to language grounding, and provides a case study in the relationship between object shape and linguistic structure when it comes to object differentiation. △ Less

Submitted 8 May, 2019; originally announced May 2019.

arXiv:1905.02427 [pdf]

doi 10.1016/j.jss.2019.05.013

Model Based System Assurance Using the Structured Assurance Case Metamodel

Authors: Ran Wei, Tim P. Kelly, Xiaotian Dai, Shuai Zhao, Richard Hawkins

Abstract: Assurance cases are used to demonstrate confidence in system properties of interest (e.g. safety and/or security). A number of system assurance approaches are adopted by industries in the safety-critical domain. However, the task of constructing assurance cases remains a manual, trivial and informal process. The Structured Assurance Case Metamodel (SACM) is a standard specified by the Object Manag… ▽ More Assurance cases are used to demonstrate confidence in system properties of interest (e.g. safety and/or security). A number of system assurance approaches are adopted by industries in the safety-critical domain. However, the task of constructing assurance cases remains a manual, trivial and informal process. The Structured Assurance Case Metamodel (SACM) is a standard specified by the Object Management Group (OMG). SACM provides a richer set of features than existing system assurance languages/approaches. SACM provides a foundation for model-based system assurance, which has great potentials in growing technology domains such as Open Adaptive Systems. However, the intended usage of SACM has not been sufficiently explained. In addition, there has been no support to interoperate between existing assurance case (models) and SACM models. In this article, we explain the intended usage of SACM based on our involvement in the OMG specification process of SACM. In addition, to promote a model-based approach, we provide SACM compliant metamodels for existing system assurance approaches (the Goal Structuring Notation and Claims-Arguments-Evidence), and the transformations from these models to SACM. We also briefly discuss the tool support for model-based system assurance which helps practitioners to make the transition from existing system assurance approaches to model-based system assurance using SACM. △ Less

Submitted 7 May, 2019; originally announced May 2019.

Comments: 45 pages, 41 figures, Accepted by Journal of Systems and Software

arXiv:1903.08237 [pdf, other]

When redundancy is useful: A Bayesian approach to 'overinformative' referring expressions

Authors: Judith Degen, Robert D. Hawkins, Caroline Graf, Elisa Kreiss, Noah D. Goodman

Abstract: Referring is one of the most basic and prevalent uses of language. How do speakers choose from the wealth of referring expressions at their disposal? Rational theories of language use have come under attack for decades for not being able to account for the seemingly irrational overinformativeness ubiquitous in referring expressions. Here we present a novel production model of referring expressions… ▽ More Referring is one of the most basic and prevalent uses of language. How do speakers choose from the wealth of referring expressions at their disposal? Rational theories of language use have come under attack for decades for not being able to account for the seemingly irrational overinformativeness ubiquitous in referring expressions. Here we present a novel production model of referring expressions within the Rational Speech Act framework that treats speakers as agents that rationally trade off cost and informativeness of utterances. Crucially, we relax the assumption that informativeness is computed with respect to a deterministic Boolean semantics, in favor of a non-deterministic continuous semantics. This innovation allows us to capture a large number of seemingly disparate phenomena within one unified framework: the basic asymmetry in speakers' propensity to overmodify with color rather than size; the increase in overmodification in complex scenes; the increase in overmodification with atypical features; and the increase in specificity in nominal reference as a function of typicality. These findings cast a new light on the production of referring expressions: rather than being wastefully overinformative, reference is usefully redundant. △ Less

Submitted 10 December, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

arXiv:1903.04448 [pdf, other]

doi 10.1007/s42113-019-00058-7

Pragmatic inference and visual abstraction enable contextual flexibility during visual communication

Authors: Judith Fan, Robert Hawkins, Mike Wu, Noah Goodman

Abstract: Visual modes of communication are ubiquitous in modern life --- from maps to data plots to political cartoons. Here we investigate drawing, the most basic form of visual communication. Participants were paired in an online environment to play a drawing-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher's goal was to dra… ▽ More Visual modes of communication are ubiquitous in modern life --- from maps to data plots to political cartoons. Here we investigate drawing, the most basic form of visual communication. Participants were paired in an online environment to play a drawing-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher's goal was to draw one of these objects so that the viewer could select it from the array. On `close' trials, objects belonged to the same basic-level category, whereas on `far' trials objects belonged to different categories. We found that people exploited shared information to efficiently communicate about the target object: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core faculties: visual abstraction, the ability to perceive the correspondence between an object and a drawing of it; and pragmatic inference, the ability to judge what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both faculties, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants. Together, this work provides the first algorithmically explicit theory of how visual perception and social cognition jointly support contextual flexibility in visual communication. △ Less

Submitted 27 March, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: 29 pages; 5 figures; submitted draft of manuscript

arXiv:1807.09000 [pdf, other]

The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective

Authors: Robert D. Hawkins, Hyowon Gweon, Noah D. Goodman

Abstract: Recent debates over adults' theory of mind use have been fueled by surprising failures of perspective-taking in communication, suggesting that perspective-taking can be relatively effortful. How, then, should speakers and listeners allocate their resources to achieve successful communication? We begin with the observation that this shared goal induces a natural division of labor: the resources one… ▽ More Recent debates over adults' theory of mind use have been fueled by surprising failures of perspective-taking in communication, suggesting that perspective-taking can be relatively effortful. How, then, should speakers and listeners allocate their resources to achieve successful communication? We begin with the observation that this shared goal induces a natural division of labor: the resources one agent chooses to allocate toward perspective-taking should depend on their expectations about the other's allocation. We formalize this idea in a resource-rational model augmenting recent probabilistic weighting accounts with a mechanism for (costly) control over the degree of perspective-taking. In a series of simulations, we first derive an intermediate degree of perspective weighting as an optimal tradeoff between expected costs and benefits of perspective-taking. We then present two behavioral experiments testing novel predictions of our model. In Experiment 1, we manipulated the presence or absence of occlusions in a director-matcher task and found that speakers spontaneously produced more informative descriptions to account for "known unknowns" in their partner's private view. In Experiment 2, we compared the scripted utterances used by confederates in prior work with those produced in interactions with unscripted directors. We found that confederates were systematically less informative than listeners would initially expect given the presence of occlusions, but listeners used violations to adaptively make fewer errors over time. Taken together, our work suggests that people are not simply "mindblind"; they use contextually appropriate expectations to navigate the division of labor with their partner. We discuss how a resource rational framework may provide a more deeply explanatory foundation for understanding flexible perspective-taking under processing constraints. △ Less

Submitted 11 May, 2020; v1 submitted 24 July, 2018; originally announced July 2018.

arXiv:1703.10186 [pdf, other]

Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding

Authors: Will Monroe, Robert X. D. Hawkins, Noah D. Goodman, Christopher Potts

Abstract: We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework. Experiments show that this combined pragmatic model interprets color descriptions more accurately than th… ▽ More We present a model of pragmatic referring expression interpretation in a grounded communication task (identifying colors from descriptions) that draws upon predictions from two recurrent neural network classifiers, a speaker and a listener, unified by a recursive pragmatic reasoning framework. Experiments show that this combined pragmatic model interprets color descriptions more accurately than the classifiers from which it is built, and that much of this improvement results from combining the speaker and listener perspectives. We observe that pragmatic reasoning helps primarily in the hardest cases: when the model must distinguish very similar colors, or when few utterances adequately express the target color. Our findings make use of a newly-collected corpus of human utterances in color reference games, which exhibit a variety of pragmatic behaviors. We also show that the embedded speaker model reproduces many of these pragmatic behaviors. △ Less

Submitted 16 May, 2017; v1 submitted 29 March, 2017; originally announced March 2017.

Comments: 14 pages, 3 tables, 6 figures. TACL

arXiv:1509.02962 [pdf, other]

Coarse-to-Fine Sequential Monte Carlo for Probabilistic Programs

Authors: Andreas Stuhlmüller, Robert X. D. Hawkins, N. Siddharth, Noah D. Goodman

Abstract: Many practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as probabilistic programs, the models themselves are highly structured objects that can be u… ▽ More Many practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as probabilistic programs, the models themselves are highly structured objects that can be used to derive annealing sequences that are more sensitive to domain structure. We propose an algorithm for transforming probabilistic programs to coarse-to-fine programs which have the same marginal distribution as the original programs, but generate the data at increasing levels of detail, from coarse to fine. We apply this algorithm to an Ising model, its depth-from-disparity variation, and a factorial hidden Markov model. We show preliminary evidence that the use of coarse-to-fine models can make existing generic inference algorithms more efficient. △ Less

Submitted 9 September, 2015; originally announced September 2015.

arXiv:1302.0907 [pdf, other]

doi 10.3390/e15062246

Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems

Authors: Simon DeDeo, Robert X. D. Hawkins, Sara Klingenstein, Tim Hitchcock

Abstract: We characterize the statistical bootstrap for the estimation of information-theoretic quantities from data, with particular reference to its use in the study of large-scale social phenomena. Our methods allow one to preserve, approximately, the underlying axiomatic relationships of information theory---in particular, consistency under arbitrary coarse-graining---that motivate use of these quantiti… ▽ More We characterize the statistical bootstrap for the estimation of information-theoretic quantities from data, with particular reference to its use in the study of large-scale social phenomena. Our methods allow one to preserve, approximately, the underlying axiomatic relationships of information theory---in particular, consistency under arbitrary coarse-graining---that motivate use of these quantities in the first place, while providing reliability comparable to the state of the art for Bayesian estimators. We show how information-theoretic quantities allow for rigorous empirical study of the decision-making capacities of rational agents and the time-asymmetric flows of information in distributed systems. We provide illustrative examples by reference to ongoing collaborative work on the semantic structure of the British Criminal Court system and the conflict dynamics of the contemporary Afghanistan insurgency. △ Less

Submitted 5 June, 2013; v1 submitted 4 February, 2013; originally announced February 2013.

Comments: 32 pages, 8 figures, 5 tables. Matched published version. Code for NSB, naive, and bootstrap estimation of entropy, mutual information, and other quantities available at http://thoth-python.org

Journal ref: Entropy 2013, 15(6), 2246-2276

Showing 1–42 of 42 results for author: Hawkins, R