Search | arXiv e-print repository

Towards Compositional Interpretability for XAI

Authors: Sean Tull, Robin Lorenz, Stephen Clark, Ilyas Khan, Bob Coecke

Abstract: Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, being critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the… ▽ More Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, being critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the notion of a compositional model, which sees a model in terms of formal string diagrams which capture its abstract structure together with its concrete implementation. This comprehensive view incorporates deterministic, probabilistic and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next we give a definition of interpretation of a model in terms of its compositional structure, demonstrating how to analyse the interpretability of a model, and using this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads us to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We next demonstrate the explainability benefits of CI models. Firstly, their compositional structure may allow the computation of other quantities of interest, and may facilitate inference from the model to the modelled phenomenon by matching its structure. Secondly, they allow for diagrammatic explanations for their behaviour, based on influence constraints, diagram surgery and rewrite explanations. Finally, we discuss many future directions for the approach, raising the question of how to learn such meaningfully structured models in practice. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2402.11288 [pdf]

Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review

Authors: Merryn D. Constable, Hubert P. H. Shum, Stephen Clark

Abstract: When technical requirements are high, and patient outcomes are critical, opportunities for monitoring and improving surgical skills via objective motion analysis feedback may be particularly beneficial. This narrative review synthesises work on technical and non-technical surgical skills, collaborative task performance, and pose estimation to illustrate new opportunities to advance cardiothoracic… ▽ More When technical requirements are high, and patient outcomes are critical, opportunities for monitoring and improving surgical skills via objective motion analysis feedback may be particularly beneficial. This narrative review synthesises work on technical and non-technical surgical skills, collaborative task performance, and pose estimation to illustrate new opportunities to advance cardiothoracic surgical performance with innovations from computer vision and artificial intelligence. These technological innovations are critically evaluated in terms of the benefits they could offer the cardiothoracic surgical community, and any barriers to the uptake of the technology are elaborated upon. Like some other specialities, cardiothoracic surgery has relatively few opportunities to benefit from tools with data capture technology embedded within them (as with robotic-assisted laparoscopic surgery, for example). In such cases, pose estimation techniques that allow for movement tracking across a conventional operating field without using specialist equipment or markers offer considerable potential. With video data from either simulated or real surgical procedures, these tools can (1) provide insight into the development of expertise and surgical performance over a surgeon's career, (2) provide feedback to trainee surgeons regarding areas for improvement, (3) provide the opportunity to investigate what aspects of skill may be linked to patient outcomes which can (4) inform the aspects of surgical skill which should be focused on within training or mentoring programmes. Classifier or assessment algorithms that use artificial intelligence to 'learn' what expertise is from expert surgical evaluators could further assist educators in determining if trainees meet competency thresholds. △ Less

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2401.08585 [pdf, other]

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models

Authors: Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljic, Stephen Clark

Abstract: In this article we present a new modelling framework for structured concepts using a category-theoretic generalisation of conceptual spaces, and show how the conceptual representations can be learned automatically from data, using two very different instantiations: one classical and one quantum. A contribution of the work is a thorough category-theoretic formalisation of our framework. We claim th… ▽ More In this article we present a new modelling framework for structured concepts using a category-theoretic generalisation of conceptual spaces, and show how the conceptual representations can be learned automatically from data, using two very different instantiations: one classical and one quantum. A contribution of the work is a thorough category-theoretic formalisation of our framework. We claim that the use of category theory, and in particular the use of string diagrams to describe quantum processes, helps elucidate some of the most important features of our approach. We build upon Gardenfors' classical framework of conceptual spaces, in which cognition is modelled geometrically through the use of convex spaces, which in turn factorise in terms of simpler spaces called domains. We show how concepts from the domains of shape, colour, size and position can be learned from images of simple shapes, where concepts are represented as Gaussians in the classical implementation, and quantum effects in the quantum one. In the classical case we develop a new model which is inspired by the Beta-VAE model of concepts, but is designed to be more closely connected with language, so that the names of concepts form part of the graphical model. In the quantum case, concepts are learned by a hybrid classical-quantum network trained to perform concept classification, where the classical image processing is carried out by a convolutional neural network and the quantum representations are produced by a parameterised quantum circuit. Finally, we consider the question of whether our quantum models of concepts can be considered conceptual spaces in the Gardenfors sense. △ Less

Submitted 6 November, 2023; originally announced January 2024.

Comments: This article consolidates our previous reports on concept formalisation and learning: arXiv:2302.14822 and arXiv:2203.11216

arXiv:2311.15696 [pdf, other]

Peptide Binding Classification on Quantum Computers

Authors: Charles London, Douglas Brown, Wenduan Xu, Sezen Vatansever, Christopher James Langmead, Dimitri Kartsaklis, Stephen Clark, Konstantinos Meichanetzidis

Abstract: We conduct an extensive study on using near-term quantum computers for a task in the domain of computational biology. By constructing quantum models based on parameterised quantum circuits we perform sequence classification on a task relevant to the design of therapeutic proteins, and find competitive performance with classical baselines of similar scale. To study the effect of noise, we run some… ▽ More We conduct an extensive study on using near-term quantum computers for a task in the domain of computational biology. By constructing quantum models based on parameterised quantum circuits we perform sequence classification on a task relevant to the design of therapeutic proteins, and find competitive performance with classical baselines of similar scale. To study the effect of noise, we run some of the best-performing quantum models with favourable resource requirements on emulators of state-of-the-art noisy quantum processors. We then apply error mitigation methods to improve the signal. We further execute these quantum models on the Quantinuum H1-1 trapped-ion quantum processor and observe very close agreement with noiseless exact simulation. Finally, we perform feature attribution methods and find that the quantum models indeed identify sensible relationships, at least as well as the classical baselines. This work constitutes the first proof-of-concept application of near-term quantum computing to a task critical to the design of therapeutic proteins, opening the route toward larger-scale applications in this and related fields, in line with the hardware development roadmaps of near-term quantum technologies. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.02074 [pdf, other]

ACE: A fast, skillful learned global atmospheric model for climate prediction

Authors: Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah D. Brenowitz, Karthik Kashinath, Michael S. Pritchard, Boris Bonev, Matthew E. Peters, Christopher S. Bretherton

Abstract: Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass… ▽ More Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass and moisture. The emulator is stable for 100 years, nearly conserves column moisture without explicit constraints and faithfully reproduces the reference model's climate, outperforming a challenging baseline on over 90% of tracked variables. ACE requires nearly 100x less wall clock time and is 100x more energy efficient than the reference model using typically available resources. Without fine-tuning, ACE can stably generalize to a previously unseen historical sea surface temperature dataset. △ Less

Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Accepted at Tackling Climate Change with Machine Learning: workshop at NeurIPS 2023

arXiv:2302.14822 [pdf, other]

Formalising and Learning a Quantum Model of Concepts

Authors: Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljic, Stephen Clark

Abstract: In this report we present a new modelling framework for concepts based on quantum theory, and demonstrate how the conceptual representations can be learned automatically from data. A contribution of the work is a thorough category-theoretic formalisation of our framework. We claim that the use of category theory, and in particular the use of string diagrams to describe quantum processes, helps elu… ▽ More In this report we present a new modelling framework for concepts based on quantum theory, and demonstrate how the conceptual representations can be learned automatically from data. A contribution of the work is a thorough category-theoretic formalisation of our framework. We claim that the use of category theory, and in particular the use of string diagrams to describe quantum processes, helps elucidate some of the most important features of our quantum approach to concept modelling. Our approach builds upon Gardenfors' classical framework of conceptual spaces, in which cognition is modelled geometrically through the use of convex spaces, which in turn factorise in terms of simpler spaces called domains. We show how concepts from the domains of shape, colour, size and position can be learned from images of simple shapes, where individual images are represented as quantum states and concepts as quantum effects. Concepts are learned by a hybrid classical-quantum network trained to perform concept classification, where the classical image processing is carried out by a convolutional neural network and the quantum representations are produced by a parameterised quantum circuit. We also use discarding to produce mixed effects, which can then be used to learn concepts which only apply to a subset of the domains, and show how entanglement (together with discarding) can be used to capture interesting correlations across domains. Finally, we consider the question of whether our quantum models of concepts can be considered conceptual spaces in the Gardenfors sense. △ Less

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2212.03106 [pdf, other]

Scale-Invariant Specifications for Human-Swarm Systems

Authors: Joel Meyer, Ahalya Prabhakar, Allison Pinosky, Ian Abraham, Annalisa Taylor, Millicent Schlafly, Katarina Popovic, Giovani Diniz, Brendan Teich, Borislava Simidchieva, Shane Clark, Todd Murphey

Abstract: We present a method for controlling a swarm using its spectral decomposition -- that is, by describing the set of trajectories of a swarm in terms of a spatial distribution throughout the operational domain -- guaranteeing scale invariance with respect to the number of agents both for computation and for the operator tasked with controlling the swarm. We use ergodic control, decentralized across t… ▽ More We present a method for controlling a swarm using its spectral decomposition -- that is, by describing the set of trajectories of a swarm in terms of a spatial distribution throughout the operational domain -- guaranteeing scale invariance with respect to the number of agents both for computation and for the operator tasked with controlling the swarm. We use ergodic control, decentralized across the network, for implementation. In the DARPA OFFSET program field setting, we test this interface design for the operator using the STOMP interface -- the same interface used by Raytheon BBN throughout the duration of the OFFSET program. In these tests, we demonstrate that our approach is scale-invariant -- the user specification does not depend on the number of agents; it is persistent -- the specification remains active until the user specifies a new command; and it is real-time -- the user can interact with and interrupt the swarm at any time. Moreover, we show that the spectral/ergodic specification of swarm behavior degrades gracefully as the number of agents goes down, enabling the operator to maintain the same approach as agents become disabled or are added to the network. We demonstrate the scale-invariance and dynamic response of our system in a field relevant simulator on a variety of tactical scenarios with up to 50 agents. We also demonstrate the dynamic response of our system in the field with a smaller team of agents. Lastly, we make the code for our system available. △ Less

Submitted 12 December, 2022; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: Journal of Field Robotics, Accepted for Publication. 25 pages

arXiv:2211.11820 [pdf, other]

Machine-learned climate model corrections from a global storm-resolving model

Authors: Anna Kwa, Spencer K. Clark, Brian Henn, Noah D. Brenowitz, Jeremy McGibbon, W. Andre Perkins, Oliver Watt-Meyer, Lucas Harris, Christopher S. Bretherton

Abstract: Due to computational constraints, running global climate models (GCMs) for many years requires a lower spatial grid resolution (${\gtrsim}50$ km) than is optimal for accurately resolving important physical processes. Such processes are approximated in GCMs via subgrid parameterizations, which contribute significantly to the uncertainty in GCM predictions. One approach to improving the accuracy of… ▽ More Due to computational constraints, running global climate models (GCMs) for many years requires a lower spatial grid resolution (${\gtrsim}50$ km) than is optimal for accurately resolving important physical processes. Such processes are approximated in GCMs via subgrid parameterizations, which contribute significantly to the uncertainty in GCM predictions. One approach to improving the accuracy of a coarse-grid global climate model is to add machine-learned state-dependent corrections at each simulation timestep, such that the climate model evolves more like a high-resolution global storm-resolving model (GSRM). We train neural networks to learn the state-dependent temperature, humidity, and radiative flux corrections needed to nudge a 200 km coarse-grid climate model to the evolution of a 3~km fine-grid GSRM. When these corrective ML models are coupled to a year-long coarse-grid climate simulation, the time-mean spatial pattern errors are reduced by 6-25% for land surface temperature and 9-25% for land surface precipitation with respect to a no-ML baseline simulation. The ML-corrected simulations develop other biases in climate and circulation that differ from, but have comparable amplitude to, the baseline simulation. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2205.14507 [pdf, other]

HPC Extensions to the OpenKIM Processing Pipeline

Authors: Daniel S. Karls, Steven M. Clark, Brendon A. Waters, Ryan S. Elliott, Ellad B. Tadmor

Abstract: The Open Knowledgebase of Interatomic Models (OpenKIM) is an NSF Science Gateway that archives fully functional computer implementations of interatomic models (potentials and force fields) and simulation codes that use them to compute material properties. Interatomic models are coupled with compatible simulation codes and executed in a fully automated manner by the OpenKIM processing pipeline, a c… ▽ More The Open Knowledgebase of Interatomic Models (OpenKIM) is an NSF Science Gateway that archives fully functional computer implementations of interatomic models (potentials and force fields) and simulation codes that use them to compute material properties. Interatomic models are coupled with compatible simulation codes and executed in a fully automated manner by the OpenKIM processing pipeline, a cloud-based computation platform. The pipeline as previously introduced in the literature was insufficient to support the large-scale computations that have become necessary within the materials science community. Accordingly, we present extensions made to the pipeline that allow it to utilize High-Performance Computing (HPC) resources in an efficient and performant fashion. △ Less

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2203.11216 [pdf, other]

The Conceptual VAE

Authors: Razin A. Shaikh, Sara Sabrina Zemljic, Sean Tull, Stephen Clark

Abstract: In this report we present a new model of concepts, based on the framework of variational autoencoders, which is designed to have attractive properties such as factored conceptual domains, and at the same time be learnable from data. The model is inspired by, and closely related to, the Beta-VAE model of concepts, but is designed to be more closely connected with language, so that the names of conc… ▽ More In this report we present a new model of concepts, based on the framework of variational autoencoders, which is designed to have attractive properties such as factored conceptual domains, and at the same time be learnable from data. The model is inspired by, and closely related to, the Beta-VAE model of concepts, but is designed to be more closely connected with language, so that the names of concepts form part of the graphical model. We provide evidence that our model -- which we call the Conceptual VAE -- is able to learn interpretable conceptual representations from simple images of coloured shapes together with the corresponding concept labels. We also show how the model can be used as a concept classifier, and how it can be adapted to learn from fewer labels per instance. Finally, we formally relate our model to Gardenfors' theory of conceptual spaces, showing how the Gaussians we use to represent concepts can be formalised in terms of "fuzzy concepts" in such a space. △ Less

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2112.06872 [pdf, other]

Efficient Differentially Private Secure Aggregation for Federated Learning via Hardness of Learning with Errors

Authors: Timothy Stevens, Christian Skalka, Christelle Vincent, John Ring, Samuel Clark, Joseph Near

Abstract: Federated machine learning leverages edge computing to develop models from network user data, but privacy in federated learning remains a major challenge. Techniques using differential privacy have been proposed to address this, but bring their own challenges -- many require a trusted third party or else add too much noise to produce useful models. Recent advances in \emph{secure aggregation} usin… ▽ More Federated machine learning leverages edge computing to develop models from network user data, but privacy in federated learning remains a major challenge. Techniques using differential privacy have been proposed to address this, but bring their own challenges -- many require a trusted third party or else add too much noise to produce useful models. Recent advances in \emph{secure aggregation} using multiparty computation eliminate the need for a third party, but are computationally expensive especially at scale. We present a new federated learning protocol that leverages a novel differentially private, malicious secure aggregation protocol based on techniques from Learning With Errors. Our protocol outperforms current state-of-the art techniques, and empirical results show that it scales to a large number of parties, with optimal accuracy for any differentially private federated learning scheme. △ Less

Submitted 13 December, 2021; originally announced December 2021.

Comments: 16 pages, 4 figures

arXiv:2110.06886 [pdf, other]

doi 10.1371/journal.pone.0264492

Sim2Ls: FAIR simulation workflows and data

Authors: Martin Hunt, Steven Clark, Daniel Mejia, Saaketh Desai, Alejandro Strachan

Abstract: Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility… ▽ More Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library that allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally-accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We exemplify the use of Sim2Ls using two applications and discuss best practices towards FAIR simulation workflows and associated data. △ Less

Submitted 6 October, 2021; originally announced October 2021.

Comments: 23 pages, 5 figures

arXiv:2110.04236 [pdf, other]

lambeq: An Efficient High-Level Python Library for Quantum NLP

Authors: Dimitri Kartsaklis, Ian Fan, Richie Yeung, Anna Pearson, Robin Lorenz, Alexis Toumi, Giovanni de Felice, Konstantinos Meichanetzidis, Stephen Clark, Bob Coecke

Abstract: We present lambeq, the first high-level Python library for Quantum Natural Language Processing (QNLP). The open-source toolkit offers a detailed hierarchy of modules and classes implementing all stages of a pipeline for converting sentences to string diagrams, tensor networks, and quantum circuits ready to be used on a quantum computer. lambeq supports syntactic parsing, rewriting and simplificati… ▽ More We present lambeq, the first high-level Python library for Quantum Natural Language Processing (QNLP). The open-source toolkit offers a detailed hierarchy of modules and classes implementing all stages of a pipeline for converting sentences to string diagrams, tensor networks, and quantum circuits ready to be used on a quantum computer. lambeq supports syntactic parsing, rewriting and simplification of string diagrams, ansatz creation and manipulation, as well as a number of compositional models for preparing quantum-friendly representations of sentences, employing various degrees of syntax sensitivity. We present the generic architecture and describe the most important modules in detail, demonstrating the usage with illustrative examples. Further, we test the toolkit in practice by using it to perform a number of experiments on simple NLP tasks, implementing both classical and quantum pipelines. △ Less

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2109.10044 [pdf, other]

Something Old, Something New: Grammar-based CCG Parsing with Transformer Models

Authors: Stephen Clark

Abstract: This report describes the parsing problem for Combinatory Categorial Grammar (CCG), showing how a combination of Transformer-based neural models and a symbolic CCG grammar can lead to substantial gains over existing approaches. The report also documents a 20-year research program, showing how NLP methods have evolved over this time. The staggering accuracy improvements provided by neural models fo… ▽ More This report describes the parsing problem for Combinatory Categorial Grammar (CCG), showing how a combination of Transformer-based neural models and a symbolic CCG grammar can lead to substantial gains over existing approaches. The report also documents a 20-year research program, showing how NLP methods have evolved over this time. The staggering accuracy improvements provided by neural models for CCG parsing can be seen as a reflection of the improvements seen in NLP more generally. The report provides a minimal introduction to CCG and CCG parsing, with many pointers to the relevant literature. It then describes the CCG supertagging problem, and some recent work from Tian et al. (2020) which applies Transformer-based models to supertagging with great effect. I use this existing model to develop a CCG multitagger, which can serve as a front-end to an existing CCG parser. Simply using this new multitagger provides substantial gains in parsing accuracy. I then show how a Transformer-based model from the parsing literature can be combined with the grammar-based CCG parser, setting a new state-of-the-art for the CCGbank parsing task of almost 93% F-score for labelled dependencies, with complete sentence accuracies of over 50%. △ Less

Submitted 28 September, 2021; v1 submitted 21 September, 2021; originally announced September 2021.

Comments: o Added to the description of the formal properties of CCG o Added more description of how maxent and neural taggers differ o Added a ref to some very recent CCG parsing work o Fixed a bug in one of the figures o Added a note and ref to the conclusions o Added to the acknowledgements

arXiv:2108.09416 [pdf]

2020 U.S. presidential election in swing states: Gender differences in Twitter conversations

Authors: Amir Karami, Spring B. Clark, Anderson Mackenzie, Dorathea Lee, Michael Zhu, Hannah R. Boyajieff, Bailey Goldschmidt

Abstract: Social media is commonly used by the public during election campaigns to express their opinions regarding different issues. Among various social media channels, Twitter provides an efficient platform for researchers and politicians to explore public opinion regarding a wide range of topics such as the economy and foreign policy. Current literature mainly focuses on analyzing the content of tweets… ▽ More Social media is commonly used by the public during election campaigns to express their opinions regarding different issues. Among various social media channels, Twitter provides an efficient platform for researchers and politicians to explore public opinion regarding a wide range of topics such as the economy and foreign policy. Current literature mainly focuses on analyzing the content of tweets without considering the gender of users. This research collects and analyzes a large number of tweets and uses computational, human coding, and statistical analyses to identify topics in more than 300,000 tweets posted during the 2020 U.S. presidential election and to compare female and male users regarding the average weight of the discussed topics. Our findings are based upon a wide range of topics, such as tax, climate change, and the COVID-19 pandemic. Out of the topics, there exists a significant difference between female and male users for more than 70% of topics. △ Less

Submitted 13 July, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

arXiv:2101.05125 [pdf, other]

Formalising Concepts as Grounded Abstractions

Authors: Stephen Clark, Alexander Lerchner, Tamara von Glehn, Olivier Tieleman, Richard Tanburn, Misha Dashevskiy, Matko Bosnjak

Abstract: The notion of concept has been studied for centuries, by philosophers, linguists, cognitive scientists, and researchers in artificial intelligence (Margolis & Laurence, 1999). There is a large literature on formal, mathematical models of concepts, including a whole sub-field of AI -- Formal Concept Analysis -- devoted to this topic (Ganter & Obiedkov, 2016). Recently, researchers in machine learni… ▽ More The notion of concept has been studied for centuries, by philosophers, linguists, cognitive scientists, and researchers in artificial intelligence (Margolis & Laurence, 1999). There is a large literature on formal, mathematical models of concepts, including a whole sub-field of AI -- Formal Concept Analysis -- devoted to this topic (Ganter & Obiedkov, 2016). Recently, researchers in machine learning have begun to investigate how methods from representation learning can be used to induce concepts from raw perceptual data (Higgins, Sonnerat, et al., 2018). The goal of this report is to provide a formal account of concepts which is compatible with this latest work in deep learning. The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces. The mathematics of partial orders and lattices is a standard tool for modelling conceptual spaces (Ch.2, Mitchell (1997), Ganter and Obiedkov (2016)); however, there is no formal work that we are aware of which defines a conceptual lattice on top of a representation that is induced using unsupervised deep learning (Goodfellow et al., 2016). The advantages of partially-ordered lattice structures are that these provide natural mechanisms for use in concept discovery algorithms, through the meets and joins of the lattice. △ Less

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2012.05672 [pdf, other]

Imitating Interactive Intelligence

Authors: Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne , et al. (4 additional authors not shown)

Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central cha… ▽ More A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount. △ Less

Submitted 20 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

arXiv:2011.04838 [pdf, other]

Answer Graph: Factorization Matters in Large Graphs

Authors: Zahid Abul-Basher, Nikolay Yakovets, Parke Godfrey, Stanley Clark, Mark Chignell

Abstract: Our answer-graph method to evaluate SPARQL conjunctive queries (CQs) finds a factorized answer set first, an answer graph, and then finds the embedding tuples from this. This approach can reduce greatly the cost to evaluate CQs. This affords a second advantage: we can construct a cost-based planner. We present the answer-graph approach, and overview our prototype system, Wireframe. We then offer p… ▽ More Our answer-graph method to evaluate SPARQL conjunctive queries (CQs) finds a factorized answer set first, an answer graph, and then finds the embedding tuples from this. This approach can reduce greatly the cost to evaluate CQs. This affords a second advantage: we can construct a cost-based planner. We present the answer-graph approach, and overview our prototype system, Wireframe. We then offer proof of concept via a micro-benchmark over the YAGO2s dataset with two prevalent shapes of queries, snowflake and diamond. We compare Wireframe's performance over these against PostgreSQL, Virtuoso, MonetDB, and Neo4J to illustrate the performance advantages of our answer-graph approach. △ Less

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2010.09892 [pdf, other]

Understanding YouTube Communities via Subscription-based Channel Embeddings

Authors: Sam Clark, Anna Zaitsev

Abstract: YouTube is an important source of news and entertainment worldwide, but the scale makes it challenging to study the ideas and topics being discussed on the platform. This paper presents new methods to discover and classify YouTube channels which enable the analysis of communities and categories on the platform using orders of magnitude more channels than have been used in previous studies. Instead… ▽ More YouTube is an important source of news and entertainment worldwide, but the scale makes it challenging to study the ideas and topics being discussed on the platform. This paper presents new methods to discover and classify YouTube channels which enable the analysis of communities and categories on the platform using orders of magnitude more channels than have been used in previous studies. Instead of using channel and video data as features for classification as other researchers have, these methods use a self-supervised learning approach that leverages the public subscription pages of commenters. We test the classification method on the task of predicting the political lean of YouTube news channels and find that it outperforms the previous best model on the task. Further experiments also show that there are important advantages to using commenter subscriptions to discover channels. The subscription data, along with an iterative approach, is applied to discover, to our current understanding, the most comprehensive set of English language socio-political YouTube channels yet to be analyzed. We experiment with predicting more fine grained political tags for channels using a previously annotated dataset and find that our model performs better than the average individual human reviewer for most of the top tags. This fine grained political tag model is then applied to the newly discovered English language socio-political channels to create a new dataset to analyze the amount of traffic going to different political content. The data shows that some tags, such as "Partisan Right" and "Conspiracy", are significantly under represented when looking only at the most popular socio-political channels. Through the use of our methods, we are able to get a much more accurate picture of the size of these communities on YouTube. △ Less

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2009.08206 [pdf, ps, other]

doi 10.1145/3340531.3412050

Learning to Personalize for Web Search Sessions

Authors: Saad Aloteibi, Stephen Clark

Abstract: The task of session search focuses on using interaction data to improve relevance for the user's next query at the session level. In this paper, we formulate session search as a personalization task under the framework of learning to rank. Personalization approaches re-rank results to match a user model. Such user models are usually accumulated over time based on the user's browsing behaviour. We… ▽ More The task of session search focuses on using interaction data to improve relevance for the user's next query at the session level. In this paper, we formulate session search as a personalization task under the framework of learning to rank. Personalization approaches re-rank results to match a user model. Such user models are usually accumulated over time based on the user's browsing behaviour. We use a pre-computed and transparent set of user models based on concepts from the social science literature. Interaction data are used to map each session to these user models. Novel features are then estimated based on such models as well as sessions' interaction data. Extensive experiments on test collections from the TREC session track show statistically significant improvements over current session search algorithms. △ Less

Submitted 17 September, 2020; originally announced September 2020.

Comments: 10 pages; Preprint of the full paper accepted at CIKM 2020

ACM Class: H.3

arXiv:2009.01719 [pdf, other]

Grounded Language Learning Fast and Slow

Authors: Felix Hill, Olivier Tieleman, Tamara von Glehn, Nathaniel Wong, Hamza Merzic, Stephen Clark

Abstract: Recent work has shown that large text-based neural language models, trained with conventional supervised learning objectives, acquire a surprising propensity for few- and one-shot learning. Here, we show that an embodied agent situated in a simulated 3D world, and endowed with a novel dual-coding external memory, can exhibit similar one-shot word learning when trained with conventional reinforceme… ▽ More Recent work has shown that large text-based neural language models, trained with conventional supervised learning objectives, acquire a surprising propensity for few- and one-shot learning. Here, we show that an embodied agent situated in a simulated 3D world, and endowed with a novel dual-coding external memory, can exhibit similar one-shot word learning when trained with conventional reinforcement learning algorithms. After a single introduction to a novel object via continuous visual perception and a language prompt ("This is a dax"), the agent can re-identify the object and manipulate it as instructed ("Put the dax on the bed"). In doing so, it seamlessly integrates short-term, within-episode knowledge of the appropriate referent for the word "dax" with long-term lexical and motor knowledge acquired across episodes (i.e. "bed" and "putting"). We find that, under certain training conditions and with a particular memory writing mechanism, the agent's one-shot word-object binding generalizes to novel exemplars within the same ShapeNet category, and is effective in settings with unfamiliar numbers of objects. We further show how dual-coding memory can be exploited as a signal for intrinsic motivation, stimulating the agent to seek names for objects that may be useful for later executing instructions. Together, the results demonstrate that deep neural networks can exploit meta-learning, episodic memory and an explicitly multi-modal environment to account for 'fast-mapping', a fundamental pillar of human cognitive development and a potentially transformative capacity for agents that interact with human users. △ Less

Submitted 14 October, 2020; v1 submitted 3 September, 2020; originally announced September 2020.

arXiv:2006.06081 [pdf, other]

Ergodic Specifications for Flexible Swarm Control: From User Commands to Persistent Adaptation

Authors: Ahalya Prabhakar, Ian Abraham, Annalisa Taylor, Millicent Schlafly, Katarina Popovic, Giovani Diniz, Brendan Teich, Borislava Simidchieva, Shane Clark, Todd Murphey

Abstract: This paper presents a formulation for swarm control and high-level task planning that is dynamically responsive to user commands and adaptable to environmental changes. We design an end-to-end pipeline from a tactile tablet interface for user commands to onboard control of robotic agents based on decentralized ergodic coverage. Our approach demonstrates reliable and dynamic control of a swarm coll… ▽ More This paper presents a formulation for swarm control and high-level task planning that is dynamically responsive to user commands and adaptable to environmental changes. We design an end-to-end pipeline from a tactile tablet interface for user commands to onboard control of robotic agents based on decentralized ergodic coverage. Our approach demonstrates reliable and dynamic control of a swarm collective through the use of ergodic specifications for planning and executing agent trajectories as well as responding to user and external inputs. We validate our approach in a virtual reality simulation environment and in real-world experiments at the DARPA OFFSET Urban Swarm Challenge FX3 field tests with a robotic swarm where user-based control of the swarm and mission-based tasks require a dynamic and flexible response to changing conditions and objectives in real-time. △ Less

Submitted 10 June, 2020; originally announced June 2020.

Journal ref: Robotics: Science and Systems (RSS), 2020

arXiv:2006.01016 [pdf, other]

Probing Emergent Semantics in Predictive Agents via Question Answering

Authors: Abhishek Das, Federico Carnevale, Hamza Merzic, Laura Rimell, Rosalia Schneider, Josh Abramson, Alden Hung, Arun Ahuja, Stephen Clark, Gregory Wayne, Felix Hill

Abstract: Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand the representations that such agents develop, applying our method to two recent approaches to predictive modeling -action-conditional CPC (Guo et al., 2018) and… ▽ More Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand the representations that such agents develop, applying our method to two recent approaches to predictive modeling -action-conditional CPC (Guo et al., 2018) and SimCore (Gregor et al., 2019). After training agents with these predictive objectives in a visually-rich, 3D environment with an assortment of objects, colors, shapes, and spatial configurations, we probe their internal state representations with synthetic (English) questions, without backpropagating gradients from the question-answering decoder into the agent. The performance of different agents when probed this way reveals that they learn to encode factual, and seemingly compositional, information about objects, properties and spatial relations from their physical environment. Our approach is intuitive, i.e. humans can easily interpret responses of the model as opposed to inspecting continuous vectors, and model-agnostic, i.e. applicable to any modeling approach. By revealing the implicit knowledge of objects, quantities, properties and relations acquired by agents as they learn, question-conditional agent probing can stimulate the design and development of stronger predictive learning objectives. △ Less

Submitted 1 June, 2020; originally announced June 2020.

Comments: ICML 2020

arXiv:2005.03684 [pdf, other]

Learning to Segment Actions from Observation and Narration

Authors: Daniel Fried, Jean-Baptiste Alayrac, Phil Blunsom, Chris Dyer, Stephen Clark, Aida Nematzadeh

Abstract: We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision u… ▽ More We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision used in training, and we find that both task structure and narrative language provide large benefits in segmentation quality. △ Less

Submitted 11 August, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: ACL 2020

arXiv:1912.06686 [pdf, other]

doi 10.1038/s41386-021-01020-7

Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression

Authors: Claas Flint, Micah Cearns, Nils Opel, Ronny Redlich, David M. A. Mehler, Daniel Emden, Nils R. Winter, Ramona Leenings, Simon B. Eickhoff, Tilo Kircher, Axel Krug, Igor Nenadic, Volker Arolt, Scott Clark, Bernhard T. Baune, Xiaoyi Jiang, Udo Dannlowski, Tim Hahn

Abstract: We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied ques… ▽ More We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy control (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of $N = 1,868$ MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of $61\,\%$. Next, we mimicked the process by which researchers would draw samples of various sizes ($N = 4$ to $N = 150$) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes ($N = 20$), we observe accuracies of up to $95\,\%$. For medium sample sizes ($N = 100$) accuracies up to $75\,\%$ were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases. △ Less

Submitted 3 May, 2021; v1 submitted 13 December, 2019; originally announced December 2019.

Journal ref: Neuropsychopharmacology 46 (2021) 1510-1517

arXiv:1910.00571 [pdf, other]

Environmental drivers of systematicity and generalization in a situated agent

Authors: Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro

Abstract: The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparative… ▽ More The question of whether deep neural networks are good at generalising beyond their immediate training experience is of critical importance for learning-based approaches to AI. Here, we consider tests of out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room. We first describe a comparatively generic agent architecture that exhibits strong performance on these tests. We then identify three aspects of the training regime and environment that make a significant difference to its performance: (a) the number of object/word experiences in the training set; (b) the visual invariances afforded by the agent's perspective, or frame of reference; and (c) the variety of visual input inherent in the perceptual aspect of the agent's perception. Our findings indicate that the degree of generalisation that networks exhibit can depend critically on particulars of the environment in which a given task is instantiated. They further suggest that the propensity for neural networks to generalise in systematic ways may increase if, like human children, those networks have access to many frames of richly varying, multi-modal observations as they learn. △ Less

Submitted 19 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

arXiv:1909.11049 [pdf, ps, other]

Neural Generative Rhetorical Structure Parsing

Authors: Amandla Mabona, Laura Rimell, Stephen Clark, Andreas Vlachos

Abstract: Rhetorical structure trees have been shown to be useful for several document-level tasks including summarization and document classification. Previous approaches to RST parsing have used discriminative models; however, these are less sample efficient than generative models, and RST parsing datasets are typically small. In this paper, we present the first generative model for RST parsing. Our model… ▽ More Rhetorical structure trees have been shown to be useful for several document-level tasks including summarization and document classification. Previous approaches to RST parsing have used discriminative models; however, these are less sample efficient than generative models, and RST parsing datasets are typically small. In this paper, we present the first generative model for RST parsing. Our model is a document-level RNN grammar (RNNG) with a bottom-up traversal order. We show that, for our parser's traversal order, previous beam search algorithms for RNNGs have a left-branching bias which is ill-suited for RST parsing. We develop a novel beam search algorithm that keeps track of both structure- and word-generating actions without exhibiting this branching bias and results in absolute improvements of 6.8 and 2.9 on unlabelled and labelled F1 over previous algorithms. Overall, our generative model outperforms a discriminative model with the same features by 2.6 F1 points and achieves performance comparable to the state-of-the-art, outperforming all published parsers from a recent replication study that do not use additional training data. △ Less

Submitted 24 September, 2019; originally announced September 2019.

Comments: EMNLP 2019

arXiv:1906.06438 [pdf, other]

Scalable Syntax-Aware Language Models Using Knowledge Distillation

Authors: Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom

Abstract: Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of tra… ▽ More Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of training data. To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from. On targeted syntactic evaluations, we find that, while sequential LSTMs perform much better than previously reported, our proposed technique substantially improves on this baseline, yielding a new state of the art. Our findings and analysis affirm the importance of structural biases, even in models that learn from large amounts of data. △ Less

Submitted 14 June, 2019; originally announced June 2019.

Comments: ACL 2019

arXiv:1806.00840 [pdf, ps, other]

doi 10.18653/v1/W18-2903

Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing

Authors: Jean Maillard, Stephen Clark

Abstract: Latent tree learning models represent sentences by composing their words according to an induced parse tree, all based on a downstream task. These models often outperform baselines which use (externally provided) syntax trees to drive the composition order. This work contributes (a) a new latent tree learning model based on shift-reduce parsing, with competitive downstream performance and non-triv… ▽ More Latent tree learning models represent sentences by composing their words according to an induced parse tree, all based on a downstream task. These models often outperform baselines which use (externally provided) syntax trees to drive the composition order. This work contributes (a) a new latent tree learning model based on shift-reduce parsing, with competitive downstream performance and non-trivial induced trees, and (b) an analysis of the trees learned by our shift-reduce model and by a chart-based model. △ Less

Submitted 3 June, 2018; originally announced June 2018.

Comments: ACL 2018 workshop on Relevance of Linguistic Structure in Neural Architectures for NLP

Journal ref: Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, ACL 2018

arXiv:1805.07051 [pdf, other]

Bayesian Joint Spike-and-Slab Graphical Lasso

Authors: Zehang Richard Li, Tyler H. McCormick, Samuel J. Clark

Abstract: In this article, we propose a new class of priors for Bayesian inference with multiple Gaussian graphical models. We introduce fully Bayesian treatments of two popular procedures, the group graphical lasso and the fused graphical lasso, and extend them to a continuous spike-and-slab framework to allow self-adaptive shrinkage and model selection simultaneously. We develop an EM algorithm that perfo… ▽ More In this article, we propose a new class of priors for Bayesian inference with multiple Gaussian graphical models. We introduce fully Bayesian treatments of two popular procedures, the group graphical lasso and the fused graphical lasso, and extend them to a continuous spike-and-slab framework to allow self-adaptive shrinkage and model selection simultaneously. We develop an EM algorithm that performs fast and dynamic explorations of posterior modes. Our approach selects sparse models efficiently with substantially smaller bias than would be induced by alternative regularization procedures. The performance of the proposed methods are demonstrated through simulation and two real data examples. △ Less

Submitted 9 May, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

arXiv:1804.07707 [pdf, ps, other]

Factorising AMR generation through syntax

Authors: Kris Cao, Stephen Clark

Abstract: Generating from Abstract Meaning Representation (AMR) is an underspecified problem, as many syntactic decisions are not constrained by the semantic graph. To explicitly account for this underspecification, we break down generating from AMR into two steps: first generate a syntactic structure, and then generate the surface form. We show that decomposing the generation process this way leads to stat… ▽ More Generating from Abstract Meaning Representation (AMR) is an underspecified problem, as many syntactic decisions are not constrained by the semantic graph. To explicitly account for this underspecification, we break down generating from AMR into two steps: first generate a syntactic structure, and then generate the surface form. We show that decomposing the generation process this way leads to state-of-the-art single model performance generating from AMR without additional unlabelled data. We also demonstrate that we can generate meaning-preserving syntactic paraphrases of the same AMR graph, as judged by humans. △ Less

Submitted 3 April, 2019; v1 submitted 20 April, 2018; originally announced April 2018.

Comments: Camera ready; accepted at NAACL-HLT 2019

arXiv:1804.03984 [pdf, other]

Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input

Authors: Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark

Abstract: The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, i… ▽ More The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured. △ Less

Submitted 11 April, 2018; originally announced April 2018.

Comments: To appear at ICLR 2018

arXiv:1804.03980 [pdf, other]

Emergent Communication through Negotiation

Authors: Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark

Abstract: Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is \textit{a pri… ▽ More Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is \textit{a priori} ungrounded and is a form of cheap talk. We show that self-interested agents can use the pre-grounded communication channel to negotiate fairly, but are unable to effectively use the ungrounded channel. However, prosocial agents do learn to use cheap talk to find an optimal negotiating strategy, suggesting that cooperation is necessary for language to emerge. We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation. △ Less

Submitted 11 April, 2018; originally announced April 2018.

Comments: Published as a conference paper at ICLR 2018

arXiv:1710.09867 [pdf, other]

Understanding Early Word Learning in Situated Artificial Agents

Authors: Felix Hill, Stephen Clark, Karl Moritz Hermann, Phil Blunsom

Abstract: Neural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and execute symbolic instructions as first-person actors in partially-observable worlds. To achieve this so-called grounded language learning, models must overcome challenges that infants face when learning their first words. While it is notable that models with… ▽ More Neural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and execute symbolic instructions as first-person actors in partially-observable worlds. To achieve this so-called grounded language learning, models must overcome challenges that infants face when learning their first words. While it is notable that models with no meaningful prior knowledge overcome these obstacles, researchers currently lack a clear understanding of how they do so, a problem that we attempt to address in this paper. For maximum control and generality, we focus on a simple neural network-based language learning agent, trained via policy-gradient methods, which can interpret single-word instructions in a simulated 3D world. Whilst the goal is not to explicitly model infant word learning, we take inspiration from experimental paradigms in developmental psychology and apply some of these to the artificial agent, exploring the conditions under which established human biases and learning effects emerge. We further propose a novel method for visualising semantic representations in the agent. △ Less

Submitted 1 October, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

arXiv:1705.09189 [pdf, other]

doi 10.1017/S1351324919000184

Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs

Authors: Jean Maillard, Stephen Clark, Dani Yogatama

Abstract: We introduce a neural network that represents sentences by composing their words according to induced binary parse trees. We use Tree-LSTM as our composition function, applied along a tree structure found by a fully differentiable natural language chart parser. Our model simultaneously optimises both the composition function and the parser, thus eliminating the need for externally-provided parse t… ▽ More We introduce a neural network that represents sentences by composing their words according to induced binary parse trees. We use Tree-LSTM as our composition function, applied along a tree structure found by a fully differentiable natural language chart parser. Our model simultaneously optimises both the composition function and the parser, thus eliminating the need for externally-provided parse trees which are normally required for Tree-LSTM. It can therefore be seen as a tree-based RNN that is unsupervised with respect to the parse trees. As it is fully differentiable, our model is easily trained with an off-the-shelf gradient descent method and backpropagation. We demonstrate that it achieves better performance compared to various supervised Tree-LSTM architectures on a textual entailment task and a reverse dictionary task. △ Less

Submitted 25 May, 2017; originally announced May 2017.

Journal ref: Natural Language Engineering 25, no. 4 (2019): 433-49

arXiv:1702.05962 [pdf, other]

Latent Variable Dialogue Models and their Diversity

Authors: Kris Cao, Stephen Clark

Abstract: We present a dialogue generation model that directly captures the variability in possible responses to a given input, which reduces the `boring output' issue of deterministic dialogue models. Experiments show that our model generates more diverse outputs than baseline models, and also generates more consistently acceptable output than sampling from a deterministic encoder-decoder model. We present a dialogue generation model that directly captures the variability in possible responses to a given input, which reduces the `boring output' issue of deterministic dialogue models. Experiments show that our model generates more diverse outputs than baseline models, and also generates more consistently acceptable output than sampling from a deterministic encoder-decoder model. △ Less

Submitted 20 February, 2017; originally announced February 2017.

Comments: Accepted at EACL 2017

arXiv:1612.04858 [pdf, other]

Bayesian Optimization for Machine Learning : A Practical Guidebook

Authors: Ian Dewancker, Michael McCourt, Scott Clark

Abstract: The engineering of machine learning systems is still a nascent field; relying on a seemingly daunting collection of quickly evolving tools and best practices. It is our hope that this guidebook will serve as a useful resource for machine learning practitioners looking to take advantage of Bayesian optimization techniques. We outline four example machine learning problems that can be solved using o… ▽ More The engineering of machine learning systems is still a nascent field; relying on a seemingly daunting collection of quickly evolving tools and best practices. It is our hope that this guidebook will serve as a useful resource for machine learning practitioners looking to take advantage of Bayesian optimization techniques. We outline four example machine learning problems that can be solved using open source machine learning libraries, and highlight the benefits of using Bayesian optimization in the context of these common machine learning applications. △ Less

Submitted 14 December, 2016; originally announced December 2016.

arXiv:1610.07432 [pdf, ps, other]

Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research

Authors: Douwe Kiela, Luana Bulat, Anita L. Vero, Stephen Clark

Abstract: Meaning has been called the "holy grail" of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artifical Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. Embodiment theories in cognitive… ▽ More Meaning has been called the "holy grail" of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artifical Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. Embodiment theories in cognitive science hold that human semantic representation depends on sensori-motor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion. △ Less

Submitted 24 October, 2016; originally announced October 2016.

MSC Class: 68T01 ACM Class: I.2.6

arXiv:1605.06170 [pdf, other]

Evaluation System for a Bayesian Optimization Service

Authors: Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke

Abstract: Bayesian optimization is an elegant solution to the hyperparameter optimization problem in machine learning. Building a reliable and robust Bayesian optimization service requires careful testing methodology and sound statistical analysis. In this talk we will outline our development of an evaluation framework to rigorously test and measure the impact of changes to the SigOpt optimization service.… ▽ More Bayesian optimization is an elegant solution to the hyperparameter optimization problem in machine learning. Building a reliable and robust Bayesian optimization service requires careful testing methodology and sound statistical analysis. In this talk we will outline our development of an evaluation framework to rigorously test and measure the impact of changes to the SigOpt optimization service. We present an overview of our evaluation system and discuss how this framework empowers our research engineers to confidently and quickly make changes to our core optimization engine △ Less

Submitted 19 May, 2016; originally announced May 2016.

arXiv:1603.09441 [pdf, other]

A Stratified Analysis of Bayesian Optimization Methods

Authors: Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke

Abstract: Empirical analysis serves as an important complement to theoretical analysis for studying practical Bayesian optimization. Often empirical insights expose strengths and weaknesses inaccessible to theoretical analysis. We define two metrics for comparing the performance of Bayesian optimization methods and propose a ranking mechanism for summarizing performance within various genres or strata of te… ▽ More Empirical analysis serves as an important complement to theoretical analysis for studying practical Bayesian optimization. Often empirical insights expose strengths and weaknesses inaccessible to theoretical analysis. We define two metrics for comparing the performance of Bayesian optimization methods and propose a ranking mechanism for summarizing performance within various genres or strata of test functions. These test functions serve to mimic the complexity of hyperparameter optimization problems, the most prominent application of Bayesian optimization, but with a closed form which allows for rapid evaluation and more predictable behavior. This offers a flexible and efficient way to investigate functions with specific properties of interest, such as oscillatory behavior or an optimum on the domain boundary. △ Less

Submitted 30 March, 2016; originally announced March 2016.

arXiv:1411.7942 [pdf, other]

Using Sentence Plausibility to Learn the Semantics of Transitive Verbs

Authors: Tamara Polajnar, Laura Rimell, Stephen Clark

Abstract: The functional approach to compositional distributional semantics considers transitive verbs to be linear maps that transform the distributional vectors representing nouns into a vector representing a sentence. We conduct an initial investigation that uses a matrix consisting of the parameters of a logistic regression classifier trained on a plausibility task as a transitive verb function. We comp… ▽ More The functional approach to compositional distributional semantics considers transitive verbs to be linear maps that transform the distributional vectors representing nouns into a vector representing a sentence. We conduct an initial investigation that uses a matrix consisting of the parameters of a logistic regression classifier trained on a plausibility task as a transitive verb function. We compare our method to a commonly used corpus-based method for constructing a verb matrix and find that the plausibility training may be more effective for disambiguation tasks. △ Less

Submitted 12 December, 2014; v1 submitted 28 November, 2014; originally announced November 2014.

Comments: Full updated paper for NIPS learning semantics workshop, with some minor errata fixed

arXiv:1406.4690 [pdf, ps, other]

doi 10.1093/logcom/exu027

The Frobenius anatomy of word meanings II: possessive relative pronouns

Authors: Mehrnoosh Sadrzadeh, Stephen Clark, Bob Coecke

Abstract: Within the categorical compositional distributional model of meaning, we provide semantic interpretations for the subject and object roles of the possessive relative pronoun `whose'. This is done in terms of Frobenius algebras over compact closed categories. These algebras and their diagrammatic language expose how meanings of words in relative clauses interact with each other. We show how our int… ▽ More Within the categorical compositional distributional model of meaning, we provide semantic interpretations for the subject and object roles of the possessive relative pronoun `whose'. This is done in terms of Frobenius algebras over compact closed categories. These algebras and their diagrammatic language expose how meanings of words in relative clauses interact with each other. We show how our interpretation is related to Montague-style semantics and provide a truth-theoretic interpretation. We also show how vector spaces provide a concrete interpretation and provide preliminary corpus-based experimental evidence. In a prequel to this paper, we used similar methods and dealt with the case of subject and object relative pronouns. △ Less

Submitted 18 June, 2014; originally announced June 2014.

Comments: 40 pages, Journal of Logic and Computation, Essays dedicated to Roy Dyckhoff on the occasion of his retirement, S. Graham-Lengrand and D. Galmiche (eds.), 2014

MSC Class: 18Dxx; 18Axx ACM Class: I.2.7; F.4.1

arXiv:1404.5278 [pdf, ps, other]

doi 10.1093/logcom/ext044

The Frobenius anatomy of word meanings I: subject and object relative pronouns

Authors: Mehrnoosh Sadrzadeh, Stephen Clark, Bob Coecke

Abstract: This paper develops a compositional vector-based semantics of subject and object relative pronouns within a categorical framework. Frobenius algebras are used to formalise the operations required to model the semantics of relative pronouns, including passing information between the relative clause and the modified noun phrase, as well as copying, combining, and discarding parts of the relative cla… ▽ More This paper develops a compositional vector-based semantics of subject and object relative pronouns within a categorical framework. Frobenius algebras are used to formalise the operations required to model the semantics of relative pronouns, including passing information between the relative clause and the modified noun phrase, as well as copying, combining, and discarding parts of the relative clause. We develop two instantiations of the abstract semantics, one based on a truth-theoretic approach and one based on corpus statistics. △ Less

Submitted 21 April, 2014; originally announced April 2014.

Comments: 31 pages

Journal ref: Journal of Logic and Computation, Special Issue: The Incomputable, an Isaac Newton Institute Workshop, 23(6), pp.1293-1317, 2013

arXiv:1312.5985 [pdf, other]

Learning Type-Driven Tensor-Based Meaning Representations

Authors: Tamara Polajnar, Luana Fagarasan, Stephen Clark

Abstract: This paper investigates the learning of 3rd-order tensors representing the semantics of transitive verbs. The meaning representations are part of a type-driven tensor-based semantic framework, from the newly emerging field of compositional distributional semantics. Standard techniques from the neural networks literature are used to learn the tensors, which are tested on a selectional preference-st… ▽ More This paper investigates the learning of 3rd-order tensors representing the semantics of transitive verbs. The meaning representations are part of a type-driven tensor-based semantic framework, from the newly emerging field of compositional distributional semantics. Standard techniques from the neural networks literature are used to learn the tensors, which are tested on a selectional preference-style task with a simple 2-dimensional sentence space. Promising results are obtained against a competitive corpus-based baseline. We argue that extending this work beyond transitive verbs, and to higher-dimensional sentence spaces, is an interesting and challenging problem for the machine learning community to consider. △ Less

Submitted 18 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

Comments: Submitted as part of the open review process for ICLR'14. The paper contains 10 pages, 3 figures, 4 tables

ACM Class: H.3.1

arXiv:1305.0556 [pdf, other]

A quantum teleportation inspired algorithm produces sentence meaning from word meaning and grammatical structure

Authors: Stephen Clark, Bob Coecke, Edward Grefenstette, Stephen Pulman, Mehrnoosh Sadrzadeh

Abstract: We discuss an algorithm which produces the meaning of a sentence given meanings of its words, and its resemblance to quantum teleportation. In fact, this protocol was the main source of inspiration for this algorithm which has many applications in the area of Natural Language Processing. We discuss an algorithm which produces the meaning of a sentence given meanings of its words, and its resemblance to quantum teleportation. In fact, this protocol was the main source of inspiration for this algorithm which has many applications in the area of Natural Language Processing. △ Less

Submitted 11 October, 2013; v1 submitted 2 May, 2013; originally announced May 2013.

Comments: 10 pages, many pictures

MSC Class: 68T50 ACM Class: I.2.7

arXiv:1101.0309 [pdf, ps, other]

Concrete Sentence Spaces for Compositional Distributional Models of Meaning

Authors: Edward Grefenstette, Mehrnoosh Sadrzadeh, Stephen Clark, Bob Coecke, Stephen Pulman

Abstract: Coecke, Sadrzadeh, and Clark (arXiv:1003.4394v1 [cs.CL]) developed a compositional model of meaning for distributional semantics, in which each word in a sentence has a meaning vector and the distributional meaning of the sentence is a function of the tensor products of the word vectors. Abstractly speaking, this function is the morphism corresponding to the grammatical structure of the sentence i… ▽ More Coecke, Sadrzadeh, and Clark (arXiv:1003.4394v1 [cs.CL]) developed a compositional model of meaning for distributional semantics, in which each word in a sentence has a meaning vector and the distributional meaning of the sentence is a function of the tensor products of the word vectors. Abstractly speaking, this function is the morphism corresponding to the grammatical structure of the sentence in the category of finite dimensional vector spaces. In this paper, we provide a concrete method for implementing this linear meaning map, by constructing a corpus-based vector space for the type of sentence. Our construction method is based on structured vector spaces whereby meaning vectors of all sentences, regardless of their grammatical structure, live in the same vector space. Our proposed sentence space is the tensor product of two noun spaces, in which the basis vectors are pairs of words each augmented with a grammatical role. This enables us to compare meanings of sentences by simply taking the inner product of their vectors. △ Less

Submitted 31 December, 2010; originally announced January 2011.

Comments: 10 pages, presented at the International Conference on Computational Semantics 2011 (IWCS'11), to be published in proceedings

MSC Class: 68T50 ACM Class: G.1.3; H.3.1; H.3.3

Journal ref: Proceedings of the 9th International Conference on Computational Semantics (2011)

arXiv:1012.0531 [pdf, ps, other]

doi 10.1063/1.3672009

Categorical Tensor Network States

Authors: Jacob D. Biamonte, Stephen R. Clark, Dieter Jaksch

Abstract: We examine the use of string diagrams and the mathematics of category theory in the description of quantum states by tensor networks. This approach lead to a unification of several ideas, as well as several results and methods that have not previously appeared in either side of the literature. Our approach enabled the development of a tensor network framework allowing a solution to the quantum dec… ▽ More We examine the use of string diagrams and the mathematics of category theory in the description of quantum states by tensor networks. This approach lead to a unification of several ideas, as well as several results and methods that have not previously appeared in either side of the literature. Our approach enabled the development of a tensor network framework allowing a solution to the quantum decomposition problem which has several appealing features. Specifically, given an n-body quantum state S, we present a new and general method to factor S into a tensor network of clearly defined building blocks. We use the solution to expose a previously unknown and large class of quantum states which we prove can be sampled efficiently and exactly. This general framework of categorical tensor network states, where a combination of generic and algebraically defined tensors appear, enhances the theory of tensor network states. △ Less

Submitted 17 December, 2011; v1 submitted 2 December, 2010; originally announced December 2010.

Comments: 39 pages, 31 figures, published version

Journal ref: AIP Advances 1(4), 042172 (2011)

arXiv:1003.4394 [pdf, other]

Mathematical Foundations for a Compositional Distributional Model of Meaning

Authors: Bob Coecke, Mehrnoosh Sadrzadeh, Stephen Clark

Abstract: We propose a mathematical framework for a unification of the distributional theory of meaning in terms of vector space models, and a compositional theory for grammatical types, for which we rely on the algebra of Pregroups, introduced by Lambek. This mathematical framework enables us to compute the meaning of a well-typed sentence from the meanings of its constituents. Concretely, the type reduct… ▽ More We propose a mathematical framework for a unification of the distributional theory of meaning in terms of vector space models, and a compositional theory for grammatical types, for which we rely on the algebra of Pregroups, introduced by Lambek. This mathematical framework enables us to compute the meaning of a well-typed sentence from the meanings of its constituents. Concretely, the type reductions of Pregroups are `lifted' to morphisms in a category, a procedure that transforms meanings of constituents into a meaning of the (well-typed) whole. Importantly, meanings of whole sentences live in a single space, independent of the grammatical structure of the sentence. Hence the inner-product can be used to compare meanings of arbitrary sentences, as it is for comparing the meanings of words in the distributional model. The mathematical structure we employ admits a purely diagrammatic calculus which exposes how the information flows between the words in a sentence in order to make up the meaning of the whole sentence. A variation of our `categorical model' which involves constraining the scalars of the vector spaces to the semiring of Booleans results in a Montague-style Boolean-valued semantics. △ Less

Submitted 23 March, 2010; originally announced March 2010.

Comments: to appear

Journal ref: Lambek Festschirft, special issue of Linguistic Analysis, 2010.

Showing 1–48 of 48 results for author: Clark, S