
Optimal Activation of Halting Multi-Armed Bandit Models

Authors: Wesley Cowan, Michael N. Katehakis, Sheldon M. Ross

Abstract: We study new types of dynamic allocation problems, the Halting Bandit models. As an application, we obtain new proofs for the classic Gittins index decomposition result and for recent results of the authors in "Multi-armed bandits under general depreciation and commitment."

Submitted 20 April, 2023; originally announced April 2023.

MSC Class: 68T05; 68Q32; 62L10

ACM Class: G.3

arXiv:2304.09299

Virtual Fidgets: Opportunities and Design Principles for Bringing Fidgeting to Online Learning

Authors: Sam Ross, Nicole Sullivan, Jina Yoon

Abstract: We present design guidelines for incorporating fidgeting into the virtual world as a tool for students in online lectures. Fidgeting is associated with increased attention and self-regulation, and has the potential to help students focus. Currently there are no fidgets, physical or virtual, designed for preserving attention specifically in online learning environments, and no heuristics for designing fidgets within this domain. We identify three virtual fidget proxies to serve as design probes for studying student experiences with virtual fidgeting. Through a study of eight students using our virtual fidget proxies in online lectures, we identify eight emergent themes that encompass student experience with virtual fidgeting in lectures. Based on these themes, we present four principles for designing domain-specific virtual fidgets for online lectures. We identify that virtual fidgets for lectures should be context-aware, visually appealing, easy to adopt, and physically interactive.

Submitted 18 April, 2023; originally announced April 2023.

Comments: 6 pages, 3 figures, CHI LBW 2023

arXiv:2302.07080

doi 10.1145/3581641.3584037

The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development

Authors: Steven I. Ross, Fernando Martinez, Stephanie Houde, Michael Muller, Justin D. Weisz

Abstract: Large language models (LLMs) have recently been applied in software engineering to perform tasks such as translating code between programming languages, generating code from natural language, and autocompleting code as it is being written. When used within development tools, these systems typically treat each model invocation independently from all previous invocations, and only a specific limited functionality is exposed within the user interface. This approach to user interaction misses an opportunity for users to more deeply engage with the model by having the context of their previous interactions, as well as the context of their code, inform the model's responses. We developed a prototype system -- the Programmer's Assistant -- in order to explore the utility of conversational interactions grounded in code, as well as software engineers' receptiveness to the idea of conversing with, rather than invoking, a code-fluent LLM. Through an evaluation with 42 participants with varied levels of programming experience, we found that our system was capable of conducting extended, multi-turn discussions, and that it enabled additional knowledge and capabilities beyond code generation to emerge from the LLM. Despite skeptical initial expectations for conversational programming assistance, participants were impressed by the breadth of the assistant's capabilities, the quality of its responses, and its potential for improving their productivity.
Our work demonstrates the unique potential of conversational interactions with LLMs for co-creative processes like software development.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 43 pages, 3 figures. To be published in IUI 2023

arXiv:2301.10016

A Case Study in Engineering a Conversational Programming Assistant's Persona

Authors: Steven I. Ross, Michael Muller, Fernando Martinez, Stephanie Houde, Justin D. Weisz

Abstract: The Programmer's Assistant is an experimental prototype software development environment that integrates a chatbot with a code editor. Conversational capability was achieved by using an existing code-fluent Large Language Model and providing it with a prompt that establishes a conversational interaction pattern, a set of conventions, and a style of interaction appropriate for the application. A discussion of the evolution of the prompt provides a case study in how to coax an existing foundation model to behave in a desirable manner for a particular application.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 11 pages. Submitted to the 4th Workshop on Human-AI Co-Creation with Generative Models (HAI-GEN) at IUI 2023

arXiv:2208.14007

Finding neural signatures for obesity through feature selection on source-localized EEG

Authors: Yuan Yue, Dirk De Ridder, Patrick Manning, Samantha Ross, Jeremiah D. Deng

Abstract: Obesity is a serious issue in modern society and is often associated with significantly reduced quality of life. Current research exploring obesity-related neurological evidence using electroencephalography (EEG) data is limited to traditional approaches. In this study, we developed a novel machine learning model to identify brain networks of obese females using alpha band functional connectivity features derived from EEG data. An overall classification accuracy of 0.937 was achieved. Our finding suggests that the obese brain is characterized by a dysfunctional network in which the areas responsible for processing self-referential information and environmental context information are impaired.

Submitted 21 June, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: 4 pages, 3 figures, conference submission

arXiv:2202.07682

doi 10.1145/3490099.3511157

Better Together? An Evaluation of AI-Supported Code Translation

Authors: Justin D. Weisz, Michael Muller, Steven I. Ross, Fernando Martinez, Stephanie Houde, Mayank Agarwal, Kartik Talamadupula, John T. Richards

Abstract: Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.

Submitted 15 February, 2022; originally announced February 2022.

Comments: 35 pages, 3 figures. To be published in IUI 2022

arXiv:2110.05423

Using Document Similarity Methods to create Parallel Datasets for Code Translation

Authors: Mayank Agarwal, Kartik Talamadupula, Fernando Martinez, Stephanie Houde, Michael Muller, John Richards, Steven I Ross, Justin D. Weisz

Abstract: Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis by applying natural language processing techniques towards automating the code translation task. However, due to the paucity of parallel data in this domain, supervised techniques have only been applied to a limited set of popular programming languages. To bypass this limitation, unsupervised neural machine translation techniques have been proposed to learn code translation using only monolingual corpora. In this work, we propose to use document similarity methods to create noisy parallel datasets of code, thus enabling supervised techniques to be applied for automated code translation without having to rely on the availability or expensive curation of parallel code datasets. We explore the noise tolerance of models trained on such automatically-created datasets and show that these models perform comparably to models trained on ground truth for reasonable levels of noise. Finally, we exhibit the practical utility of the proposed method by creating parallel datasets for languages beyond the ones explored in prior work, thus expanding the set of programming languages for automated code translation.

Submitted 11 October, 2021; originally announced October 2021.
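The abstract does not specify which document similarity method is used. As a hypothetical illustration of the general idea only, the sketch below pairs each file in one language with its most similar file in another, using cosine similarity over bag-of-token counts and keeping only pairs above a cutoff. The tokenizer, the `threshold` value, the greedy best-match rule, and all helper names are assumptions made for this example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Identifiers and keywords only; numbers and punctuation are ignored (an assumption).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bag_of_tokens(source: str) -> Counter:
    """Lower-cased identifier/keyword counts: a deliberately crude document representation."""
    return Counter(t.lower() for t in TOKEN.findall(source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_by_similarity(src: Dict[str, str], tgt: Dict[str, str],
                       threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: match each source file to its most similar target file."""
    tgt_bags = {name: bag_of_tokens(code) for name, code in tgt.items()}
    pairs = []
    for s_name, s_code in src.items():
        s_bag = bag_of_tokens(s_code)
        scored = [(cosine(s_bag, t_bag), t_name) for t_name, t_bag in tgt_bags.items()]
        if not scored:
            continue
        best_score, best_name = max(scored)
        if best_score >= threshold:  # discard weak matches; survivors are still noisy
            pairs.append((s_name, best_name, best_score))
    return pairs
```

Pairs produced this way are noisy by construction (unrelated files can share many tokens), which is the kind of setting whose noise tolerance the abstract says the authors study.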

arXiv:2109.11043

Learning Predictive and Interpretable Timeseries Summaries from ICU Data

Authors: Nari Johnson, Sonali Parbhoo, Andrew Slavin Ross, Finale Doshi-Velez

Abstract: Machine learning models that utilize patient data across time (rather than just the most recent measurements) have increased performance for many risk stratification tasks in the intensive care unit. However, many of these models and their learned representations are complex and therefore difficult for clinicians to interpret, creating challenges for validation. Our work proposes a new procedure to learn summaries of clinical time-series that are both predictive and easily understood by humans. Specifically, our summaries consist of simple and intuitive functions of clinical data (e.g. falling mean arterial pressure). Our learned summaries outperform traditional interpretable model classes and achieve performance comparable to state-of-the-art deep learning models on an in-hospital mortality classification task.

Submitted 22 September, 2021; originally announced September 2021.

Comments: 10 pages, 3 figures, AMIA 2021 Annual Symposium

arXiv:2108.05295

doi 10.37236/10808

Linear Bounds for Cycle-free Saturation Games

Authors: Sean English, Tomáš Masařík, Grace McCourt, Erin Meger, Michael S. Ross, Sam Spiro

Abstract: Given a family of graphs $\mathcal{F}$, we define the $\mathcal{F}$-saturation game as follows. Two players alternate adding edges to an initially empty graph on $n$ vertices, with the only constraint being that neither player can add an edge that creates a subgraph in $\mathcal{F}$. The game ends when no more edges can be added to the graph. One of the players wishes to end the game as quickly as possible, while the other wishes to prolong the game. We let $\textrm{sat}_g(n,\mathcal{F})$ denote the number of edges that are in the final graph when both players play optimally. In general there are very few non-trivial bounds on the order of magnitude of $\textrm{sat}_g(n,\mathcal{F})$. In this work, we find collections of infinite families of cycles $\mathcal{C}$ such that $\textrm{sat}_g(n,\mathcal{C})$ has linear growth rate.

Submitted 11 August, 2021; originally announced August 2021.

Comments: 18 pages, 2 figures

MSC Class: 05C57

Journal ref: The Electronic Journal of Combinatorics 29(3), 5:1-5:21, 2022
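To make the definition of $\textrm{sat}_g(n,\mathcal{F})$ concrete, here is a brute-force minimax sketch for very small $n$. It is not the paper's method (the paper proves bounds on growth rates). Assumptions: a family $\mathcal{F}$ is passed in as a predicate reporting whether adding an edge would create a forbidden subgraph, and `max_first` selects which player moves first, since the abstract does not say. All helper names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations
from typing import Callable, FrozenSet, Tuple

Edge = Tuple[int, int]     # always stored as (u, v) with u < v
Graph = FrozenSet[Edge]


def creates_any_cycle(graph: Graph, edge: Edge, n: int) -> bool:
    """Adding (u, v) closes a cycle iff u and v are already connected."""
    u, v = edge
    adj = {i: set() for i in range(n)}
    for a, b in graph:
        adj[a].add(b)
        adj[b].add(a)
    stack, seen = [u], {u}
    while stack:  # depth-first search from u
        x = stack.pop()
        if x == v:
            return True
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return False


def creates_triangle(graph: Graph, edge: Edge, n: int) -> bool:
    """Adding (u, v) creates a 3-cycle iff u and v have a common neighbour."""
    u, v = edge
    return any(tuple(sorted((u, w))) in graph and tuple(sorted((v, w))) in graph
               for w in range(n) if w not in (u, v))


def saturation_game_value(n: int, creates_forbidden: Callable[[Graph, Edge, int], bool],
                          max_first: bool = True) -> int:
    """Number of edges in the final graph under optimal play (tiny n only)."""
    all_edges = list(combinations(range(n), 2))

    @lru_cache(maxsize=None)
    def value(graph: Graph, max_to_move: bool) -> int:
        legal = [e for e in all_edges
                 if e not in graph and not creates_forbidden(graph, e, n)]
        if not legal:  # saturated: no legal edge remains, so the game ends
            return len(graph)
        outcomes = [value(graph | {e}, not max_to_move) for e in legal]
        # The prolonging player maximizes the final edge count; the other minimizes it.
        return max(outcomes) if max_to_move else min(outcomes)

    return value(frozenset(), max_first)
```

Forbidding every cycle is a useful sanity check: the game can then only stop at a spanning tree, so the value is exactly $n-1$ whichever player moves first. The number of graphs on $n$ labelled vertices is $2^{\binom{n}{2}}$, so exhaustive search is only feasible for tiny $n$.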

arXiv:2106.06848

Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms

Authors: MohammadJavad Azizi, Sheldon M Ross, Zhengyu Zhang

Abstract: We consider the problem of finding, through adaptive sampling, which of $n$ options (arms) has the largest mean. Our objective is to determine a rule that identifies the best arm with a fixed minimum confidence using as few observations as possible, i.e., a fixed-confidence (FC) best arm identification (BAI) problem in multi-armed bandits. We study such problems under the Bayesian setting with both Bernoulli and Gaussian arms. We propose to use the classical "vector at a time" (VT) rule, which samples each remaining arm once in each round. We show how VT can be implemented and analyzed in our Bayesian setting and improved by early elimination. Our analysis shows that these algorithms guarantee an optimal strategy under the prior. We also propose and analyze a variant of the classical "play the winner" (PW) algorithm. Numerical results show that these rules compare favorably with state-of-the-art algorithms.

Submitted 15 March, 2022; v1 submitted 12 June, 2021; originally announced June 2021.
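A minimal sketch of a Bayesian VT loop for Bernoulli arms follows, assuming independent Beta(1, 1) priors. The stopping rule (stop once a Monte Carlo estimate of the posterior probability that the current leader is best reaches $1-\delta$) and the early-elimination rule (drop arms whose estimated probability of being best falls below a hypothetical threshold `eps`) are illustrative assumptions, not the paper's exact rules, and carry none of its guarantees. `pull` is a hypothetical callable that returns 0 or 1 for the chosen arm.

```python
import random
from typing import Callable, Dict, List


def posterior_prob_best(alpha: List[float], beta: List[float], arms: List[int],
                        rng: random.Random, draws: int = 2000) -> Dict[int, float]:
    """Monte Carlo estimate of P(arm has the largest mean) under independent Beta posteriors."""
    wins = {i: 0 for i in arms}
    for _ in range(draws):
        theta = {i: rng.betavariate(alpha[i], beta[i]) for i in arms}
        wins[max(theta, key=theta.get)] += 1
    return {i: w / draws for i, w in wins.items()}


def vector_at_a_time(pull: Callable[[int], int], n_arms: int, delta: float = 0.05,
                     eps: float = 1e-3, max_rounds: int = 10_000, seed: int = 0) -> int:
    """Sample every remaining arm once per round; stop when the leader is best w.p. >= 1 - delta."""
    rng = random.Random(seed)   # drives the Monte Carlo posterior draws only
    alpha = [1.0] * n_arms      # Beta(1, 1) prior: alpha = 1 + successes
    beta = [1.0] * n_arms       # and beta = 1 + failures
    remaining = list(range(n_arms))
    for _ in range(max_rounds):
        for i in remaining:     # one "vector" of observations per round
            if pull(i):
                alpha[i] += 1
            else:
                beta[i] += 1
        p = posterior_prob_best(alpha, beta, remaining, rng)
        leader = max(p, key=p.get)
        if p[leader] >= 1 - delta:
            return leader
        # Hypothetical early-elimination rule: drop arms very unlikely to be best.
        remaining = [i for i in remaining if i == leader or p[i] >= eps]
        if len(remaining) == 1:
            return remaining[0]
    # Round budget exhausted: fall back to the highest posterior mean.
    return max(remaining, key=lambda i: alpha[i] / (alpha[i] + beta[i]))
```

For example, with simulated arms of means 0.9, 0.2 and 0.1 and `bern = random.Random(1)`, the call `vector_at_a_time(lambda i: int(bern.random() < means[i]), n_arms=3)` should typically return arm 0. Once an arm is eliminated, the probabilities are computed relative to the surviving arms only, which is one of several simplifications in this sketch.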

arXiv:2105.08486

Univariate Long-Term Municipal Water Demand Forecasting

Authors: Blake VanBerlo, Matthew A. S. Ross, Daniel Hsia

Abstract: This study describes an investigation into the modelling of citywide water consumption in London, Canada. Multiple modelling techniques were evaluated for the task of univariate time series forecasting with water consumption, including linear regression, Facebook's Prophet method, recurrent neural networks, and convolutional neural networks. Prophet was identified as the model of choice, having achieved a mean absolute percentage error of 2.51%, averaged across a 5-fold cross validation. Prophet was also found to have other advantages deemed valuable to water demand management stakeholders, including inherent interpretability and graceful handling of missing data. The implementation for the methods described in this paper has been open sourced, as they may be adaptable by other municipalities.

Submitted 18 May, 2021; originally announced May 2021.

Comments: 11 pages, 6 figures
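The headline figure is a mean absolute percentage error (MAPE) averaged over five cross-validation folds. A minimal sketch of that computation, assuming the usual definition (mean of |actual - forecast| / |actual|, expressed in percent) and an equal-weight average over folds; how the paper actually constructs its time-series folds is not stated in the abstract, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def mean_cv_mape(folds: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the per-fold MAPE values, weighting every fold equally."""
    scores = [mape(actual, forecast) for actual, forecast in folds]
    return sum(scores) / len(scores)
```

For instance, `mape([100, 200], [98, 210])` is about 3.5, the mean of a 2% and a 5% error. Because MAPE divides by the observed value, it is undefined for zero consumption readings.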

arXiv:2104.03820

doi 10.1145/3397481.3450656

Perfection Not Required? Human-AI Partnerships in Code Translation

Authors: Justin D. Weisz, Michael Muller, Stephanie Houde, John Richards, Steven I. Ross, Fernando Martinez, Mayank Agarwal, Kartik Talamadupula

Abstract: Generative models have become adept at producing artifacts such as images, videos, and prose at human-like levels of proficiency. New generative techniques, such as unsupervised neural machine translation (NMT), have recently been applied to the task of generating source code, translating it from one programming language to another. The artifacts produced in this way may contain imperfections, such as compilation or logical errors. We examine the extent to which software engineers would tolerate such imperfections and explore ways to aid the detection and correction of those errors. Using a design scenario approach, we interviewed 11 software engineers to understand their reactions to the use of an NMT model in the context of application modernization, focusing on the task of translating source code from one language to another. Our three-stage scenario sparked discussions about the utility and desirability of working with an imperfect AI system, how acceptance of that system's outputs would be established, and future opportunities for generative AI in application modernization. Our study highlights how UI features such as confidence highlighting and alternate translations help software engineers work with and better understand generative NMT models.

Submitted 8 April, 2021; originally announced April 2021.

Comments: 18 pages, 1 figure. To be published in IUI 2021

arXiv:2102.05185

Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement

Authors: Andrew Slavin Ross, Finale Doshi-Velez

Abstract: In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind a dataset, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.

Submitted 8 April, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: ICML 2021 paper, fixed incorrect version upload

arXiv:2102.01264

Evaluating the Interpretability of Generative Models by Interactive Reconstruction

Authors: Andrew Slavin Ross, Nina Chen, Elisa Zhao Hang, Elena L. Glassman, Finale Doshi-Velez

Abstract: For machine learning models to be most useful in numerous sociotechnical systems, many have argued that they must be human-interpretable. However, despite increasing interest in interpretability, there remains no firm consensus on how to measure it. This is especially true in representation learning, where interpretability research has focused on "disentanglement" measures only applicable to synthetic datasets and not grounded in human factors. We introduce a task to quantify the human-interpretability of generative model representations, where users interactively modify representations to reconstruct target instances. On synthetic datasets, we find performance on this task much more reliably differentiates entangled and disentangled models than baseline approaches. On a real dataset, we find it differentiates between representation learning methods widely believed but never shown to produce more or less interpretable models. In both cases, we ran small-scale think-aloud studies and large-scale experiments on Amazon Mechanical Turk to confirm that our qualitative and quantitative results agreed.

Submitted 1 February, 2021; originally announced February 2021.

Comments: CHI 2021 accepted paper

arXiv:2012.07581

Quality Estimation & Interpretability for Code Translation

Authors: Mayank Agarwal, Kartik Talamadupula, Stephanie Houde, Fernando Martinez, Michael Muller, John Richards, Steven Ross, Justin D. Weisz

Abstract: Recently, the automated translation of source code from one programming language to another, using approaches inspired by Neural Machine Translation (NMT) methods for natural languages, has come under study. However, such approaches suffer from the same problem as previous NMT approaches on natural languages, viz. the lack of an ability to estimate and evaluate the quality of the translations, and consequently to ascribe some measure of interpretability to the model's choices. In this paper, we attempt to estimate the quality of source code translations built on top of the TransCoder model. We consider the code translation task as an analog of machine translation (MT) for natural languages, with some added caveats. We present our main motivation, drawn from a user study built around code translation, and a technique that correlates the confidences generated by that model with lint errors in the translated code. We conclude with some observations on these correlations, and some ideas for future work.

Submitted 26 April, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: NeurIPS 2020 Workshop on Computer-Assisted Programming

arXiv:2010.13778

doi 10.1088/2058-9565/abfa64

Achieving a quantum smart workforce

Authors: Clarice D. Aiello, D. D. Awschalom, Hannes Bernien, Tina Brower-Thomas, Kenneth R. Brown, Todd A. Brun, Justin R. Caram, Eric Chitambar, Rosa Di Felice, Michael F. J. Fox, Stephan Haas, Alexander W. Holleitner, Eric R. Hudson, Jeffrey H. Hunt, Robert Joynt, Scott Koziol, H. J. Lewandowski, Douglas T. McClure, Jens Palsberg, Gina Passante, Kristen L. Pudenz, Christopher J. K. Richardson, Jessica L. Rosenberg, R. S. Ross, Mark Saffman, et al. (7 additional authors not shown)

Abstract: Interest in building dedicated Quantum Information Science and Engineering (QISE) education programs has greatly expanded in recent years. These programs are inherently convergent, complex, often resource intensive and likely require collaboration with a broad variety of stakeholders. In order to address this combination of challenges, we have captured ideas from many members in the community. This manuscript not only addresses policy makers and funding agencies (both public and private and from the regional to the international level) but also contains needs identified by industry leaders and discusses the difficulties inherent in creating an inclusive QISE curriculum. We report on the status of eighteen post-secondary education programs in QISE and provide guidance for building new programs. Lastly, we encourage the development of a comprehensive strategic plan for quantum education and workforce development as a means to make the most of the ongoing substantial investments being made in QISE.

Submitted 23 October, 2020; originally announced October 2020.

Comments: 18 pages, 2 figures, 1 table

Journal ref: Quantum Sci. Technol. 6 030501 (2021)

arXiv:2009.09086

Focused Clinical Query Understanding and Retrieval of Medical Snippets powered through a Healthcare Knowledge Graph

Authors: Maulik R. Kamdar, Michael Carroll, Will Dowling, Linda Wogulis, Cailey Fitzgerald, Matt Corkum, Danielle Walsh, David Conrad, Craig E. Stanley, Jr., Steve Ross, Dru Henke, Mevan Samarasinghe

Abstract: Clinicians face several significant barriers to searching for and synthesizing accurate, succinct, updated, and trustworthy medical information from multiple literature sources during the practice of medicine and patient care. In this talk, we present our research behind the development of a Focused Clinical Search Service, powered by a Healthcare Knowledge Graph, to interpret the query intent behind clinical search queries and retrieve relevant medical snippets from a diverse corpus of medical literature.

Submitted 17 September, 2020; originally announced September 2020.

Comments: Under Review as a Podium Talk at the AMIA Informatics Summit 2021

arXiv:2009.09072

Interpretable Machine Learning Approaches to Prediction of Chronic Homelessness

Authors: Blake VanBerlo, Matthew A. S. Ross, Jonathan Rivard, Ryan Booker

Abstract: We introduce a machine learning approach to predict chronic homelessness from de-identified client shelter records drawn from a commonly used Canadian homelessness management information system. Using a 30-day time step, a dataset for 6521 individuals was generated. Our model, HIFIS-RNN-MLP, incorporates both static and dynamic features of a client's history to forecast chronic homelessness 6 months into the client's future. The training method was fine-tuned to achieve a high F1-score, giving a desired balance between high recall and precision. Mean recall and precision across 10-fold cross validation were 0.921 and 0.651 respectively. An interpretability method was applied to explain individual predictions and gain insight into the overall factors contributing to chronic homelessness among the population studied. The model achieves state-of-the-art performance and, through interpretable AI, improves stakeholder trust in what is usually a "black box" neural network model.

Submitted 12 September, 2020; originally announced September 2020.

Comments: 14 pages, 7 figures, submitted to Engineering Applications of Artificial Intelligence
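The abstract tunes for F1 and reports mean recall (0.921) and mean precision (0.651). As a worked check, the harmonic mean of those two averages is about 0.763. This is the F1 of the averaged precision and recall, which in general differs from the average of the per-fold F1 scores; the abstract does not report the latter.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall across the 10 folds, as reported in the abstract.
print(round(f1_score(precision=0.651, recall=0.921), 3))  # 0.763
```

The harmonic mean sits closer to the smaller of the two values, so the comparatively low precision dominates the combined score.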

arXiv:1911.01291

Ensembles of Locally Independent Prediction Models

Authors: Andrew Slavin Ross, Weiwei Pan, Leo Anthony Celi, Finale Doshi-Velez

Abstract: Ensembles depend on diversity for improved performance. Many ensemble training methods, therefore, attempt to optimize for diversity, which they almost always define in terms of differences in training set predictions. In this paper, however, we demonstrate that diversity of predictions on the training set does not necessarily imply diversity under mild covariate shift, which can harm generalization in practical settings. To address this issue, we introduce a new diversity metric and an associated method of training ensembles of models that extrapolate differently on local patches of the data manifold. Across a variety of synthetic and real-world tasks, we find that our method improves generalization and diversity in qualitatively novel ways, especially under data limits and covariate shift.

Submitted 7 February, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: This is an expansion of arXiv:1806.08716 with different applications and focus, accepted to AAAI 2020. Latest update clarifies a derivation
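
One way to make "extrapolating differently" concrete is to compare the models' input gradients at the same points. The sketch below averages the squared cosine similarity of input gradients over model pairs and data points: it is 0 when every pair of gradients is orthogonal and approaches 1 when they are parallel. This is an illustrative assumption about what such a diversity metric could look like; the function names are hypothetical, the gradients are passed in directly instead of coming from automatic differentiation, and the paper's exact metric and training loss may differ.

```python
import math
from itertools import combinations
from typing import Sequence


def cos_squared(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Squared cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return (dot / (nu * nv + eps)) ** 2


def local_dependence_penalty(grads_per_model: Sequence[Sequence[Sequence[float]]]) -> float:
    """Average squared cosine similarity of input gradients over model pairs and points.

    grads_per_model[m][n] is the input gradient of model m at data point n
    (hypothetical layout chosen for this sketch).
    """
    total, count = 0.0, 0
    for grads_a, grads_b in combinations(grads_per_model, 2):
        for u, v in zip(grads_a, grads_b):
            total += cos_squared(u, v)
            count += 1
    return total / count if count else 0.0


# Linear models have constant input gradients equal to their weight vectors.
orthogonal = [[[1.0, 0.0]] * 3, [[0.0, 1.0]] * 3]
print(local_dependence_penalty(orthogonal))  # 0.0
```

Minimizing a term like this alongside each model's prediction loss would push ensemble members toward locally independent decision rules.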

arXiv:1909.08792 [pdf, other]

Agent Prioritization for Autonomous Navigation

Authors: Khaled S. Refaat, Kai Ding, Natalia Ponomareva, Stéphane Ross

Abstract: In autonomous navigation, a planning system reasons about other agents to plan a safe and plausible trajectory. Before planning starts, agents are typically processed with computationally intensive models for recognition, tracking, motion estimation and prediction. With limited computational resources and a large number of agents to process in real time, it becomes important to efficiently rank agents according to their impact on the decision making process. This allows spending more time processing the most important agents. We propose a system to rank agents around an autonomous vehicle (AV) in real time. We automatically generate a ranking data set by running the planner in simulation on real-world logged data, where we can afford to run more accurate and expensive models on all the agents. The causes of various planner actions are logged and used for assigning ground truth importance scores. The generated data set can be used to learn ranking models. In particular, we show the utility of combining learned features, via a convolutional neural network, with engineered features designed to capture domain knowledge. We show the benefits of various design choices experimentally. When tested on real AVs, our system demonstrates the capability of understanding complex driving situations.

Submitted 18 September, 2019; originally announced September 2019.

Comments: 8 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019

arXiv:1906.05433 [pdf, other]

Tackling Climate Change with Machine Learning

Authors: David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio

Abstract: Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high-impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.

Submitted 5 November, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: For additional resources, please visit the website that accompanies this paper: https://www.climatechange.ai/

arXiv:1810.00869 [pdf, other]

Training Machine Learning Models by Regularizing their Explanations

Authors: Andrew Slavin Ross

Abstract: Neural networks are among the most accurate supervised learning methods in use today. However, their opacity makes them difficult to trust in critical applications, especially when conditions in training may differ from those in practice. Recent efforts to develop explanations for neural networks and machine learning models more generally have produced tools to shed light on the implicit rules behind predictions. These tools can help us identify when models are right for the wrong reasons. However, they do not always scale to explaining predictions for entire datasets, are not always at the right level of abstraction, and most importantly cannot correct the problems they reveal. In this thesis, we explore the possibility of training machine learning models (with a particular focus on neural networks) using explanations themselves. We consider approaches where models are penalized not only for making incorrect predictions but also for providing explanations that are either inconsistent with domain knowledge or overly complex. These methods let us train models which can not only provide more interpretable rationales for their predictions but also generalize better when training data is confounded or meaningfully different from test data (even adversarially so).

Submitted 29 September, 2018; originally announced October 2018.

Comments: Harvard CSE master's thesis; includes portions of arXiv:1703.03717 and arXiv:1711.09404

arXiv:1806.08716 [pdf, other]

Learning Qualitatively Diverse and Interpretable Rules for Classification

Authors: Andrew Slavin Ross, Weiwei Pan, Finale Doshi-Velez

Abstract: There has been growing interest in developing accurate models that can also be explained to humans. Unfortunately, if there exist multiple distinct but accurate models for some dataset, current machine learning methods are unlikely to find them: standard techniques will likely recover a complex model that combines them. In this work, we introduce a way to identify a maximal set of distinct but accurate models for a dataset. We demonstrate empirically that, in situations where the data supports multiple accurate classifiers, we tend to recover simpler, more interpretable classifiers rather than more complex ones.

Submitted 19 July, 2018; v1 submitted 22 June, 2018; originally announced June 2018.

Comments: Presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden (revision fixes minor issues)

arXiv:1805.11571 [pdf, other]

Human-in-the-Loop Interpretability Prior

Authors: Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

Abstract: We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user studies needed to find models that are both predictive and interpretable, and we demonstrate our approach on several datasets. Our human-subject results show trends towards different proxy notions of interpretability on different datasets, which suggests that different proxies are preferred on different tasks.

Submitted 30 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: To appear at NIPS 2018, selected for a spotlight. 13 pages (incl references and appendix)

arXiv:1711.09404 [pdf, other]

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients

Authors: Andrew Slavin Ross, Finale Doshi-Velez

Abstract: Deep neural networks have proven remarkably effective at solving many classification problems, but have been criticized recently for two major weaknesses: the reasons behind their predictions are uninterpretable, and the predictions themselves can often be fooled by small adversarial perturbations. These problems pose major obstacles for the adoption of neural networks in domains that require security or transparency. In this work, we evaluate the effectiveness of defenses that differentiably penalize the degree to which small changes in inputs can alter model predictions. Across multiple attacks, architectures, defenses, and datasets, we find that neural networks trained with this input gradient regularization exhibit robustness to transferred adversarial examples generated to fool all of the other models. We also find that adversarial examples generated to fool gradient-regularized models fool all other models equally well, and actually lead to more "legitimate," interpretable misclassifications as rated by people (which we confirm in a human subject experiment). Finally, we demonstrate that regularizing input gradients makes them more naturally interpretable as rationales for model predictions. We conclude by discussing this relationship between interpretability and robustness in deep neural networks.

Submitted 26 November, 2017; originally announced November 2017.

Comments: To appear in AAAI 2018
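
A minimal sketch of the input gradient penalty described above, assuming a binary logistic-regression model so that the gradient has a closed form: for p = σ(w·x + b) and label y ∈ {0, 1}, the gradient of the cross-entropy with respect to the input x is (p − y)·w. The regularized objective adds λ times the squared norm of that gradient. The function names and the choice of model are illustrative assumptions; for deep networks the input gradient would instead come from automatic differentiation (double backpropagation).

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient_regularized_loss(
    w: Sequence[float], b: float, X: Sequence[Sequence[float]], y: Sequence[int], lam: float
) -> float:
    """Mean cross-entropy plus lam * mean squared norm of d(loss)/d(input)."""
    w_sq = sum(wi * wi for wi in w)
    ce_total, penalty_total = 0.0, 0.0
    for x, yi in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        ce_total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
        # d(CE)/dx = (p - y) * w, so its squared norm is (p - y)^2 * ||w||^2.
        penalty_total += (p - yi) ** 2 * w_sq
    n = len(X)
    return ce_total / n + lam * penalty_total / n


# At x = 0 the model outputs p = 0.5, so the penalty is 0.25 * ||w||^2 = 6.25.
print(input_gradient_regularized_loss([3.0, 4.0], 0.0, [[0.0, 0.0]], [1], lam=1.0))
```

The penalty grows with ||w||², so it trades some fit for flatter, less perturbation-sensitive decision functions.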

arXiv:1703.03717 [pdf, other]

Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

Authors: Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez

Abstract: Neural networks are among the most accurate supervised learning methods in use today, but their opacity makes them difficult to trust in critical applications, especially when conditions in training differ from those in test. Recent work on explanations for black-box models has produced tools (e.g. LIME) to show the implicit rules behind predictions, which can help us identify when models are right for the wrong reasons. However, these methods do not scale to explaining entire datasets and cannot correct the problems they reveal. We introduce a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary. We apply these penalties both based on expert annotation and in an unsupervised fashion that encourages diverse models with qualitatively different decision boundaries for the same classification problem. On multiple datasets, we show our approach generates faithful explanations and models that generalize much better when conditions differ between training and test.

Submitted 25 May, 2017; v1 submitted 10 March, 2017; originally announced March 2017.
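
The annotation-based penalty can be sketched for a closed-form model. As we understand the approach, an annotation mask A marks input features that an expert says should not matter, and the penalty is the squared input gradient of the summed log class-probabilities, restricted to those features. Assuming binary logistic regression, Σ_k log p_k = log p + log(1 − p), whose gradient with respect to x is (1 − 2p)·w. The names below are hypothetical, and the sketch omits the cross-entropy and weight-decay terms as well as the unsupervised variant.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def right_reasons_penalty(
    w: Sequence[float], b: float, X: Sequence[Sequence[float]], A: Sequence[Sequence[int]]
) -> float:
    """Sum over examples and masked features of the squared input gradient of
    log p + log(1 - p) for a logistic model p = sigmoid(w . x + b).

    A[n][d] == 1 marks feature d of example n as one the model should ignore.
    """
    total = 0.0
    for x, mask in zip(X, A):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        scale = 1.0 - 2.0 * p  # d/dz [log p + log(1 - p)]
        for wd, ad in zip(w, mask):
            total += (ad * scale * wd) ** 2
    return total


# Penalize reliance on the second feature only.
print(right_reasons_penalty([1.0, 2.0], 0.0, [[1.0, 0.0]], [[0, 1]]))
```

Adding λ times this penalty to the usual cross-entropy discourages the model from placing weight on the masked features.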

arXiv:1408.2065 [pdf]

Normalized Online Learning

Authors: Stephane Ross, Paul Mineiro, John Langford

Abstract: We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.

Submitted 9 August, 2014; originally announced August 2014.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Report number: UAI-P-2013-PG-537-545
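
The scale-invariance idea can be illustrated with a simplified per-feature normalized gradient step for squared loss. Each weight tracks the largest absolute value s_i seen for its feature, divides its gradient step by s_i², and is multiplied by s_old²/s_new² whenever a larger value appears, so that multiplying a feature by a constant leaves every prediction unchanged. This is a sketch under stated assumptions, not the paper's full algorithm: it omits any global normalizer and learning-rate schedule, and the class name is hypothetical.

```python
from typing import List, Sequence


class ScaleInvariantLinearLearner:
    """Online linear regression whose predictions do not depend on per-feature scale."""

    def __init__(self, n_features: int, lr: float = 0.5) -> None:
        self.w: List[float] = [0.0] * n_features
        self.s: List[float] = [0.0] * n_features  # largest |x_i| seen so far
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], y: float) -> float:
        """Predict on x, then take one normalized step; returns the prediction."""
        # Shrink weights of features whose observed range just grew.
        for i, xi in enumerate(x):
            if abs(xi) > self.s[i]:
                if self.s[i] > 0:
                    self.w[i] *= self.s[i] ** 2 / xi ** 2
                self.s[i] = abs(xi)
        y_hat = self.predict(x)
        g = y_hat - y  # derivative of 0.5 * (y_hat - y)^2 with respect to y_hat
        for i, xi in enumerate(x):
            if self.s[i] > 0:  # features never seen as nonzero keep weight 0
                self.w[i] -= self.lr * g * xi / self.s[i] ** 2
        return y_hat


# The same stream with the features rescaled yields the same predictions.
stream = [([1.0, 3.0], 2.0), ([0.5, -1.0], 0.0), ([2.0, 1.0], 3.5)]
plain, scaled = ScaleInvariantLinearLearner(2), ScaleInvariantLinearLearner(2)
for x, y in stream:
    print(plain.update(x, y), scaled.update([x[0] * 4.0, x[1] * 0.125], y))
```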

arXiv:1406.5979 [pdf, ps, other]

Reinforcement and Imitation Learning via Interactive No-Regret Learning

Authors: Stephane Ross, J. Andrew Bagnell

Abstract: Recent work has demonstrated that problems where a learner's predictions influence the input distribution it is tested on (particularly imitation learning and structured prediction) can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of actions. We extend existing results in two directions: first, we develop an interactive imitation learning approach that leverages cost information; second, we extend the technique to address reinforcement learning. The results provide theoretical support for the commonly observed successes of online approximate policy iteration. Our approach suggests a broad new family of algorithms and provides a unifying view of existing techniques for imitation and reinforcement learning.

Submitted 23 June, 2014; originally announced June 2014.

Comments: 14 pages. Under review for NIPS 2014 conference

arXiv:1406.1837 [pdf, other]

A Credit Assignment Compiler for Joint Prediction

Authors: Kai-Wei Chang, He He, Hal Daumé III, John Langford, Stephane Ross

Abstract: Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensomely awkward. In this paper, we show that the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements to the compiler, this lets us radically reduce both programming complexity and running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.

Submitted 1 June, 2016; v1 submitted 6 June, 2014; originally announced June 2014.

arXiv:1401.3436 [pdf]

doi: 10.1613/jair.2567

Online Planning Algorithms for POMDPs

Authors: Stéphane Ross, Joelle Pineau, Sébastien Paquet, Brahim Chaib-draa

Abstract: Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to its complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.

Submitted 14 January, 2014; originally announced January 2014.

Journal ref: Journal of Artificial Intelligence Research, Volume 32, pages 663-704, 2008
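
The two ingredients every online POMDP method shares, a Bayesian belief update and a lookahead search from the current belief, can be sketched as a depth-limited expectimax over the belief tree. This is the plain exhaustive baseline, not any specific heuristic-search algorithm from the survey; it uses no bounds or pruning. The dict-based model layout and all names are assumptions for illustration, and the example model is the classic Tiger problem with its usual toy parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Belief = Dict[str, float]


@dataclass
class POMDP:
    actions: List[str]
    observations: List[str]
    T: Dict[str, Dict[str, Dict[str, float]]]  # T[a][s][s2] = P(s2 | s, a)
    O: Dict[str, Dict[str, Dict[str, float]]]  # O[a][s2][o] = P(o | s2, a)
    R: Dict[str, Dict[str, float]]             # R[a][s] = immediate reward


def belief_update(m: POMDP, b: Belief, a: str, o: str) -> Tuple[Belief, float]:
    """Bayes filter: returns the posterior belief and P(o | b, a)."""
    unnorm = {s2: m.O[a][s2][o] * sum(m.T[a][s][s2] * p for s, p in b.items()) for s2 in b}
    z = sum(unnorm.values())
    if z == 0:
        return b, 0.0
    return {s2: v / z for s2, v in unnorm.items()}, z


def lookahead(m: POMDP, b: Belief, depth: int, gamma: float = 0.95) -> Tuple[float, Optional[str]]:
    """Depth-limited expectimax over the belief tree rooted at b."""
    if depth == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in m.actions:
        value = sum(p * m.R[a][s] for s, p in b.items())
        if depth > 1:
            for o in m.observations:
                nb, p_o = belief_update(m, b, a, o)
                if p_o > 0:
                    value += gamma * p_o * lookahead(m, nb, depth - 1, gamma)[0]
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Tiger problem: listening is 85% accurate; opening a door resets the tiger's position.
states = ["tiger-left", "tiger-right"]
stay = {s: {s2: float(s == s2) for s2 in states} for s in states}
reset = {s: {s2: 0.5 for s2 in states} for s in states}
listen_obs = {"tiger-left": {"hear-left": 0.85, "hear-right": 0.15},
              "tiger-right": {"hear-left": 0.15, "hear-right": 0.85}}
uninformative = {s: {"hear-left": 0.5, "hear-right": 0.5} for s in states}
tiger = POMDP(
    actions=["listen", "open-left", "open-right"],
    observations=["hear-left", "hear-right"],
    T={"listen": stay, "open-left": reset, "open-right": reset},
    O={"listen": listen_obs, "open-left": uninformative, "open-right": uninformative},
    R={"listen": {"tiger-left": -1.0, "tiger-right": -1.0},
       "open-left": {"tiger-left": -100.0, "tiger-right": 10.0},
       "open-right": {"tiger-left": 10.0, "tiger-right": -100.0}},
)

b0 = {"tiger-left": 0.5, "tiger-right": 0.5}
print(lookahead(tiger, b0, depth=1))  # (-1.0, 'listen')
```

Online methods repeat this at every step: act, receive an observation, update the belief, and search again from the new belief.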

arXiv:1308.3541 [pdf, ps, other]

Knapsack Constrained Contextual Submodular List Prediction with Application to Multi-document Summarization

Authors: Jiaji Zhou, Stephane Ross, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell

Abstract: We study the problem of predicting a set or list of options under a knapsack constraint. The quality of such lists is evaluated by a submodular reward function that measures both quality and diversity. Similar to DAgger (Ross et al., 2010), by a reduction to online learning, we show how to adapt two sequence prediction models to imitate greedy maximization under knapsack constraints: CONSEQOPT (Dey et al., 2012) and SCP (Ross et al., 2013). Experiments on extractive multi-document summarization show that our approach outperforms existing state-of-the-art methods.

Submitted 15 March, 2014; v1 submitted 15 August, 2013; originally announced August 2013.

Comments: 8 pages, ICML 2013 Workshop on Inferning: Interactions between Inference and Learning
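
The greedy procedure that the learned policies imitate can be sketched for a simple submodular reward, concept coverage, under a budget. The cost-benefit greedy below repeatedly adds the affordable item with the largest marginal gain per unit cost, then compares the result with the best single affordable item, a common safeguard that plain cost-benefit greedy needs for an approximation guarantee. Assumptions: the coverage objective, costs, and function names are illustrative; the paper learns to imitate greedy maximization rather than running it directly.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], concepts: Dict[str, Set[str]]) -> int:
    """Submodular reward: number of distinct concepts covered by the selected items."""
    covered: Set[str] = set()
    for item in selected:
        covered |= concepts[item]
    return len(covered)


def greedy_knapsack(concepts: Dict[str, Set[str]], cost: Dict[str, float], budget: float) -> List[str]:
    """Cost-benefit greedy, compared against the best single affordable item."""
    selected: List[str] = []
    spent = 0.0
    remaining = set(concepts)
    while True:
        base = coverage(selected, concepts)
        best, best_ratio = None, 0.0
        for item in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + cost[item] > budget:
                continue
            ratio = (coverage(selected + [item], concepts) - base) / cost[item]
            if ratio > best_ratio:
                best, best_ratio = item, ratio
        if best is None:
            break
        selected.append(best)
        spent += cost[best]
        remaining.remove(best)
    affordable = [i for i in concepts if cost[i] <= budget]
    if affordable:
        single = max(affordable, key=lambda i: len(concepts[i]))
        if len(concepts[single]) > coverage(selected, concepts):
            return [single]
    return selected


sentences = {"s1": {"a", "b", "c"}, "s2": {"a", "b"}, "s3": {"d"}, "s4": {"c", "d", "e", "f"}}
lengths = {"s1": 3, "s2": 1, "s3": 1, "s4": 4}
print(greedy_knapsack(sentences, lengths, budget=5))  # ['s2', 's3', 's1']
```

In this example greedy covers 4 concepts, while s2 plus s4 would cover 6 at the same cost, which illustrates why greedy is only approximately optimal under a knapsack constraint.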

arXiv:1305.6646 [pdf, other]

Normalized Online Learning

Authors: Stephane Ross, Paul Mineiro, John Langford

Abstract: We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.

Submitted 28 May, 2013; originally announced May 2013.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

arXiv:1305.2532 [pdf, other]

Learning Policies for Contextual Submodular Prediction

Authors: Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell

Abstract: Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learns a sequence of classifiers or relies on stronger assumptions such as realizability, we ensure both data efficiency and performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems including manipulator trajectory optimization, news recommendation and document summarization.

Submitted 11 May, 2013; originally announced May 2013.

Comments: 13 pages. To appear in proceedings of the International Conference on Machine Learning (ICML), 2013

arXiv:1211.1690 [pdf, other]

Learning Monocular Reactive UAV Control in Cluttered Natural Environments

Authors: Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew Bagnell, Martial Hebert

Abstract: Autonomous navigation for large Unmanned Aerial Vehicles (UAVs) is fairly straightforward, as expensive sensors and monitoring devices can be employed. In contrast, obstacle avoidance remains a challenging task for Micro Aerial Vehicles (MAVs), which operate at low altitude in cluttered environments. Unlike large vehicles, MAVs can only carry very light sensors, such as cameras, making autonomous navigation through obstacles much more challenging. In this paper, we describe a system that navigates a small quadrotor helicopter autonomously at low altitude through natural forest environments. Using only a single cheap camera to perceive the environment, we are able to maintain a constant velocity of up to 1.5 m/s. Given a small set of human pilot demonstrations, we use recent state-of-the-art imitation learning techniques to train a controller that can avoid trees by adapting the MAV's heading. We demonstrate the performance of our system in a more controlled environment indoors, and in real natural forest environments outdoors.

Submitted 7 November, 2012; originally announced November 2012.

Comments: 8 pages, 10 figures

arXiv:1206.3281 [pdf]

Model-Based Bayesian Reinforcement Learning in Large Structured Domains

Authors: Stephane Ross, Joelle Pineau

Abstract: Model-based Bayesian reinforcement learning has generated significant interest in the AI community as it provides an elegant solution to the optimal exploration-exploitation tradeoff in classical reinforcement learning. Unfortunately, the applicability of this type of approach has been limited to small domains due to the high complexity of reasoning about the joint posterior over model parameters. In this paper, we consider the use of factored representations combined with online planning techniques to improve the scalability of these methods. The main contribution of this paper is a Bayesian framework for learning the structure and parameters of a dynamical system while simultaneously planning a (near-)optimal sequence of actions.

Submitted 13 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

Report number: UAI-P-2008-PG-476-483
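
At the core of model-based Bayesian RL is a posterior over transition probabilities. The sketch below uses the standard conjugate choice, an independent Dirichlet prior for each state-action pair updated by counting observed transitions, and reports the posterior-mean model. Assumptions: this is the flat (unfactored) case with hypothetical names; learning the factored structure, which is the paper's contribution, and the planning step are not attempted here.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Sequence, Tuple


class DirichletTransitionModel:
    """Posterior over P(s' | s, a) with an independent symmetric Dirichlet prior per (s, a)."""

    def __init__(self, states: Sequence[Hashable], prior_count: float = 1.0) -> None:
        self.states = list(states)
        self.prior = prior_count
        self.counts: DefaultDict[Tuple[Hashable, Hashable], DefaultDict[Hashable, float]] = (
            defaultdict(lambda: defaultdict(float))
        )

    def observe(self, s: Hashable, a: Hashable, s_next: Hashable) -> None:
        # Conjugate update: add one to the count of the observed outcome.
        self.counts[(s, a)][s_next] += 1.0

    def posterior_mean(self, s: Hashable, a: Hashable) -> Dict[Hashable, float]:
        c = self.counts[(s, a)]
        total = sum(c[s2] + self.prior for s2 in self.states)
        return {s2: (c[s2] + self.prior) / total for s2 in self.states}


model = DirichletTransitionModel(states=[0, 1])
for _ in range(3):
    model.observe(0, "go", 1)
print(model.posterior_mean(0, "go"))  # {0: 0.2, 1: 0.8}
```

A Bayes-adaptive planner would sample from or reason about this posterior rather than only its mean, which is where the exploration-exploitation tradeoff is resolved.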

arXiv:1203.1007 [pdf, other]

Agnostic System Identification for Model-Based Reinforcement Learning

Authors: Stephane Ross, J. Andrew Bagnell

Abstract: A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class. In particular, we show that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided some model achieves low training error and a good exploration distribution is available. Our approach applies to both discrete and continuous domains. We demonstrate its efficacy and scalability on a challenging helicopter domain from the literature.

Submitted 3 July, 2012; v1 submitted 5 March, 2012; originally announced March 2012.

Comments: 8 pages, published in ICML 2012

arXiv:1108.3154 [pdf, ps, other]

Stability Conditions for Online Learnability

Authors: Stephane Ross, J. Andrew Bagnell

Abstract: Stability is a general notion that quantifies the sensitivity of a learning algorithm's output to small changes in the training dataset (e.g., deletion or replacement of a single training sample). Such conditions have recently been shown to be more powerful for characterizing learnability in the general learning setting under i.i.d. samples, where uniform convergence is not necessary for learnability but stability is both sufficient and necessary. We here show that similar stability conditions are also sufficient for online learnability, i.e., whether there exists a learning algorithm that, under any sequence of examples (potentially chosen adversarially), produces a sequence of hypotheses that has no regret in the limit with respect to the best hypothesis in hindsight. We introduce online stability, a stability condition related to uniform-leave-one-out stability in the batch setting, that is sufficient for online learnability. In particular, we show that popular classes of online learners, namely algorithms that fall in the category of Follow-the-(Regularized)-Leader, Mirror Descent, gradient-based methods and randomized algorithms like Weighted Majority and Hedge, are guaranteed to have no regret if they have such an online stability property. We provide examples that suggest the existence of an algorithm with such a stability condition might in fact be necessary for online learnability. For the more restricted binary classification setting, we establish that such a stability condition is in fact both sufficient and necessary.
We also show that for a large class of online learnable problems in the general learning setting, namely those with a notion of sub-exponential covering, no-regret online algorithms that have such a stability condition exist.

Submitted 17 August, 2011; v1 submitted 16 August, 2011; originally announced August 2011.

Comments: 16 pages. Earlier version of this work submitted (but rejected) to COLT 2011
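
Hedge (exponential weights), one of the learners listed above, makes the no-regret notion concrete. With losses in [0, 1] and learning rate η = √(8 ln N / T), its expected cumulative loss exceeds that of the best of N experts in hindsight by at most √((T/2) ln N), for any loss sequence, so the average regret vanishes as T grows. The sketch below is a standard textbook version, not code from the paper; the function name and loss sequence are illustrative.

```python
import math
from typing import List, Sequence


def hedge_regret(losses: Sequence[Sequence[float]], eta: float) -> float:
    """Run Hedge on a T x N loss matrix with entries in [0, 1].

    Returns the regret: expected cumulative loss of Hedge minus the
    cumulative loss of the best single expert in hindsight.
    """
    n = len(losses[0])
    log_w: List[float] = [0.0] * n  # log-weights, for numerical stability
    learner_loss = 0.0
    for round_losses in losses:
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        probs = [wi / z for wi in w]
        learner_loss += sum(p * l for p, l in zip(probs, round_losses))
        log_w = [lw - eta * l for lw, l in zip(log_w, round_losses)]
    best_expert = min(sum(col) for col in zip(*losses))
    return learner_loss - best_expert


T, N = 200, 3
losses = [[t % 2, 1 - t % 2, 0.3] for t in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
print(hedge_regret(losses, eta) <= math.sqrt(T * math.log(N) / 2))  # True
```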

arXiv:1011.0686 [pdf, other]

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Authors: Stephane Ross, Geoffrey J. Gordon, J. Andrew Bagnell

Abstract: Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.

Submitted 16 March, 2011; v1 submitted 2 November, 2010; originally announced November 2010.

Comments: Appearing in the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011)
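
The iterative algorithm described above (DAgger) can be sketched on a toy problem. At iteration i the expert is followed with probability β_i and the current learned policy otherwise; every state visited by that mixture is labeled with the expert's action, the labels are appended to one aggregated dataset, and a new policy is trained on all of it. Assumptions: the 1-D corridor environment, the lookup-table "classifier", the geometric schedule β_i = p^i, returning the last policy rather than selecting the best one, and all names are hypothetical choices made to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

GOAL, N_STATES, HORIZON = 7, 10, 15
Policy = Callable[[int], int]


def expert(s: int) -> int:
    """Expert moves one step toward the goal and stays once there."""
    return 0 if s == GOAL else (1 if s < GOAL else -1)


def step(s: int, a: int) -> int:
    return min(max(s + a, 0), N_STATES - 1)


def train(data: List[Tuple[int, int]]) -> Policy:
    """Lookup-table 'classifier': majority expert label per state, 0 (stay) if unseen."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for s, a in data:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 0)


def dagger(n_iters: int = 5, p: float = 0.5, seed: int = 0) -> Tuple[Policy, List[Tuple[int, int]]]:
    rng = random.Random(seed)
    data: List[Tuple[int, int]] = []  # aggregated dataset of (state, expert action)
    policy = train(data)
    for i in range(n_iters):
        beta = p ** i  # beta = 1 on the first iteration, i.e. pure expert
        s = rng.randrange(N_STATES)
        for _ in range(HORIZON):
            data.append((s, expert(s)))  # the expert labels every visited state
            a = expert(s) if rng.random() < beta else policy(s)
            s = step(s, a)
        policy = train(data)  # retrain on all data collected so far
    return policy, data


def rollout(policy: Policy, s: int, horizon: int = HORIZON) -> int:
    for _ in range(horizon):
        s = step(s, policy(s))
    return s


policy, data = dagger()
print(len(data), policy(GOAL), rollout(policy, GOAL))
```

States the learner never visited fall back to "stay"; aggregating expert labels on states reached by the learner's own actions is what shrinks that gap over iterations.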

Showing 1–38 of 38 results for author: Ross, S