
Compositional Models for Estimating Causal Effects

Abstract: Many real-world systems can be represented as sets of interacting components. Examples of such systems include computational systems such as query processors, natural systems such as cells, and social systems such as families. Many approaches have been proposed in traditional (associational) machine learning to model such structured systems, including statistical relational models and graph neural networks. Despite this prior work, existing approaches to estimating causal effects typically treat such systems as single units, represent them with a fixed set of variables, and assume a homogeneous data-generating process. We study a compositional approach for estimating individual treatment effects (ITE) in structured systems, where each unit is represented by the composition of multiple heterogeneous components. This approach uses a modular architecture to model potential outcomes at each component and aggregates component-level potential outcomes to obtain the unit-level potential outcomes. We discover novel benefits of the compositional approach in causal inference: systematic generalization to estimate counterfactual outcomes of unseen combinations of components, and improved overlap guarantees between treatment and control groups compared to the classical methods for causal effect estimation. We also introduce a set of novel environments for empirically evaluating the compositional approach and demonstrate the effectiveness of our approach using both simulated and real-world data.

Submitted 25 June, 2024; originally announced June 2024.
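To make the modular idea concrete, here is a minimal Python sketch of component-level potential-outcome models aggregated into a unit-level ITE. It assumes additive aggregation (unit outcome = sum of component outcomes) and one linear least-squares model per (component type, treatment) pair; the function names and the noise-free synthetic data are hypothetical illustrations, not the authors' architecture.

```python
from collections import defaultdict

import numpy as np


def fit_component_models(records):
    """Fit one least-squares model per (component type, treatment) pair.

    records: iterable of (component_type, features, treatment, outcome).
    """
    grouped = defaultdict(lambda: ([], []))
    for ctype, x, t, y in records:
        xs, ys = grouped[(ctype, t)]
        xs.append(np.append(x, 1.0))  # append an intercept term
        ys.append(y)
    return {
        key: np.linalg.lstsq(np.array(xs), np.array(ys), rcond=None)[0]
        for key, (xs, ys) in grouped.items()
    }


def component_outcome(models, ctype, x, t):
    """Predicted potential outcome of a single component under treatment t."""
    return float(np.append(x, 1.0) @ models[(ctype, t)])


def unit_ite(models, components):
    """Unit-level ITE, assuming the unit outcome is the sum of component outcomes."""
    return sum(
        component_outcome(models, c, x, 1) - component_outcome(models, c, x, 0)
        for c, x in components
    )


# Hypothetical noise-free component data: type "A" has effect 3, type "B" effect 1.
rng = np.random.default_rng(0)
records = []
for _ in range(50):
    for ctype, slope, effect in [("A", 2.0, 3.0), ("B", -1.0, 1.0)]:
        x = rng.normal(size=1)
        t = int(rng.integers(0, 2))
        records.append((ctype, x, t, slope * x[0] + effect * t))

models = fit_component_models(records)
# An unseen composition: the models were trained only on individual components.
new_unit = [("A", np.array([0.5])), ("A", np.array([2.0])), ("B", np.array([1.0]))]
print(round(unit_ite(models, new_unit), 6))  # 7.0 = 3 + 3 + 1
```

Because each component model is reused wherever its component type appears, the sketch can score compositions that never occurred as whole units, which is the kind of systematic generalization the abstract describes.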

arXiv:2405.14321

An 808 Line Phasor-Based Dehomogenisation Matlab Code For Multi-Scale Topology Optimisation

Authors: Rebekka Varum Woldseth, Ole Sigmund, Peter Dørffler Ladegaard Jensen

Abstract: This work presents deHomTop808, an 808-line educational Matlab code for combined multi-scale topology optimisation and phasor-based dehomogenisation. The multi-scale formulation utilises homogenisation of optimal microstructures to facilitate efficient coarse-scale optimisation. Dehomogenisation allows for a high-resolution single-scale reconstruction of the optimised multi-scale structure, with only minor losses in structural performance and at a fraction of the computational cost of its large-scale topology optimisation counterpart. The presented code utilises stiffness-optimal Rank-2 microstructures to minimise the compliance of a single-load-case problem, subject to a volume fraction constraint. By exploiting the inherent efficiency of the phasor-based dehomogenisation procedure, on-the-fly dehomogenisation to a single-scale structure is obtained. The presented code includes procedures for structural verification of the final dehomogenised structure by comparison to the multi-scale solution. The code is introduced in terms of the underlying theory and its major components, including examples and potential extensions, and can be downloaded from https://github.com/peterdorffler/deHomTop808.git.

Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.10883

Automated Discovery of Functional Actual Causes in Complex Environments

Authors: Caleb Chuck, Sankaran Vaidyanathan, Stephen Giguere, Amy Zhang, David Jensen, Scott Niekum

Abstract: Reinforcement learning (RL) algorithms often struggle to learn policies that generalize to novel situations due to issues such as causal confusion, overfitting to irrelevant factors, and failure to isolate control of state factors. These issues stem from a common source: a failure to accurately identify and exploit state-specific causal relationships in the environment. While some prior works in RL aim to identify these relationships explicitly, they rely on informal domain-specific heuristics such as spatial and temporal proximity. Actual causality offers a principled and general framework for determining the causes of particular events. However, existing definitions of actual cause often attribute causality to a large number of events, even if many of them rarely influence the outcome. Prior work on actual causality proposes normality as a solution to this problem, but its existing implementations are challenging to scale to complex, continuous-valued RL environments. This paper introduces functional actual cause (FAC), a framework that uses context-specific independencies in the environment to restrict the set of actual causes. We additionally introduce Joint Optimization for Actual Cause Inference (JACI), an algorithm that learns from observational data to infer functional actual causes.
We demonstrate empirically that FAC agrees with known results on a suite of examples from the actual causality literature, and that JACI identifies actual causes with significantly higher accuracy than existing heuristic methods in a set of complex, continuous-valued environments.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2311.06275

Algorithmic Robustness

Authors: David Jensen, Brian LaMacchia, Ufuk Topcu, Pamela Wisniewski

Abstract: Algorithmic robustness refers to the sustained performance of a computational system in the face of change in the nature of the environment in which that system operates or in the task that the system is meant to perform. Below, we motivate the importance of algorithmic robustness, present a conceptual framework, and highlight the areas of research to which algorithmic robustness is relevant. Why robustness? Robustness is an important enabler of other goals that are frequently cited in the context of public policy decisions about computational systems, including trustworthiness, accountability, fairness, and safety. Despite this dependence, it tends to be under-recognized compared to these other concepts. This is unfortunate, because robustness is often more immediately achievable than these other ultimate goals, which can be more subjective and exacting. Thus, we highlight robustness as an important goal for researchers, engineers, regulators, and policymakers when considering the design, implementation, and deployment of computational systems. We urge researchers and practitioners to elevate the attention paid to robustness when designing and evaluating computational systems. For many key systems, the immediate question after any demonstration of high performance should be: "How robust is that performance to realistic changes in the task or environment?" Greater robustness will set the stage for systems that are more trustworthy, accountable, fair, and safe.
Toward that end, this document provides a brief roadmap to some of the concepts and existing research around the idea of algorithmic robustness.

Submitted 17 October, 2023; originally announced November 2023.

arXiv:2307.09518

Efficient Inverse-designed Structural Infill for Complex Engineering Structures

Authors: Peter Dørffler Ladegaard Jensen, Tim Felle Olsen, J. Andreas Bærentzen, Niels Aage, Ole Sigmund

Abstract: Inverse design of high-resolution and fine-detailed 3D lightweight mechanical structures is notoriously expensive due to the need for vast computational resources and the use of very fine-scaled complex meshes. Furthermore, in designing for additive manufacturing, infill is often neglected as a component of the optimized structure. In this paper, both concerns are addressed using a de-homogenization topology optimization procedure on complex engineering structures discretized by 3D unstructured hexahedral meshes. Using a rectangular-hole microstructure (reminiscent of the stiffness-optimal orthogonal rank-3 multi-scale microstructure) as a base material for the multi-scale optimization, a coarse-scale optimized geometry can be obtained using homogenization-based topology optimization. Due to the microstructure periodicity, this coarse-scale geometry can be upsampled to a fine physical geometry with optimized infill, with minor loss in structural performance and at a fraction of the cost of a fine-scale solution. The upsampling on 3D unstructured grids is achieved through stream surface tracing, which aligns with the optimized local orientation. The periodicity of the physical geometry can be tuned such that the material serves both as a structural component and as an efficient infill for additive manufacturing designs. The method is demonstrated through three examples. It achieves structural performance comparable to state-of-the-art methods but stands out for its significantly reduced computational time relative to the baseline method.
By allowing multiple active layers, the mapped solution becomes more mechanically stable, leading to an increased critical buckling load factor without additional computational expense. The proposed approach achieves promising results: benchmarking against large-scale SIMP models demonstrates computational efficiency improvements of up to 250 times.

Submitted 18 July, 2023; originally announced July 2023.

Comments: Submitted for review at Thin-walled Structures

arXiv:2302.12353

Autonomous Restructuring of Asteroids into Rotating Space Stations

Authors: David W. Jensen

Abstract: Asteroid restructuring uses robotics, self-replication, and mechanical automatons to autonomously restructure an asteroid into a large rotating space station. The restructuring process makes structures from asteroid oxide materials; uses productive self-replication to make replicators, helpers, and products; and creates a multiple-floor station to support a large population. In an example simulation, it takes 12 years to autonomously restructure a large asteroid into the space station. This is accomplished with a single rocket launch. The single payload contains a base station, 4 robots (spiders), and a modest set of supplies. Our simulation creates 3000 spiders and over 23,500 other pieces of equipment. Only the base station and spiders (replicators) have advanced microprocessors and algorithms. These represent 21st-century technologies created and transported from Earth. The equipment and tools are built using in-situ materials and represent 18th- or 19th-century technologies. The equipment and tools (helpers) have simple mechanical programs to perform repetitive tasks. The resulting example station would be a rotating framework almost 5 kilometers in diameter. Once completed, it could support a population of over 700,000 people. Many researchers identify the high launch costs, the harsh space environment, and the lack of gravity as the key obstacles hindering the development of space stations. The single probe addresses the high launch cost. The autonomous construction removes construction crews from the harsh space environment.
The completed rotating station provides radiation protection and centripetal gravity for the first work crews and colonists.

Submitted 27 November, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 65 pages, 53 figures, 25 tables; Version 2 includes editorial changes, improved dumbbell stability details, and reference updates and additions

arXiv:2212.10830

A Comparative Risk Analysis on CyberShip System with STPA-Sec, STRIDE and CORAS

Authors: Rishikesh Sahay, D. A. Sepulveda Estay, Weizhi Meng, Christian D. Jensen, Michael Bruhn Barfod

Abstract: The widespread use of software-intensive cyber systems in critical infrastructures such as ships (CyberShips) has brought huge benefits, yet it has also opened new avenues for cyber attacks to potentially disrupt operations. Cyber risk assessment plays a vital role in identifying cyber threats and vulnerabilities that can be exploited to compromise cyber systems. A number of methodologies have been proposed to carry out these analyses. This paper evaluates and compares the application of three risk assessment methodologies, system-theoretic process analysis (STPA-Sec), STRIDE, and CORAS, for identifying threats and vulnerabilities in a CyberShip system. We specifically selected these three methodologies because they identify threats not only at the component level, but also threats or hazards caused by the interaction between components, resulting in sets of threats identified with each methodology and relevant differences. Moreover, STPA-Sec, a variant of STPA, is widely used for safety and security analysis of cyber-physical systems (CPS); CORAS offers a framework to perform cyber risk assessment in a top-down approach that aligns with STPA-Sec; and STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, Denial of Service, Elevation of Privilege) considers threats at the component level as well as during interactions, similar to STPA-Sec.
As a result of this analysis, this paper highlights the pros and cons of these methodologies, illustrates areas of special applicability, and suggests using them in a complementary way, since threats identified through STRIDE can be used as an input to CORAS and STPA-Sec to make these methods more structured.

Submitted 21 December, 2022; originally announced December 2022.

arXiv:2211.06536

Improving the Efficiency of the PC Algorithm by Using Model-Based Conditional Independence Tests

Authors: Erica Cai, Andrew McGregor, David Jensen

Abstract: Learning causal structure is useful in many areas of artificial intelligence, including planning, robotics, and explanation. Constraint-based structure learning algorithms such as PC use conditional independence (CI) tests to infer causal structure. Traditionally, constraint-based algorithms perform CI tests with a preference for smaller-sized conditioning sets, partially because the statistical power of conventional CI tests declines rapidly as the size of the conditioning set increases. However, many modern conditional independence tests are model-based, and these tests use well-regularized models that maintain statistical power even with very large conditioning sets. This suggests an intriguing new strategy for constraint-based algorithms which may result in a reduction of the total number of CI tests performed: test variable pairs with large conditioning sets first, as a pre-processing step that finds some conditional independencies quickly, before moving on to the more conventional strategy that favors small conditioning sets. We propose such a pre-processing step for the PC algorithm which relies on performing CI tests on a few randomly selected large conditioning sets. We perform an empirical analysis on directed acyclic graphs (DAGs) that correspond to real-world systems and both empirical and theoretical analyses for Erdős-Rényi DAGs. Our results show that Pre-Processing Plus PC (P3PC) performs far fewer CI tests than the original PC algorithm: between 0.5% and 36%, and often less than 10%, of the CI tests that the PC algorithm alone performs.
The efficiency gains are particularly significant for the DAGs corresponding to real-world systems.

Submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted at NeurIPS 2022 Workshop on Causality for Real-world Impact; 8 pages of main text including references
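The pre-processing idea can be illustrated with a minimal Python sketch. To stay deterministic it assumes a hypothetical CI oracle (zero partial correlation computed from the exact covariance of a linear-Gaussian SEM) in place of a model-based CI test; the number of random sets, their size, and the simplified PC skeleton phase are illustrative choices, not the authors' P3PC code.

```python
import itertools

import numpy as np


def linear_sem_covariance(weights):
    """Covariance of X = B^T X + e with unit-variance noise; weights[i, j] is edge i -> j."""
    a = np.linalg.inv(np.eye(weights.shape[0]) - weights.T)
    return a @ a.T


def oracle_ci(cov, i, j, cond, tol=1e-9):
    """Hypothetical CI oracle: zero partial correlation computed from the true covariance."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(cov[np.ix_(idx, idx)])
    return abs(prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])) < tol


def preprocess(cov, num_sets=3, drop=1, seed=0):
    """Test each pair against a few random *large* conditioning sets before running PC."""
    rng = np.random.default_rng(seed)
    d = cov.shape[0]
    adj = {frozenset(p) for p in itertools.combinations(range(d), 2)}
    tests = 0
    for i, j in itertools.combinations(range(d), 2):
        others = [k for k in range(d) if k not in (i, j)]
        size = max(len(others) - drop, 0)  # "large": all but `drop` remaining variables
        for _ in range(num_sets):
            cond = list(rng.choice(others, size=size, replace=False)) if size else []
            tests += 1
            if oracle_ci(cov, i, j, cond):
                adj.discard(frozenset((i, j)))
                break
    return adj, tests


def pc_skeleton(cov, adj):
    """Simplified PC skeleton phase: conditioning sets of increasing size from neighbours."""
    adj = set(adj)
    tests, level = 0, 0
    while True:
        tested_any = False
        for edge in sorted(adj, key=sorted):
            if edge not in adj:
                continue
            i, j = sorted(edge)
            for a, b in ((i, j), (j, i)):
                nbrs = sorted({k for e in adj if a in e for k in e} - {a, b})
                if len(nbrs) < level:
                    continue
                tested_any = True
                removed = False
                for cond in itertools.combinations(nbrs, level):
                    tests += 1
                    if oracle_ci(cov, i, j, list(cond)):
                        removed = True
                        break
                if removed:
                    adj.discard(edge)
                    break
        if not tested_any:
            return adj, tests
        level += 1


chain = np.zeros((4, 4))
for k in range(3):
    chain[k, k + 1] = 0.8  # X0 -> X1 -> X2 -> X3
cov = linear_sem_covariance(chain)
all_pairs = {frozenset(p) for p in itertools.combinations(range(4), 2)}

pc_adj, pc_tests = pc_skeleton(cov, all_pairs)
pre_adj, pre_tests = preprocess(cov)
p3pc_adj, rest_tests = pc_skeleton(cov, pre_adj)
print(sorted(map(sorted, p3pc_adj)), pc_tests, pre_tests + rest_tests)
```

Under a perfect oracle, an adjacent pair is never judged independent, so the pre-processing step cannot delete a true edge; independencies it misses (for example, when every random set contains a collider) are left to the usual smallest-sets-first phase. Test counts on a four-node toy graph say nothing about the savings reported for larger DAGs.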

arXiv:2209.09058

Measuring Interventional Robustness in Reinforcement Learning

Authors: Katherine Avery, Jack Kenney, Pracheta Amaranath, Erica Cai, David Jensen

Abstract: Recent work in reinforcement learning has focused on several characteristics of learned policies that go beyond maximizing reward. These properties include fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention, despite variation in these incidental aspects of the training procedure. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and type of algorithm and that, contrary to what one might expect, high performance does not imply high IR.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 17 pages, 13 figures
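The abstract does not state the IR formula, so the Python sketch below uses one simple agreement-based proxy that follows the stated intuition (agents trained under varied incidental choices should take similar actions in intervened states): the mean pairwise action agreement across agents. The scoring rule, the function name, and the toy agents are assumptions for illustration, not the paper's measure.

```python
from itertools import combinations


def agreement_ir(policies, intervened_states):
    """Hypothetical IR proxy: mean pairwise action agreement under interventions.

    policies: agents trained with the same algorithm but different incidental
    choices (e.g. seeds or data order); intervened_states: states produced by
    applying interventions to the environment.
    """
    pairs = list(combinations(policies, 2))
    states = list(intervened_states)
    if not pairs or not states:
        raise ValueError("need at least two agents and one intervened state")
    agreements = sum(p(s) == q(s) for p, q in pairs for s in states)
    return agreements / (len(pairs) * len(states))


# Toy agents: the first two agree everywhere, the third always picks action 0.
agents = [lambda s: s % 2, lambda s: s % 2, lambda s: 0]
print(agreement_ir(agents, range(4)))  # 0.6666666666666666 (8 of 12 comparisons agree)
```

A score of 1.0 means every agent chose the same action in every intervened state; a real evaluation would also need to decide how to weight states and how to treat stochastic policies.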

arXiv:2106.05506

Brittle AI, Causal Confusion, and Bad Mental Models: Challenges and Successes in the XAI Program

Authors: Jeff Druce, James Niehaus, Vanessa Moody, David Jensen, Michael L. Littman

Abstract: The advances in artificial intelligence enabled by deep learning architectures are undeniable. In several cases, models driven by deep neural networks have surpassed human-level performance in benchmark autonomy tasks. The underlying policies for these agents, however, are not easily interpretable. In fact, given their underlying deep models, it is impossible to directly understand the mapping from observations to actions for any reasonably complex agent. Producing the supporting technology to "open the black box" of these AI systems, while not sacrificing performance, was the fundamental goal of the DARPA XAI program. In our journey through this program, we have several "big picture" takeaways: 1) explanations need to be highly tailored to their scenario; 2) many seemingly high-performing RL agents are extremely brittle and are not amenable to explanation; 3) causal models allow for rich explanations, but how to present them isn't always straightforward; and 4) human subjects conjure fantastically wrong mental models for AIs, and these models are often hard to break. This paper discusses the origins of these takeaways, provides amplifying information, and offers suggestions for future work.

Submitted 10 June, 2021; originally announced June 2021.

arXiv:2102.11761

SBI: A Simulation-Based Test of Identifiability for Bayesian Causal Inference

Authors: Sam Witty, David Jensen, Vikash Mansinghka

Abstract: A growing family of approaches to causal inference relies on Bayesian formulations of assumptions that go beyond causal graph structure. For example, Bayesian approaches have been developed for analyzing instrumental variable designs, regression discontinuity designs, and within-subjects designs. This paper introduces simulation-based identifiability (SBI), a procedure for testing the identifiability of queries in Bayesian causal inference approaches that are implemented as probabilistic programs. SBI complements analytical approaches to identifiability, leveraging a particle-based optimization scheme on simulated data to determine identifiability for analytically intractable models. We analyze SBI's soundness for a broad class of differentiable, finite-dimensional probabilistic programs with bounded effects. Finally, we provide an implementation of SBI using stochastic gradient descent, and show empirically that it agrees with known identification results on a suite of graph-based and quasi-experimental design benchmarks, including those using Gaussian processes.

Submitted 31 October, 2022; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: 17 pages, 3 figures

arXiv:2101.05855

Preserving Privacy in Personalized Models for Distributed Mobile Services

Authors: Akanksha Atrey, Prashant Shenoy, David Jensen

Abstract: The ubiquity of mobile devices has led to the proliferation of mobile services that provide personalized and context-aware content to their users. Modern mobile services are distributed between end-devices, such as smartphones, and remote servers that reside in the cloud. Such services thrive on their ability to predict future contexts to pre-fetch content or make context-specific recommendations. An increasingly common method to predict future contexts, such as location, is via machine learning (ML) models. Recent work in context prediction has focused on ML model personalization, where a personalized model is learned for each individual user in order to tailor predictions or recommendations to a user's mobile behavior. While the use of personalized models increases the efficacy of the mobile service, we argue that it increases privacy risk since a personalized model encodes contextual behavior unique to each user. To demonstrate these privacy risks, we present several attribute-inference-based privacy attacks and show that such attacks can leak privacy with up to 78% efficacy for top-3 predictions. We present Pelican, a privacy-preserving personalization system for context-aware mobile services that leverages both device and cloud resources to personalize ML models while minimizing the risk of privacy leakage for users. We evaluate Pelican using real-world traces for location-aware mobile services and show that Pelican can substantially reduce privacy leakage, by up to 75%.

Submitted 21 April, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: Published at ICDCS 2021

arXiv:2010.03051

How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference

Authors: Amanda Gentzel, Purva Pruthi, David Jensen

Abstract: Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely due to the lack of observational data sets for which treatment effect is known. We describe and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). This method can be used to create constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for empirical evaluation of causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We then perform a large-scale evaluation of seven causal inference methods over 37 data sets, drawn from RCTs, as well as simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal inference method.

Submitted 7 July, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

Journal ref: In Proceedings of the International Conference on Machine Learning (ICML) 2021
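An OSRCT-style construction can be sketched in a few lines of Python. The sketch assumes (as a reading of the method, not the authors' code) that each RCT unit draws a covariate-dependent "preferred" treatment and is kept only when that preference matches the treatment it was actually randomized to; the logistic biasing function, its strength, the function name, and the synthetic outcome model are hypothetical choices.

```python
import numpy as np


def osrct_sample(x, t, y, bias_strength=2.0, seed=0):
    """Keep units whose randomized treatment matches a covariate-biased preference."""
    rng = np.random.default_rng(seed)
    p_treat = 1.0 / (1.0 + np.exp(-bias_strength * x))  # hypothetical biasing function
    preferred = rng.random(len(x)) < p_treat
    keep = preferred == t.astype(bool)
    return x[keep], t[keep], y[keep]


# Hypothetical RCT: randomized treatment, true effect 2, covariate x also drives y.
rng = np.random.default_rng(42)
n = 20_000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 2.0 * t + 2.0 * x + rng.normal(size=n)

rct_ate = y[t == 1].mean() - y[t == 0].mean()  # unbiased reference estimate

xo, to, yo = osrct_sample(x, t, y)
naive_ate = yo[to == 1].mean() - yo[to == 0].mean()  # confounded by x
design = np.column_stack([to, xo, np.ones_like(xo)])
adjusted_ate = np.linalg.lstsq(design, yo, rcond=None)[0][0]  # adjusts for x
print(f"RCT {rct_ate:.2f}, naive {naive_ate:.2f}, adjusted {adjusted_ate:.2f}")
```

Because selection depends only on the observed covariate, the constructed sample is confounded (the naive difference in means is inflated), while the full RCT still supplies the reference estimate against which an adjustment method, here plain linear regression, can be scored.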

arXiv:2007.07127

Causal Inference using Gaussian Processes with Structured Latent Confounders

Authors: Sam Witty, Kenta Takatsu, David Jensen, Vikash Mansinghka

Abstract: Latent confounders (unobserved variables that influence both treatment and outcome) can bias estimates of causal effects. In some cases, these confounders are shared across observations; e.g., all students taking a course are influenced by the course's difficulty in addition to any educational interventions they receive individually. This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects. The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling. GP-SLC provides principled Bayesian uncertainty estimates of individual treatment effect with minimal assumptions about the functional forms relating confounders, covariates, treatment, and outcome. Finally, this paper shows that GP-SLC is competitive with or more accurate than widely used causal inference techniques on three benchmark datasets, including the Infant Health and Development Program and a dataset showing the effect of changing temperatures on state-wide energy consumption across New England.

Submitted 14 July, 2020; originally announced July 2020.

Comments: To be published at ICML 2020
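The abstract names elliptical slice sampling as the basis of GP-SLC's inference. The Python sketch below shows only the generic elliptical slice sampling update for a latent vector with a zero-mean Gaussian prior, not the GP-SLC model; the one-dimensional Gaussian check is a hypothetical example chosen because its posterior is known exactly.

```python
import numpy as np


def elliptical_slice(f, log_lik, chol, rng):
    """One elliptical slice sampling update for f with prior N(0, chol @ chol.T)."""
    nu = chol @ rng.normal(size=f.shape)  # auxiliary draw from the prior
    log_threshold = log_lik(f) + np.log(rng.random())
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta  # initial bracket around the ellipse
    while True:
        proposal = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(proposal) > log_threshold:
            return proposal
        # Shrink the bracket toward theta = 0 (the current state) and retry.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)


# Hypothetical check: prior f ~ N(0, 1), one observation y = 1 with unit noise,
# so the exact posterior is N(0.5, 0.5).
def log_lik(f, observed=1.0):
    return -0.5 * float(np.sum((observed - f) ** 2))


rng = np.random.default_rng(0)
chol = np.eye(1)
f = np.zeros(1)
samples = []
for step in range(12_000):
    f = elliptical_slice(f, log_lik, chol, rng)
    if step >= 2_000:  # discard burn-in
        samples.append(f[0])
samples = np.array(samples)
print(samples.mean(), samples.var())
```

The update needs no step-size tuning, and the shrinking bracket always terminates because the current state (theta = 0) satisfies the slice condition.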

arXiv:2005.00649

Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates

Authors: Katherine A. Keith, David Jensen, Brendan O'Connor

Abstract: Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an individual's entire history of social media posts or the content of a news article could provide a rich measurement of multiple confounders. Yet, methods and applications for this problem are scattered across different communities and evaluation practices are inconsistent. This review is the first to gather and categorize these examples and provide a guide to data-processing and evaluation decisions. Despite increased attention on adjusting for confounding using text, there are still many open problems, which we highlight in this paper.

Submitted 1 May, 2020; originally announced May 2020.

Comments: Accepted to ACL 2020

Journal ref: ACL 2020

arXiv:1912.05743

Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning

Authors: Akanksha Atrey, Kaleigh Clary, David Jensen

Abstract: Saliency maps are frequently used to support explanations of the behavior of deep reinforcement learning (RL) agents. However, a review of how saliency maps are used in practice indicates that the derived explanations are often unfalsifiable and can be highly subjective. We introduce an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and assess the degree to which they correspond to the semantics of RL environments. We use Atari games, a common benchmark for deep RL, to evaluate three types of saliency maps. Our results show the extent to which existing claims about Atari games can be evaluated and suggest that saliency maps are best viewed as an exploratory tool rather than an explanatory tool.

Submitted 20 February, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: Published at ICLR 2020

arXiv:1910.14124

Bayesian causal inference via probabilistic program synthesis

Authors: Sam Witty, Alexander Lew, David Jensen, Vikash Mansinghka

Abstract: Causal inference can be formalized as Bayesian inference that combines a prior distribution over causal models and likelihoods that account for both observations and interventions. We show that it is possible to implement this approach using a sufficiently expressive probabilistic programming language. Priors are represented using probabilistic programs that generate source code in a domain-specific language. Interventions are represented using probabilistic programs that edit this source code to modify the original generative process. This approach makes it straightforward to incorporate data from atomic interventions, as well as shift interventions, variance-scaling interventions, and other interventions that modify causal structure. This approach also enables the use of general-purpose inference machinery for probabilistic programs to infer probable causal structures and parameters from data. This abstract describes a prototype of this approach in the Gen probabilistic programming language.
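The idea that interventions are edits to a generative program can be illustrated without Gen. In the sketch below, a plain Python dictionary of mechanisms stands in for the domain-specific source code. Each intervention returns an edited copy and leaves the original model untouched. The class `Mechanism` and the functions `atomic`, `shift`, `scale_variance`, and `mean_of` are hypothetical names, and the model assumes variables are listed in topological order.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Mechanism:
    parents: Tuple[str, ...]
    fn: Callable[[Dict[str, float], float], float]  # (parent values, noise) -> value
    noise_sd: float = 1.0


# Variables are listed in a topological order (an assumption of this sketch).
Model = Dict[str, Mechanism]


def simulate(model: Model, rng: random.Random) -> Dict[str, float]:
    """Run the generative process once, in insertion (topological) order."""
    values: Dict[str, float] = {}
    for name, mech in model.items():
        noise = rng.gauss(0.0, mech.noise_sd)
        values[name] = mech.fn({p: values[p] for p in mech.parents}, noise)
    return values


def atomic(model: Model, var: str, value: float) -> Model:
    """do(var = value): replace the mechanism with a noiseless constant."""
    edited = dict(model)
    edited[var] = Mechanism((), lambda _pa, _eps: value, 0.0)
    return edited


def shift(model: Model, var: str, delta: float) -> Model:
    """Shift intervention: var := original mechanism + delta."""
    old = model[var]
    edited = dict(model)
    edited[var] = replace(old, fn=lambda pa, eps: old.fn(pa, eps) + delta)
    return edited


def scale_variance(model: Model, var: str, factor: float) -> Model:
    """Variance-scaling intervention: multiply the noise variance by factor."""
    old = model[var]
    edited = dict(model)
    edited[var] = replace(old, noise_sd=old.noise_sd * factor ** 0.5)
    return edited


def mean_of(model: Model, var: str, n: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[var] under the (possibly intervened) model."""
    rng = random.Random(seed)
    return sum(simulate(model, rng)[var] for _ in range(n)) / n
```

Because interventions are ordinary functions from models to models, they compose. For example, `shift(atomic(base, "X", 3.0), "Y", 1.0)` applies both edits.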

Submitted 30 October, 2019; originally announced October 2019.

arXiv:1910.05387

The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Authors: Amanda Gentzel, Dan Garant, David Jensen

Abstract: Causal inference is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data. We survey the current practice in evaluation and show that the techniques we recommend are rarely used in practice. We show that such techniques are feasible and that data sets are available to conduct such evaluations. We also show that these techniques produce results substantially different from those obtained with structural measures and synthetic data.

Submitted 1 November, 2019; v1 submitted 11 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019

arXiv:1909.13649

doi 10.1145/3360608

PlanAlyzer: Assessing Threats to the Validity of Online Experiments

Authors: Emma Tosch, Eytan Bakshy, Emery D. Berger, David D. Jensen, J. Eliot B. Moss

Abstract: Online experiments are ubiquitous. As the scale of experiments has grown, so has the complexity of their design and implementation. In response, firms have developed software frameworks for designing and deploying online experiments. Ensuring that experiments in these frameworks are correctly designed and that their results are trustworthy (a property referred to as *internal validity*) can be difficult. Currently, verifying internal validity requires manual inspection by someone with substantial expertise in experimental design. We present the first approach for statically checking the internal validity of online experiments. Our checks are based on well-known problems that arise in experimental design and causal inference. Our analyses target PlanOut, a widely deployed, open-source experimentation framework that uses a domain-specific language to specify and run complex experiments. We have built a tool, PlanAlyzer, that checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency. PlanAlyzer uses its analyses to automatically generate *contrasts*, a key type of information required to perform valid statistical analyses over experimental results. We demonstrate PlanAlyzer's utility on a corpus of PlanOut scripts deployed in production at Facebook, and we evaluate its ability to identify threats to validity on a mutated subset of this corpus. PlanAlyzer achieves 92% precision and 92% recall on the mutated corpus, and 82% of the contrasts it automatically generates match hand-specified data.
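The two reported figures measure different things. Precision is the fraction of flagged threats that are real, and recall is the fraction of real threats that get flagged. The following minimal sketch assumes threats are identified by hypothetical string IDs. The convention of returning 0.0 when a denominator is empty is a choice made here, not a standard.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """precision = |flagged & actual| / |flagged|; recall = |flagged & actual| / |actual|.
    Returns 0.0 for a ratio whose denominator set is empty (a convention of this sketch)."""
    hits = len(flagged & actual)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

For example, flagging `{"s1", "s2", "s3", "s4"}` when the true threats are `{"s1", "s2", "s3", "s5"}` gives 0.75 for both. Precision and recall coincide whenever the number of flagged items equals the number of true threats.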

Submitted 30 September, 2019; originally announced September 2019.

Comments: 30 pages, hella long

Journal ref: OOPSLA 2019

arXiv:1905.02825

Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning

Authors: Emma Tosch, Kaleigh Clary, John Foley, David Jensen

Abstract: Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high-quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source subset of Atari environments redesigned for the experimental evaluation of deep RL (https://kdl-umass.github.io/Toybox/). We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments.

Submitted 7 May, 2019; originally announced May 2019.

arXiv:1904.06312

Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

Authors: Kaleigh Clary, Emma Tosch, John Foley, David Jensen

Abstract: Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate metrics over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution rather than a point estimate.
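Reporting performance as a distribution can be as simple as publishing quantiles alongside the mean. The sketch below assumes post-training returns are pooled across evaluation episodes and seeds; the pooling scheme and the function name `performance_distribution` are illustrative assumptions, not the paper's protocol. It uses only the standard `statistics` module (Python 3.8+).

```python
import statistics
from typing import Dict, List


def performance_distribution(scores: List[float]) -> Dict[str, float]:
    """Summarize post-training returns (e.g., one per evaluation episode,
    pooled over seeds) as a distribution rather than a single mean."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to describe a distribution")
    # Quartile cut points; statistics.quantiles uses the 'exclusive' method by default
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
    }
```

Two agents with identical means can differ sharply in their lower quartile or minimum. A point estimate hides that difference, while the summary above exposes it.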

Submitted 12 April, 2019; originally announced April 2019.

Comments: NeurIPS 2018 Critiquing and Correcting Trends Workshop

arXiv:1812.02868

Measuring and Characterizing Generalization in Deep Reinforcement Learning

Authors: Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen

Abstract: Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis are necessary before claims of representation learning can be supported.
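One simplified reading of the three state categories can be expressed on a small explicit transition graph:

- **On-policy:** states the agent actually visits.
- **Off-policy:** states reachable from the start state under some action sequence but not visited by the agent.
- **Unreachable:** all other states, such as ones constructed by editing the environment.

The sketch below encodes that reading under the assumption of a fully known transition graph. The function names are hypothetical, and the paper's exact definitions may differ.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set

State = Hashable


def reachable(transitions: Dict[State, List[State]], start: State) -> Set[State]:
    """All states reachable from start under some sequence of actions (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def classify_states(
    states: Iterable[State],
    transitions: Dict[State, List[State]],
    start: State,
    visited_by_agent: Set[State],
) -> Dict[State, str]:
    """Label each state on-policy, off-policy (reachable but unvisited), or unreachable."""
    reach = reachable(transitions, start)
    labels: Dict[State, str] = {}
    for s in states:
        if s in visited_by_agent:
            labels[s] = "on-policy"
        elif s in reach:
            labels[s] = "off-policy"
        else:
            labels[s] = "unreachable"
    return labels
```

In real Atari games, the transition graph is far too large to enumerate, so reachability has to be approximated, for example by replaying other policies.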

Submitted 11 December, 2018; v1 submitted 6 December, 2018; originally announced December 2018.

arXiv:1812.02850

ToyBox: Better Atari Environments for Testing Reinforcement Learning Agents

Authors: John Foley, Emma Tosch, Kaleigh Clary, David Jensen

Abstract: It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for deep learning research, and state-of-the-art reinforcement learning (RL) agents have been shown to routinely equal or exceed human performance on many ALE tasks. Since ALE is based on emulation of the original Atari games, the environment does not provide semantically meaningful representations of internal game state. This means that ALE has limited utility as an environment for supporting testing or model introspection. We propose ToyBox, a collection of reimplementations of these games that solves this critical problem and enables robust testing of RL agents.

Submitted 25 January, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: NeurIPS Systems for ML Workshop

arXiv:1608.04698

Evaluating Causal Models by Comparing Interventional Distributions

Authors: Dan Garant, David Jensen

Abstract: The predominant method for evaluating the quality of causal models is to measure the graphical accuracy of the learned model structure. We present an alternative method for evaluating causal models that directly measures the accuracy of estimated interventional distributions. We contrast such distributional measures with structural measures, such as structural Hamming distance and structural intervention distance, showing that structural measures often correspond poorly to the accuracy of estimated interventional distributions. We use a number of real and synthetic datasets to illustrate various scenarios in which structural measures provide misleading results with respect to algorithm selection and parameter tuning, and we recommend that distributional measures become the new standard for evaluating causal models.
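The contrast between structural and distributional measures can be made concrete with one example of each. The structural measure is the structural Hamming distance between two DAGs; this sketch uses the common convention that a missing, extra, or reversed edge each costs 1. The distributional measure is total variation distance between two discrete interventional distributions, such as estimated versus true P(Y | do(X = x)). Total variation is one possible distributional measure, chosen here for illustration; the abstract does not specify which distance the paper uses.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[str, str]


def shd(true_edges: Set[Edge], learned_edges: Set[Edge], nodes: Set[str]) -> int:
    """Structural Hamming distance between two DAGs over the same nodes:
    one unit per node pair whose edge status (absent, a->b, b->a) differs."""
    def status(edges: Set[Edge], a: str, b: str) -> int:
        if (a, b) in edges:
            return 1
        if (b, a) in edges:
            return -1
        return 0

    return sum(
        status(true_edges, a, b) != status(learned_edges, a, b)
        for a, b in combinations(sorted(nodes), 2)
    )


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """TV distance between two discrete distributions given as value -> probability."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
```

A single reversed edge costs 1 unit of SHD whether or not it changes any interventional distribution of interest. The total variation distance, by contrast, depends only on how wrong the estimated distribution is.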

Submitted 16 August, 2016; originally announced August 2016.

arXiv:1605.04056

Causal Discovery for Manufacturing Domains

Authors: Katerina Marazopoulou, Rumi Ghosh, Prasanth Lade, David Jensen

Abstract: Improving yield and quality is of paramount importance to any manufacturing company. One way to improve yield is to discover the root causal factors that affect it. We propose the use of data-driven, interpretable causal models to identify key factors affecting yield. We focus on factors that are measured in different stages of production and testing in the manufacturing cycle of a product, and we apply causal structure learning techniques to real data collected from a production line. Specifically, the goal of this work is to learn interpretable causal models from observational data produced by manufacturing lines. We emphasize interpretability so that the models are actionable in the field of manufacturing. We highlight the challenges presented by assembly line data and propose ways to alleviate them. We also identify unique characteristics of data originating from assembly lines and show how to leverage them to improve causal discovery. Standard evaluation techniques for causal structure learning show that the learned causal models appear to closely represent the underlying latent causal relationships between different factors in the production process. These results were also validated by manufacturing domain experts, who found them promising. This work demonstrates how data mining and knowledge discovery can be used for root cause analysis in the domain of manufacturing and connected industry.
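Since factors are measured at different production stages, one natural constraint is that a factor measured at a later stage cannot cause one measured at an earlier stage. This is a common form of tiered background knowledge in structure learning. The abstract does not say this is the specific characteristic the paper exploits, so treat the sketch below as an assumption. The function `orient_with_stages` and the variable names in the example are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def orient_with_stages(
    skeleton: Iterable[Tuple[str, str]],
    stage: Dict[str, int],
) -> Tuple[Set[Tuple[str, str]], Set[FrozenSet[str]]]:
    """Orient each undirected skeleton edge from the earlier production stage
    to the later one; edges within a single stage stay undirected."""
    directed: Set[Tuple[str, str]] = set()
    undirected: Set[FrozenSet[str]] = set()
    for a, b in skeleton:
        if stage[a] < stage[b]:
            directed.add((a, b))
        elif stage[b] < stage[a]:
            directed.add((b, a))
        else:
            undirected.add(frozenset((a, b)))
    return directed, undirected
```

Edges left undirected would still need to be oriented by the structure learning algorithm itself, for example from v-structures.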

Submitted 13 June, 2016; v1 submitted 13 May, 2016; originally announced May 2016.

arXiv:1412.5238

Refining the Semantics of Social Influence

Authors: Katerina Marazopoulou, David Arbour, David Jensen

Abstract: With the proliferation of network data, researchers are increasingly focusing on questions investigating phenomena occurring on networks. This often includes analysis of peer effects, i.e., how the connections of an individual affect that individual's behavior. This type of influence is not limited to the direct connections of an individual (such as friends); it also extends to individuals connected through longer paths (for example, friends of friends, or friends of friends of friends). In this work, we identify an ambiguity in the definition of what constitutes the extended neighborhood of an individual. This ambiguity gives rise to different semantics and supports different types of underlying phenomena. We present experimental results, both on synthetic and real networks, that quantify differences among the sets of extended neighbors under different semantics. Finally, we provide experimental evidence that demonstrates how the use of different semantics affects model selection.
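The ambiguity in "friends of friends" shows up even on a four-node graph. The sketch below computes the distance-2 neighborhood of an ego under three plausible semantics:

- endpoints of any walk of length k;
- endpoints of simple paths of length k;
- nodes at shortest-path distance exactly k.

These three are illustrative choices and may not match the paper's exact definitions. The function names are hypothetical.

```python
from typing import Dict, List, Set

# Undirected graph as adjacency sets
Graph = Dict[str, Set[str]]


def walk_neighbors(g: Graph, ego: str, k: int) -> Set[str]:
    """Endpoints of all walks of length k (nodes may repeat, so ego can reappear)."""
    frontier = {ego}
    for _ in range(k):
        frontier = {nbr for node in frontier for nbr in g[node]}
    return frontier


def simple_path_neighbors(g: Graph, ego: str, k: int) -> Set[str]:
    """Endpoints of simple paths of length k (no node repeated along a path)."""
    result: Set[str] = set()

    def extend(path: List[str]) -> None:
        if len(path) == k + 1:
            result.add(path[-1])
            return
        for nbr in g[path[-1]]:
            if nbr not in path:
                extend(path + [nbr])

    extend([ego])
    return result


def shortest_path_neighbors(g: Graph, ego: str, k: int) -> Set[str]:
    """Nodes whose shortest-path distance from ego is exactly k."""
    seen = {ego}
    frontier = {ego}
    for _ in range(k):
        frontier = {nbr for node in frontier for nbr in g[node]} - seen
        seen |= frontier
    return frontier
```

Consider the graph with edges A-B, A-C, B-C, C-D and ego A. The three semantics give {A, B, C, D}, {B, C, D}, and {D}, respectively. A model of peer influence built on one of these sets can therefore reach different conclusions from a model built on another.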

Submitted 16 December, 2014; originally announced December 2014.

Comments: Networks: From Graphs to Rich Data - NIPS Workshop

arXiv:1405.5868

Learning to Generate Networks

Authors: James Atwood, Don Towsley, Krista Gile, David Jensen

Abstract: We investigate the problem of learning to generate complex networks from data. Specifically, we consider whether deep belief networks, dependency networks, and members of the exponential random graph family can learn to generate networks whose complex behavior is consistent with a set of input examples. We find that the deep model is able to capture the complex behavior of small networks, but that no model is able to capture this behavior for networks with more than a handful of nodes.
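An exponential random graph model (ERGM) assigns each graph G a probability proportional to exp(theta . s(G)) for a vector of network statistics s(G). The sketch below uses edge and triangle counts as the statistics; this choice is illustrative and not taken from the paper. It computes the normalizing constant by brute-force enumeration over all 2^(n choose 2) graphs, which is only feasible for very small n.

```python
import math
from itertools import combinations, product
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def edge_triangle_counts(n: int, edges: Iterable[Edge]) -> Tuple[int, int]:
    """Statistics s(G) = (number of edges, number of triangles) on nodes 0..n-1."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n_edges = sum(len(nbrs) for nbrs in adj) // 2
    n_triangles = sum(
        1 for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return n_edges, n_triangles


def ergm_probability(n: int, edges: List[Edge], theta_edges: float, theta_triangles: float) -> float:
    """P(G) = exp(theta . s(G)) / Z(theta), with Z found by enumerating every graph."""
    def weight(es: Iterable[Edge]) -> float:
        e, t = edge_triangle_counts(n, es)
        return math.exp(theta_edges * e + theta_triangles * t)

    pairs = list(combinations(range(n), 2))
    z = sum(
        weight(pair for pair, keep in zip(pairs, mask) if keep)
        for mask in product((False, True), repeat=len(pairs))
    )
    return weight(edges) / z
```

With all parameters at zero, every graph on 3 nodes is equally likely, so the empty graph has probability 1/8. Positive `theta_triangles` shifts mass toward clustered graphs.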

Submitted 10 November, 2014; v1 submitted 22 May, 2014; originally announced May 2014.

Comments: Neural Information Processing Systems 2014 Workshop on Networks: From Graphs to Rich Data

arXiv:1401.8042

Online Dating Recommendations: Matching Markets and Learning Preferences

Authors: Kun Tu, Bruno Ribeiro, Hua Jiang, Xiaodong Wang, David Jensen, Benyuan Liu, Don Towsley

Abstract: Recommendation systems for online dating have recently attracted much attention from the research community. In this paper, we propose a two-sided matching framework for online dating recommendations and design an LDA model to learn user preferences from observed user messaging behavior and user profile features. Experimental results using data from a large online dating website show that two-sided matching significantly improves the rate of successful matches, by as much as 45%. Finally, using simulated matchings, we show that the LDA model can correctly capture user preferences.
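The abstract does not say which matching procedure the framework uses. The classic algorithm for two-sided matching is Gale-Shapley deferred acceptance, shown below purely as an illustration. In a dating setting, the preference lists would come from a learned model such as the LDA model above; here they are hypothetical inputs. The sketch assumes complete preference lists and equal numbers on both sides.

```python
from typing import Dict, List


def deferred_acceptance(
    proposer_prefs: Dict[str, List[str]],
    receiver_prefs: Dict[str, List[str]],
) -> Dict[str, str]:
    """Gale-Shapley deferred acceptance. Returns a stable matching receiver -> proposer.
    Assumes complete preference lists and equally sized sides."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged: Dict[str, str] = {}
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]  # best receiver p has not yet proposed to
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p  # r trades up; the previous partner becomes free again
            free.append(current)
        else:
            free.append(p)  # rejected; p will propose to its next choice
    return engaged
```

The result is stable: no two users would both prefer each other to their assigned matches. This mutual acceptance is the property that distinguishes two-sided matching from one-sided recommendation.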

Submitted 30 January, 2014; originally announced January 2014.

Comments: 6 pages, 4 figures, submission to the 5th International Workshop on Social Recommender Systems

arXiv:1309.6843

A Sound and Complete Algorithm for Learning Causal Models from Relational Data

Authors: Marc Maier, Katerina Marazopoulou, David Arbour, David Jensen

Abstract: The PC algorithm learns maximally oriented causal Bayesian networks. However, there is no equivalent complete algorithm for learning the structure of relational models, a more expressive generalization of Bayesian networks. Recent developments in the theory and representation of relational models support lifted reasoning about conditional independence. This enables a powerful constraint for orienting bivariate dependencies and forms the basis of a new algorithm for learning structure. We present the relational causal discovery (RCD) algorithm that learns causal relational models. We prove that RCD is sound and complete, and we present empirical results that demonstrate effectiveness.
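For readers unfamiliar with PC, the sketch below shows its first two phases for ordinary (non-relational) variables: skeleton discovery and v-structure orientation. It takes a conditional independence oracle `ci` as input, which is a hypothetical stand-in for a statistical test. It omits the Meek orientation rules and everything that makes RCD relational, such as lifted reasoning and bivariate orientation.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

# ci(x, y, z) returns True if x is independent of y given the set z.
CITest = Callable[[str, str, FrozenSet[str]], bool]
SepSets = Dict[FrozenSet[str], FrozenSet[str]]


def pc_skeleton(nodes: Iterable[str], ci: CITest) -> Tuple[Dict[str, Set[str]], SepSets]:
    """Start from a complete graph; remove x - y once some subset of x's other
    neighbours, of increasing size, makes x and y independent."""
    order = sorted(nodes)
    adj = {v: set(order) - {v} for v in order}
    sepset: SepSets = {}
    depth = 0
    while any(len(adj[x]) - 1 >= depth for x in order):
        for x in order:
            for y in sorted(adj[x]):  # iterate over a copy; adj[x] may shrink
                for cond in combinations(sorted(adj[x] - {y}), depth):
                    if ci(x, y, frozenset(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = frozenset(cond)
                        break
        depth += 1
    return adj, sepset


def v_structures(adj: Dict[str, Set[str]], sepset: SepSets) -> Set[Tuple[str, str]]:
    """Orient x -> z <- y for nonadjacent x, y with common neighbour z outside their sepset."""
    arrows: Set[Tuple[str, str]] = set()
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue
        for z in adj[x] & adj[y]:
            if z not in sepset.get(frozenset((x, y)), frozenset()):
                arrows.add((x, z))
                arrows.add((y, z))
    return arrows
```

A collider X -> Y <- Z and a chain X -> Y -> Z produce the same skeleton. They are told apart by whether Y appears in the separating set of X and Z.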

Submitted 26 September, 2013; originally announced September 2013.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Report number: UAI-P-2013-PG-371-380

arXiv:1302.4381

Reasoning about Independence in Probabilistic Models of Relational Data

Authors: Marc Maier, Katerina Marazopoulou, David Jensen

Abstract: We extend the theory of d-separation to cases in which data instances are not independent and identically distributed. We show that applying the rules of d-separation directly to the structure of probabilistic models of relational data inaccurately infers conditional independence. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models. We provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models, and we present empirical results that demonstrate effectiveness.
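Relational d-separation extends the ordinary d-separation criterion for DAGs. As a point of reference, the sketch below implements only the ordinary criterion, using the standard moralization test:

1. Keep the ancestral set of the queried nodes.
2. Connect each node to its parents and "marry" co-parents, dropping edge directions.
3. Delete the conditioning set.
4. Check connectivity.

It does not implement abstract ground graphs or any relational machinery. The function names are hypothetical.

```python
from typing import Dict, Set

# A DAG given as: node -> set of its parents
DAG = Dict[str, Set[str]]


def ancestral_set(dag: DAG, nodes: Set[str]) -> Set[str]:
    """The given nodes together with all of their ancestors."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(dag: DAG, x: str, y: str, given: Set[str]) -> bool:
    """x and y are d-separated by `given` iff they are disconnected in the moral
    graph of the ancestral set of {x, y} and `given`, once `given` is removed.
    Assumes x and y are not themselves in `given`."""
    keep = ancestral_set(dag, {x, y} | given)
    moral: Dict[str, Set[str]] = {v: set() for v in keep}
    for child in keep:
        parents = dag[child]  # parents of an ancestral node are also in `keep`
        for p in parents:
            moral[child].add(p)
            moral[p].add(child)
        for a in parents:  # "marry" every pair of co-parents
            for b in parents:
                if a != b:
                    moral[a].add(b)
    # Undirected search from x that never enters the conditioning set
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in moral[v]:
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True
```

On a chain X -> Y -> Z, conditioning on Y separates X and Z. On a collider X -> Y <- Z, conditioning on Y connects them. The paper's point is that applying this propositional test directly to a relational model's structure can give wrong answers.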

Submitted 6 January, 2014; v1 submitted 18 February, 2013; originally announced February 2013.

Comments: 61 pages, substantial revisions to formalisms, theory, and related work

arXiv:1206.3536

Identifying Independence in Relational Models

Authors: Marc Maier, David Jensen

Abstract: The rules of d-separation provide a framework for deriving conditional independence facts from model structure. However, this theory only applies to simple directed graphical models. We introduce relational d-separation, a theory for deriving conditional independence in relational models. We provide a sound, complete, and computationally efficient method for relational d-separation, and we present empirical results that demonstrate effectiveness.

Submitted 15 April, 2013; v1 submitted 15 June, 2012; originally announced June 2012.

Comments: This paper has been revised and expanded. See "Reasoning about Independence in Probabilistic Models of Relational Data" http://arxiv.org/abs/1302.4381

Showing 1–31 of 31 results for author: Jensen, D