Detecting and Measuring Confounding Using Causal Mechanism Shifts

Abbavaram Gowtham Reddy and Vineeth N Balasubramanian
Indian Institute of Technology Hyderabad, India
{cs19resch11002,vineethnb}@iith.ac.in
Abstract

Detecting and measuring confounding effects from data is a key challenge in causal inference. Existing methods frequently assume causal sufficiency, disregarding the presence of unobserved confounding variables. Causal sufficiency is both unrealistic and empirically untestable. Additionally, existing methods make strong parametric assumptions about the underlying causal generative process to guarantee the identifiability of confounding variables. Relaxing the causal sufficiency and parametric assumptions and leveraging recent advancements in causal discovery and confounding analysis with non-i.i.d. data, we propose a comprehensive approach for detecting and measuring confounding. We consider various definitions of confounding and introduce tailored methodologies to achieve three objectives: (i) detecting and measuring confounding among a set of variables, (ii) separating observed and unobserved confounding effects, and (iii) understanding the relative strengths of confounding bias between different sets of variables. We present useful properties of a confounding measure and present measures that satisfy those properties. Empirical results support the theoretical analysis.

1 Introduction

Understanding the underlying causal generative process of a set of variables is crucial in many scientific studies for applications in treatment and policy designs [44]. While randomized controlled trials (RCTs) and causal inference through active interventions are ideal choices for understanding the underlying causal model [19, 12, 13, 55], RCTs and active interventions are often impossible or infeasible, and sometimes unethical [50, 6]. Research efforts in causal inference hence rely on observational data to study causal relationships [44, 59, 65, 18, 41]. However, recovering the underlying causal model purely from observational data is challenging without further assumptions; this challenge is further exacerbated in the presence of unmeasured confounding variables.

A confounding variable is a variable that causes two other variables, resulting in a spurious association between those two variables. As exemplified by Simpson's paradox [58] and many other studies [20, 1, 31], the presence of confounding variables is an important quantitative explanation for why correlation does not imply causation. It is challenging to observe and measure all confounding variables in a scientific study [60, 44]. Identifying latent or unobserved confounding variables is even more challenging, and misidentifying them creates difficulties in downstream applications, such as discovering causal structures from observational data. Numerous methods operate under the assumption of causal sufficiency [45, 4, 60, 8, 51, 65], implying the non-existence of unobserved confounding variables. Causal sufficiency presupposes that all pertinent variables required for causal inference have been observed. However, this may not be a practical or testable assumption.

The study of confounding has various applications, chief among them being causal discovery, i.e., identifying the causal relationships among variables [38, 40, 63]. It is also useful for determining whether a set of observed confounding variables is sufficient to adjust for when estimating causal effects [29], measuring the extent to which statistical correlation between variables can be attributed to confounding [24, 25, 62], and verifying the comparability of treatment and control groups in non-randomized interventional studies [16].

A fundamental problem in causal inference lies in detecting hidden confounding variables from observational data alone. This is non-trivial and poses various challenges. For example, a key issue is that given a marginal distribution over observed variables, there are infinitely many joint distributions corresponding to causal graphs involving unobserved variables [56]. To tackle such challenges, recent work shows that using data from different environments improves causal discovery [40, 38, 33, 45, 23] and helps detect causal mechanism shifts [36] and unobserved confounding [29, 38]. However, such efforts often subsume confounding detection under causal discovery, focusing primarily on identifying confounding factors while overlooking other useful information, such as the relative strength of confounding between variable sets and the distinction between observed and unobserved confounding within a variable set. We seek to address these gaps in this work.

We focus exclusively on the problem of studying confounding from multiple perspectives, including (i) detecting and measuring confounding among a set of variables, (ii) assessing the relative strengths of confounding among different sets of observed variables, and (iii) distinguishing between observed and unobserved confounding among a set of variables. The primary focus of causal inference often lies in verifying the presence or absence of confounding rather than determining the exact value of the measured confounding. However, we leverage the measured confounding to assess the relative strengths of confounding between sets of variables. To achieve the above objectives, we utilize data from various contexts, where each context results from shifts in the causal mechanisms of a set of variables [38, 45]. This allows us to propose different measures of confounding based on the available context information. Our contributions can be summarized as follows.

  • For various definitions of confounding, we propose corresponding measures of confounding and present useful properties of the proposed measures. To our knowledge, this is the first comprehensive study that examines various aspects of observed and unobserved confounding using data from multiple contexts without making parametric or causal sufficiency assumptions.

  • We study pairwise confounding, confounding among multiple variables, how to separate unobserved confounding from overall confounding, and present ways to assess relative confounding.

  • We present an algorithm for detecting and measuring confounding using data from multiple contexts. We perform experiments to verify the theoretical analysis.

2 Related Work

The study of confounding has typically been embedded as part of causal discovery algorithms in most existing work. Causal discovery methods can be categorized according to several criteria, including the type of data used (observational versus interventional/experimental), parametric versus non-parametric approaches, and whether they relax the causal sufficiency assumption [65, 59]. Given our focus on studying confounding comprehensively, going beyond observed confounding variables, we discuss literature on methods that relax the causal sufficiency assumption and rely on experimental data.

Causal Discovery via Observational Data, Relaxing Causal Sufficiency: Constraint-based causal discovery algorithms produce an equivalence class of graphs that satisfy a set of conditional independence constraints [60, 11, 9, 42]. Other methods such as [2, 28, 27] reduce the problem complexity by assuming a parametric form of the underlying causal model (e.g., variables are jointly Gaussian in Chandrasekaran et al. [7]), thereby returning unique causal graphs. Nested Markov Models (NMMs) [56, 57, 49, 14] allow identifiability of causal models with latent factors by using (pairwise) Verma constraints. A recent approach using differentiable causal discovery [2] combines NMMs with the differentiable constraint of [66] to discover a partially directed causal network and likely confounded nodes. Unlike these methods, our focus in this work is on detecting and measuring confounding under various settings, instead of recovering the entire causal graph or equivalence class.

Causal Discovery Using Data From Multiple Environments: Given access to a set of observed confounding variables, very recent work [29] presented testable conditional independence constraints that are violated only when there is unobserved confounding. However, their analysis is focused on downstream causal effect estimation. We aim to provide a unified framework for studying and measuring confounding under the different types of contextual information available. Other methods [33, 23] learn an equivalence class of graphs when data from observational and interventional distributions are available. Confounding has also been shown to be detectable in linear models with non-Gaussian variables [20]. In linear models, a spectral analysis method was proposed in [25] to understand to what extent the statistical correlation between a set of variables and a target variable can be attributed to confounding. See Tab. 4 of [40] for an overview of causal discovery methods that use data from multiple environments or contexts. Under the specific assumptions of causal sufficiency and sparse mechanism shift, a method was proposed in [45] to reduce the size of a given Markov equivalence class using a mechanism shift score. A differentiable causal discovery method was proposed in [4] that uses interventional data to recover the interventional Markov equivalence class. While these methods use data from different contexts, they assume the absence of unobserved confounding variables; we instead focus on capturing both observed and unobserved confounding.

Measuring and Interpreting Confounding: Earlier efforts in the field have studied different measures of observed confounding, each tailored to address specific challenges [44, 15, 35, 3, 39, 30, 43, 34]. Such measures have also been refined to address specific issues [24, 54]; e.g., a method to correct the non-linearity effect present in confounding estimates via the exposure–outcome association with and without adjustment for confounding was proposed in [24]. In contrast, we measure the effects of both observed and unobserved confounding. Motivated by the ignorability property in the potential outcomes framework [61, 26], the divergence between nominal and complete propensity densities has been considered indicative of hidden confounding [26]. To the best of our knowledge, the efforts closest to ours are [38, 40], which study confounding using data from multiple contexts without the causal sufficiency assumption. However, they do not measure confounding and detect confounding only as a step toward discovering the causal graph. Ours is a more general framework for studying and measuring confounding from multiple perspectives.

In regression models, a difference exceeding a certain threshold between the coefficients of the treatment variable before and after adjusting for possible confounders is considered an indication of confounding. This procedure is known as the change-in-estimate criterion. A typical threshold used in the literature is 10% [54, 32, 5].
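The change-in-estimate criterion can be sketched in a few lines. The Python below is a minimal illustration under assumptions of ours, not taken from the cited works: ordinary least squares via `numpy.linalg.lstsq`, a single candidate confounder, the relative change measured against the adjusted coefficient (conventions differ across the literature), and the hypothetical helper names `ols_coefficients` and `change_in_estimate`. The simulated data are synthetic.

```python
import numpy as np


def ols_coefficients(y, *covariates):
    """Least-squares coefficients of y on an intercept plus the given covariates."""
    design = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def change_in_estimate(y, treatment, confounder, threshold=0.10):
    """Relative change of the treatment coefficient after adjusting for `confounder`.

    Assumption: the change is taken relative to the adjusted coefficient; confounding
    is flagged when the relative change exceeds `threshold` (10% by default).
    """
    crude = ols_coefficients(y, treatment)[1]
    adjusted = ols_coefficients(y, treatment, confounder)[1]
    relative_change = abs(crude - adjusted) / abs(adjusted)
    return float(relative_change), bool(relative_change > threshold)


rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                          # candidate confounder
t_conf = z + rng.normal(size=n)                 # treatment driven by z
y_conf = 2.0 * t_conf + 3.0 * z + rng.normal(size=n)
t_free = rng.normal(size=n)                     # treatment independent of z
y_free = 2.0 * t_free + 3.0 * z + rng.normal(size=n)

change_c, flag_c = change_in_estimate(y_conf, t_conf, z)
change_f, flag_f = change_in_estimate(y_free, t_free, z)
print(flag_c, flag_f)
```

With these settings the crude coefficient in the confounded data should be near 3.5 against an adjusted value near 2, a relative change of roughly 75% and well above the 10% threshold; in the unconfounded data the two estimates agree closely.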

3 Background and Problem Setup

Let $\mathbf{X}$ be a set of observed variables and $\mathbf{Z}$ be a set of unobserved or latent variables. The values of $\mathbf{X}, \mathbf{Z}$ can be real, discrete, or mixed. Let $\mathcal{G}$ be the underlying directed acyclic graph (DAG) among the variables $\mathbf{V} = \mathbf{X} \cup \mathbf{Z}$. Directed edges among the variables in $\mathbf{V}$ indicate direct causal influences. Assume that the unobserved variables in $\mathbf{Z}$ are jointly independent and exogenous to $\mathbf{X}$ (i.e., $Z_i \perp\!\!\!\perp Z_j\ \forall i \neq j$ and $X_k \not\rightarrow Z_j\ \forall j, k$). In this setting, any two nodes $X_i, X_j \in \mathbf{X}$ sharing a common parent $Z_k \in \mathbf{Z}$ are said to be confounded, and $Z_k$ is said to be a confounding variable.
For a node $X_i \in \mathbf{X}$, $\mathbf{PA}_i = \{X_j \in \mathbf{X} \mid X_j \rightarrow X_i\} \cup \{Z_j \in \mathbf{Z} \mid Z_j \rightarrow X_i\}$ denotes the set of parents of $X_i$.
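To make the definition concrete, the following Python sketch (our illustration; the dictionary-of-parent-sets encoding and the function name `confounded_pairs` are hypothetical) lists the observed pairs that share a latent parent, and rejects graphs in which a latent node has parents, mirroring the exogeneity assumption on $\mathbf{Z}$.

```python
from itertools import combinations


def confounded_pairs(parents, latent):
    """Return {(X_a, X_b): shared latent parents} for observed pairs with a common latent parent.

    `parents` maps every node to the set of its parents (PA_i); `latent` is the set Z.
    """
    # Latent variables are assumed exogenous, i.e. they have no parents.
    for z in latent:
        if parents[z]:
            raise ValueError(f"latent node {z} has parents {parents[z]}")
    observed = sorted(node for node in parents if node not in latent)
    pairs = {}
    for a, b in combinations(observed, 2):
        shared = parents[a] & parents[b] & latent
        if shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical graph: Z1 -> X1, Z1 -> X2, X1 -> X2, X2 -> X3.
graph = {"Z1": set(), "X1": {"Z1"}, "X2": {"Z1", "X1"}, "X3": {"X2"}}
print(confounded_pairs(graph, latent={"Z1"}))  # {('X1', 'X2'): {'Z1'}}
```

Only the pair $(X_1, X_2)$ is confounded here; $X_3$ is associated with both through the directed path, but shares no latent parent with either.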

For a node $X_i$, $\mathbb{P}(X_i|\mathbf{PA}_i)$ is called the causal mechanism of $X_i$. The causal mechanism $\mathbb{P}(X_i|\mathbf{PA}_i)$ encodes how the variable $X_i$ is influenced by its parents $\mathbf{PA}_i$. Following earlier work [22, 38, 45, 21, 46, 52], we make the following general assumption about the underlying causal mechanisms of data.

Assumption 3.1.

(Independent Causal Mechanisms [44, 47]) A change in $\mathbb{P}(X_i|\mathbf{PA}_i)$ has no effect on and provides no information about $\mathbb{P}(X_j|\mathbf{PA}_j)\ \forall j \neq i$.

Identifying confounding from observational data alone is challenging without further assumptions [28]. Hence, following earlier work [38, 29, 40], we assume that the data over the variables $\mathbf{X}$ are observed across multiple contexts or environments. While there are various ways of formulating or constructing contexts, in this paper we assume that each context is created by either hard (a.k.a. structural) or soft (a.k.a. parametric) interventions on a subset $\mathbf{V}_S \subseteq \mathbf{V}$ of variables, where $S$ is a set of indices. Performing a hard intervention on a variable $V_i$ sets $V_i$ to a value $v_i$, which removes the influence of its parents $\mathbf{PA}_i$ on $V_i$. Performing a soft intervention on $V_i$ replaces its causal mechanism $\mathbb{P}(V_i|\mathbf{PA}_i)$ with a new causal mechanism $\tilde{\mathbb{P}}(V_i|\mathbf{PA}_i)$.
A soft intervention on $V_i$ does not remove the influence of its parents $\mathbf{PA}_i$ on $V_i$. The idea of explicitly considering context information and using different contexts as context variables to create extended causal graphs has been studied in the literature. Context variables are also called policy variables, decision variables, regime variables, domain variables, environment variables, etc. [40, 45, 17, 22].
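The difference between the two intervention types can be illustrated with a toy linear structural causal model. The Python sketch below is a hypothetical example (the graph, coefficients, and the function `sample_context` are our assumptions): under the hard intervention $X_1$ no longer covaries with its latent parent $Z$, while under the soft intervention $Z$ still influences $X_1$ through a changed mechanism. The mechanism of $X_2$ is left untouched in every context.

```python
import numpy as np


def sample_context(n, rng, intervention=None, value=1.0, new_weight=2.0):
    """Sample a hypothetical linear SCM Z -> X1, Z -> X2, X1 -> X2 (Z latent).

    intervention=None   : observational mechanism X1 = Z + noise
    intervention="hard" : X1 := value, so Z no longer influences X1
    intervention="soft" : X1's mechanism is replaced by X1 = new_weight * Z + noise,
                          so Z still influences X1
    """
    z = rng.normal(size=n)
    if intervention is None:
        x1 = z + rng.normal(size=n)
    elif intervention == "hard":
        x1 = np.full(n, value)
    elif intervention == "soft":
        x1 = new_weight * z + rng.normal(size=n)
    else:
        raise ValueError(f"unknown intervention: {intervention}")
    x2 = 0.5 * x1 + z + rng.normal(size=n)  # X2's mechanism is the same in every context
    return z, x1, x2


rng = np.random.default_rng(0)
for kind in (None, "hard", "soft"):
    z, x1, _ = sample_context(100_000, rng, kind)
    print(kind, round(float(np.cov(z, x1)[0, 1]), 2))
```

In this model the covariance between $Z$ and $X_1$ is about 1 observationally, exactly 0 under the hard intervention, and about 2 under the soft intervention.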

Let $\mathbf{C} = \{c_1, c_2, \dots, c_n\}$ be the set of $n$ contexts and let $\mathbb{P}^c(\mathbf{X}), c \in \mathbf{C}$, denote the probability distribution of the observed variables $\mathbf{X}$ in context $c$. Let $\mathbf{C}_{S\wedge R}$, where $S, R$ are sets of indices, be the set of contexts in which we observe mechanism changes for the set of variables $\mathbf{X}_{S\cup R}$. Similarly, let $\mathbf{C}_{S\wedge\neg R}$ be the set of contexts in which we observe mechanism changes for the set of variables $\mathbf{X}_S$ but not for the variables $\mathbf{X}_R$.
We say that the causal mechanism of a variable $X_i$ changes between two contexts $c, c'$ if $\mathbb{P}^c(X_i|\mathbf{PA}_i) \neq \mathbb{P}^{c'}(X_i|\mathbf{PA}_i)$. Given the data over observed variables in each context, there exist methods for detecting mechanism shifts of each variable between contexts [36, 38, 45, 37].
For example, the $p$-value of a test comparing $\mathbb{P}^c(X_i|\mathbf{PA}_i^o)$ and $\mathbb{P}^{c'}(X_i|\mathbf{PA}_i^o)$, where $\mathbf{PA}_i^o$ is the set of observed parents of $X_i$, can be used to detect a mechanism change for $X_i$ between contexts $c, c'$ [38, 36]. Hence, we focus on detecting and measuring confounding among a set of variables, assuming that the causal mechanism shifts are observed among that set of variables.
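As one concrete, simplified instance of such a test, the Python sketch below applies a Chow test (an F-test for equal regression coefficients) to the regression of $X_i$ on its observed parents in two contexts. This is a stand-in of ours, not the procedure of [36, 38]: it assumes linear mechanisms with Gaussian noise of equal variance in both contexts and only detects shifts in the coefficients. The helper names `_rss`, `chow_shift_pvalue`, and `sample` are hypothetical.

```python
import numpy as np
from scipy import stats


def _rss(y, parents):
    """Residual sum of squares and number of coefficients of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), parents])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), design.shape[1]


def chow_shift_pvalue(y_c, pa_c, y_cp, pa_cp):
    """Chow-test p-value for H0: X_i has the same linear regression on its observed
    parents PA_i^o in contexts c and c' (assumes Gaussian noise, equal variance)."""
    rss_c, k = _rss(y_c, pa_c)
    rss_cp, _ = _rss(y_cp, pa_cp)
    rss_pooled, _ = _rss(np.concatenate([y_c, y_cp]), np.concatenate([pa_c, pa_cp]))
    n = len(y_c) + len(y_cp)
    f_stat = ((rss_pooled - rss_c - rss_cp) / k) / ((rss_c + rss_cp) / (n - 2 * k))
    return float(stats.f.sf(f_stat, k, n - 2 * k))


rng = np.random.default_rng(1)


def sample(n, weight):
    pa = rng.normal(size=(n, 1))                  # one observed parent
    return weight * pa[:, 0] + rng.normal(size=n), pa


y0, pa0 = sample(2_000, 1.0)
y1, pa1 = sample(2_000, 1.0)   # same mechanism as in context 0
y2, pa2 = sample(2_000, 2.0)   # shifted mechanism
p_same = chow_shift_pvalue(y0, pa0, y1, pa1)
p_shift = chow_shift_pvalue(y0, pa0, y2, pa2)
print(p_same, p_shift)
```

A small $p$-value is evidence that the mechanism of $X_i$ shifted between $c$ and $c'$; a nonparametric conditional two-sample test would be needed to capture shifts that this linear model cannot express.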

Context information is not very useful if there is no restriction on how causal mechanisms change between contexts [45, 38]. For example, if the causal mechanisms of $X_i$ and $X_j$ both differ across all (or no) contexts, Assumption 3.1 is trivially satisfied, but the contexts reveal no information about the underlying causal mechanisms [10, 38]. Hence, following earlier work [45, 38, 17], we make the following assumptions.

Assumption 3.2.

(Sparse Causal Mechanism Shift [53]) Causal mechanisms of variables change sparsely across contexts, i.e., if $p$ denotes the probability that $\mathbb{P}^{c}(X_i|\mathbf{PA}_i) \neq \mathbb{P}^{c'}(X_i|\mathbf{PA}_i)$ for contexts $c, c' \in \mathbf{C}$, then $0 < p < 0.5$.

Assumption 3.2 implies that causal mechanisms change infrequently across contexts. This assumption is reasonable because, in many scientific studies, the interventions in any given context typically affect only a few variables [53].
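One way to read $p$ empirically is as the fraction of context pairs in which a variable's mechanism differs. The Python sketch below assumes, hypothetically, that each context is annotated with a label for the mechanism each variable uses (in practice such labels would have to be inferred from shift tests); the names `per_variable_shift_rate` and `is_sparse_shift` are our own.

```python
from itertools import combinations

import numpy as np


def per_variable_shift_rate(versions):
    """Fraction of context pairs (c, c') in which each variable's mechanism differs.

    versions[c][i] is a hypothetical label of the mechanism used for X_i in context c;
    two contexts share X_i's mechanism exactly when the labels are equal.
    """
    versions = np.asarray(versions)
    pairs = list(combinations(range(versions.shape[0]), 2))
    return np.array([
        np.mean([versions[a, i] != versions[b, i] for a, b in pairs])
        for i in range(versions.shape[1])
    ])


def is_sparse_shift(versions):
    """Check 0 < p < 0.5 for every variable."""
    rates = per_variable_shift_rate(versions)
    return bool(np.all((rates > 0) & (rates < 0.5)))


# Five contexts, three variables; each variable is intervened on in one context only.
sparse = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0], [1, 0, 0]]
# X1 uses a different mechanism in every context.
dense = [[0, 0, 0], [1, 0, 1], [2, 0, 0], [3, 1, 0], [4, 0, 0]]
print(per_variable_shift_rate(sparse))                  # [0.4 0.4 0.4]
print(is_sparse_shift(sparse), is_sparse_shift(dense))  # True False
```

In the dense example $X_1$ differs in every pair of contexts ($p = 1$), which is exactly the uninformative situation the assumption rules out.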

Assumption 3.3.

(Markov Property under Mechanism Shifts [17]) The distribution $\mathbb{P}(\mathbf{V})$ is given by $\mathbb{P}(\mathbf{V}) = \int \mathbb{P}^{C}(\mathbf{V})\, d\mathbb{P}(C) = \int \prod_i \mathbb{P}^{C}(V_i|\mathbf{PA}_i)\, d\mathbb{P}(C)$. In other words, the variables $\mathbf{V}$ are assumed to be conditionally exchangeable, so that the same graph $\mathcal{G}$ applies in every context $c \in \mathbf{C}$.
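For discrete variables the integral in Assumption 3.3 becomes a finite mixture over contexts. The Python sketch below uses a hypothetical two-variable graph $X_1 \rightarrow X_2$ with two equally likely contexts in which only the mechanism of $X_1$ shifts; it builds each context's joint from the same factorization and then pools them. All probability tables are made up for illustration.

```python
import numpy as np

# Hypothetical binary graph X1 -> X2, the same in both contexts.
p_context = np.array([0.5, 0.5])                 # P(C = c)
p_x1 = np.array([[0.7, 0.3],                     # P^c(X1) in context c1
                 [0.2, 0.8]])                    # ... in context c2 (X1's mechanism shifts)
mech_x2 = np.array([[0.9, 0.1],                  # P(X2 | X1 = 0)
                    [0.4, 0.6]])                 # P(X2 | X1 = 1)
p_x2_given_x1 = np.stack([mech_x2, mech_x2])     # P^c(X2 | X1): unchanged across contexts

# Each context factorizes over the same graph: P^c(x1, x2) = P^c(x1) P^c(x2 | x1).
per_context = p_x1[:, :, None] * p_x2_given_x1   # shape (context, x1, x2)
# Pooled distribution: P(x1, x2) = sum_c P(c) P^c(x1, x2).
pooled = np.einsum("c,cab->ab", p_context, per_context)
# Conditional of X2 given X1 computed from the pooled joint.
pooled_x2_given_x1 = pooled / pooled.sum(axis=1, keepdims=True)
print(np.allclose(pooled_x2_given_x1, mech_x2))  # True
```

Since the mechanism of $X_2$ is shared by both contexts, the pooled conditional $\mathbb{P}(X_2|X_1)$ coincides with it, whereas the pooled marginal of $X_1$ is a mixture of the two context-specific marginals.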

Assumption 3.4.

(Causal Sufficiency Over $\mathbf{X}\cup\mathbf{Z}$) All common parents of any pair of observed nodes belong to the set $\mathbf{X}\cup\mathbf{Z}$. In other words, all variables relevant for detecting confounding, including the unobserved confounding variables, are already present in $\mathbf{X}\cup\mathbf{Z}$.

Problem Statement: Given data over the observed variables $\mathbf{X}$ in multiple contexts, each context resulting from a sparse causal mechanism shift of variables in $\mathbf{V}$, (i) can we identify which pairs or sets of variables are confounded, and can we measure the confounding strength? (ii) can we isolate the confounding effects of observed and unobserved confounding variables? and (iii) can we study the relative strengths of confounding among different sets of variables?

To address the above problem, in the next section, we consider various definitions of confounding and present appropriate confounding measures depending on the context information available.

4 Detecting and Measuring Confounding

| Settings | Confounding Definition Based On | Required Context Information | Type of Intervention |
|---|---|---|---|
| 1 | Directed Information [48] & Noncollapsibility [15, 43, 54] | $\mathbf{C}_{\{i\}\wedge\neg P_{ij}}$ & $\mathbf{C}_{\{j\}\wedge\neg P_{ji}}$ | Hard / Structural |
| 2 & 3 | Mutual Information | $\mathbf{C}_{\{i\}\wedge\{j\}}$ | Soft / Parametric |

Table 1: Summary of the various settings for detecting and measuring confounding between $X_i, X_j$. Here $P_{ij}$ is the set of indices of the nodes that belong to a path from $X_i$ to $X_j$, including $j$.

In this section, we present methods for detecting and measuring confounding for various scenarios in which shifts in causal mechanisms are observed. Considering any three observed variables $X_i, X_j, X_o \in \mathbf{X}$ and an unobserved confounding variable $Z \in \mathbf{Z}$, we present measures of confounding depending on the information about mechanism shifts of $X_i, X_j, X_o, Z$. Each of the following subsections includes: (i) a definition of confounding, (ii) a corresponding definition of the confounding measure, (iii) a method for isolating the unobserved confounding measure from the overall confounding, (iv) an extension of the confounding measure to more than two variables, and (v) key properties of the proposed confounding measures. See Tab. 1 and Fig. 1 for an overview.

4.1 Setting 1: Measuring Confounding Using Directed Information Between $X_i, X_j$

In this setting, we use the fact that directed information does not vanish in the presence of a confounding variable [64, 48]. To this end, we leverage the interventional effects of $X_i, X_j$ on each other to define a measure of confounding.

Definition 4.1.

(Directed Information [48]). The directed information $I(X_i \rightarrow X_j)$ from $X_i \in \mathbf{X}$ to $X_j \in \mathbf{X}$ is defined as the conditional Kullback-Leibler divergence between the distributions $\mathbb{P}(X_i|X_j)$ and $\mathbb{P}(X_i|do(X_j))$. That is:

$$I(X_i \rightarrow X_j) \coloneqq D_{KL}\big(\mathbb{P}(X_i|X_j) \,\|\, \mathbb{P}(X_i|do(X_j))\big) \coloneqq \mathbb{E}_{\mathbb{P}(X_i, X_j)} \log \frac{\mathbb{P}(X_i|X_j)}{\mathbb{P}(X_i|do(X_j))} \qquad (1)$$
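For intuition, Eq. (1) can be evaluated exactly in a small discrete model where the confounder is known to the modeller. The Python sketch below is our own illustration, not the estimator used in this work: it assumes the hypothetical graph $Z \rightarrow X_j$, $Z \rightarrow X_i$, $X_j \rightarrow X_i$, obtains $\mathbb{P}(X_i|do(X_j))$ by back-door adjustment over $Z$, and averages the log-ratio under $\mathbb{P}(X_i, X_j)$. The function name `directed_information` and all probability tables are assumptions.

```python
import numpy as np


def directed_information(p_z, p_xj_given_z, p_xi_given_xj_z):
    """I(X_i -> X_j) from Eq. (1) for a hypothetical discrete model Z -> X_j, Z -> X_i, X_j -> X_i.

    p_z[z]                    = P(Z = z)
    p_xj_given_z[z, b]        = P(X_j = b | Z = z)
    p_xi_given_xj_z[z, b, a]  = P(X_i = a | X_j = b, Z = z)
    """
    joint = p_z[:, None, None] * p_xj_given_z[:, :, None] * p_xi_given_xj_z
    p_xj_xi = joint.sum(axis=0)                                   # P(x_j, x_i)
    p_xi_given_xj = p_xj_xi / p_xj_xi.sum(axis=1, keepdims=True)  # P(x_i | x_j)
    # Back-door adjustment over Z: P(x_i | do(x_j)) = sum_z P(z) P(x_i | x_j, z).
    p_xi_do_xj = np.einsum("z,zba->ba", p_z, p_xi_given_xj_z)
    mask = p_xj_xi > 0
    ratio = p_xi_given_xj[mask] / p_xi_do_xj[mask]
    return float(np.sum(p_xj_xi[mask] * np.log(ratio)))


p_z = np.array([0.5, 0.5])
p_xi_given_xj_z = np.array([[[0.9, 0.1], [0.7, 0.3]],   # Z = 0
                            [[0.2, 0.8], [0.1, 0.9]]])  # Z = 1
confounded = np.array([[0.9, 0.1], [0.2, 0.8]])         # X_j depends on Z
unconfounded = np.array([[0.6, 0.4], [0.6, 0.4]])       # X_j independent of Z

print(directed_information(p_z, confounded, p_xi_given_xj_z) > 0)              # True
print(abs(directed_information(p_z, unconfounded, p_xi_given_xj_z)) < 1e-12)  # True
```

When $X_j$ does not depend on $Z$, the conditional and interventional distributions coincide and the directed information is zero up to rounding; when $Z$ drives both variables it is strictly positive.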
Definition 4.2.

(No Confounding [44]) When measuring the causal effect of a (treatment) variable $X_i$ on a (target) variable $X_j$, the ordered pair $(X_i, X_j)$ is unconfounded if and only if the directed information from $X_j$ to $X_i$, $I(X_j \rightarrow X_i)$, is zero. Equivalently, $\mathbb{P}(X_j|X_i) = \mathbb{P}(X_j|do(X_i))$.

A similar definition of confounding that relates the conditional distribution $\mathbb{P}(X_i|X_j)$ and the interventional distribution $\mathbb{P}(X_i|do(X_j))$ is given as follows.

Definition 4.3.

(Noncollapsibility) [15, 43, 54] The statistical association between two variables $X_i$ and $X_j$ is said to be noncollapsible if the association strength differs across the levels (strata) of another variable $X_k$. That is, if $X_k$ is a confounding variable between $X_i$ and $X_j$, we have $\mathbb{P}(X_j|X_i) \neq \mathbb{P}(X_j|do(X_i)) = \mathbb{E}_{X_k}\big(\mathbb{P}(X_j|X_i, X_k)\big)$.

From Defns. 4.1 and 4.2, for a pair of variables $(X_i, X_j)$, observing $I(X_j \rightarrow X_i) > 0$ and $I(X_i \rightarrow X_j) > 0$ implies that $\mathbb{P}(X_j|do(X_i)) \neq \mathbb{P}(X_j|X_i)$, and hence the presence of confounding (see Tab. 2). Using these properties of directed information, we measure confounding as follows.

Definition 4.4.

(Confounding Measure 1) When causal mechanism shifts of two variables $X_i, X_j \in \mathbf{X}$ are observed, resulting in different contexts, then under Assumptions 3.2-3.4 the measure of confounding $CNF\text{-}1(X_i, X_j)$ between $X_i$ and $X_j$ is defined as follows.

$$CNF\text{-}1(X_i, X_j) := 1 - e^{-\min\left(I(X_i \rightarrow X_j),\; I(X_j \rightarrow X_i)\right)} \tag{2}$$
Figure 1: Setting 1: When the contexts $\mathbf{C}_{\{i\}\wedge\neg P_{ij}}$ and $\mathbf{C}_{\{j\}\wedge\neg P_{ji}}$ are known, where $P_{ij}$ is the set of node indices that belong to a path from $X_i$ to $X_j$ (including $j$), we leverage the directed information from $X_i$ to $X_j$ and from $X_j$ to $X_i$ to define a measure of confounding (Defn. 4.4). Setting 2: Causal mechanism changes in $Z$ introduce dependencies between the observed distributions of $X_i$ and $X_j$. We leverage such dependencies to measure confounding when the contexts $\mathbf{C}_{\{i\}\wedge\{j\}}$ are known (Defn. 4.6).
Setting 3: If we know that there is a causal path from $X_i$ to $X_j$, we leverage the dependencies in the pairs $(X_i, X_j)$ and $(Z, X_j)$ to measure confounding. Similarly, if we know that there is a causal path from $X_j$ to $X_i$, we leverage the dependencies in the pairs $(X_i, X_j)$ and $(Z, X_i)$ to measure confounding (Defn. 4.7). Dashed arrows from $Z$ indicate that $Z$ is unobserved.
| | Graph | $I(X_i \rightarrow X_j)$ | $I(X_j \rightarrow X_i)$ |
|---|---|---|---|
| Unconfounded | $X_i \rightarrow X_j$ | $>0$ | $=0$ |
| Unconfounded | $X_j \rightarrow X_i$ | $=0$ | $>0$ |
| Confounded | $X_i \rightarrow X_j$, $Z \rightarrow X_i$, $Z \rightarrow X_j$ | $>0$ | $>0$ |
| Confounded | $X_j \rightarrow X_i$, $Z \rightarrow X_i$, $Z \rightarrow X_j$ | $>0$ | $>0$ |
Table 2: Directed information values in two- and three-node graphs. If $X_i$ and $X_j$ are confounded by $Z$, we observe positive directed information in both directions.
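The confounded rows of Tab. 2 can be checked by exact enumeration. The sketch below assumes a hypothetical binary SCM with an unobserved $Z \rightarrow X_i$, $Z \rightarrow X_j$ and an edge $X_i \rightarrow X_j$. Because the model is fully specified, the interventional distributions are computed in closed form (back-door adjustment over $Z$) rather than estimated from contexts as in the text. All parameter values and helper names are illustrative.

```python
import numpy as np

# Hypothetical binary SCM: Z -> X_i, Z -> X_j, X_i -> X_j (Z unobserved).
p_z = np.array([0.5, 0.5])                      # P(Z = z)
p_xi1_given_z = np.array([0.2, 0.8])            # P(X_i = 1 | Z = z)
p_xj1_given_xi_z = np.array([[0.1, 0.6],        # P(X_j = 1 | X_i = a, Z = z), indexed [a, z]
                             [0.4, 0.9]])


def bern(p1, v):
    """P(V = v) for a Bernoulli variable with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def kl_directed(p_joint, p_first_do_second):
    """sum_{a,b} P(a, b) log P(a | b) / P(a | do(b)); tables indexed [a, b], all entries > 0."""
    p_cond = p_joint / p_joint.sum(axis=0)
    return float(np.sum(p_joint * np.log(p_cond / p_first_do_second)))


# Observational joint P(X_i = a, X_j = b), marginalising out Z.
joint = np.zeros((2, 2))
for a in (0, 1):
    for b in (0, 1):
        joint[a, b] = sum(p_z[z] * bern(p_xi1_given_z[z], a)
                          * bern(p_xj1_given_xi_z[a, z], b) for z in (0, 1))

# X_j is not an ancestor of X_i, so P(X_i | do(X_j = b)) = P(X_i).
p_xi = joint.sum(axis=1)
p_xi_do_xj = np.tile(p_xi[:, None], (1, 2))

# Back-door adjustment over Z: P(X_j = b | do(X_i = a)) = sum_z P(z) P(b | a, z).
p_xj_do_xi = np.array([[sum(p_z[z] * bern(p_xj1_given_xi_z[a, z], b) for z in (0, 1))
                        for a in (0, 1)] for b in (0, 1)])   # indexed [b, a]

i_ij = kl_directed(joint, p_xi_do_xj)        # I(X_i -> X_j)
i_ji = kl_directed(joint.T, p_xj_do_xi)      # I(X_j -> X_i)
cnf1 = 1.0 - np.exp(-min(i_ij, i_ji))        # Eqn. (2)
print(f"I(Xi->Xj) = {i_ij:.4f}, I(Xj->Xi) = {i_ji:.4f}, CNF-1 = {cnf1:.4f}")
```

Both directed information values come out positive, so $CNF\text{-}1$ is positive. If the two columns of `p_xj1_given_xi_z` are made identical (so that $Z$ no longer affects $X_j$), then $\mathbb{P}(X_j|X_i) = \mathbb{P}(X_j|do(X_i))$ and the reverse value, and hence $CNF\text{-}1$, drops to zero.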

For all the confounding measures, we use an exponential transformation to limit the range of the measure to between $0$ and $1$. Note that in a DAG, at least one of $I(X_i \rightarrow X_j)$ and $I(X_j \rightarrow X_i)$ is zero under no confounding (see Tab. 2 for simple examples with two- and three-node graphs). Hence $CNF\text{-}1(X_i, X_j)$ outputs zero when there is no confounding between $X_i$ and $X_j$. Similarly, $CNF\text{-}1(X_i, X_j)$ outputs a positive real value in the range $(0, 1]$ when there is confounding. We leverage data from multiple contexts to evaluate $\mathbb{P}(X_i|X_j)$ and $\mathbb{P}(X_i|do(X_j))$ as follows. In this setting, we assume each context is generated as a result of hard interventions on a subset of variables. 
Let $P_{ij}$ be the set of node indices that belong to a path from $X_i$ to $X_j$, including $j$. We use the contexts $\mathbf{C}_{\{i\}\wedge\neg P_{ij}}$ to evaluate $\mathbb{P}(X_j|do(X_i))$ as $\mathbb{P}(X_j|do(X_i)) = \mathbb{E}_{c \in \mathbf{C}_{\{i\}\wedge\neg P_{ij}}}\big[\mathbb{P}^c(X_j|X_i)\big]$. 
Intuitively, to compute the interventional effect of $X_i$ on $X_j$, we need to observe mechanism changes only in $X_i$ to account for the potential causal influence from $X_i$ to $X_j$. In addition, none of the nodes on a causal path from $X_i$ to $X_j$ should be intervened on. We use observational data to evaluate $\mathbb{P}(X_j|X_i)$.
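The estimator $\mathbb{P}(X_j|do(X_i)) = \mathbb{E}_{c}[\mathbb{P}^c(X_j|X_i)]$ can be sketched by simulation. This assumes a hypothetical binary SCM with unobserved $Z \rightarrow X_i$, $Z \rightarrow X_j$ and an edge $X_i \rightarrow X_j$, where each interventional context hard-intervenes only on $X_i$ by replacing its mechanism with a context-specific Bernoulli draw that ignores $Z$ (here $P_{ij} = \{j\}$, so $X_j$ is never intervened on). Function names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
P_Z1 = 0.5                                   # P(Z = 1); Z is unobserved
P_XI1_GIVEN_Z = np.array([0.2, 0.8])         # observational mechanism of X_i
P_XJ1_GIVEN_XI_Z = np.array([[0.1, 0.6],     # P(X_j = 1 | X_i = a, Z = z), indexed [a, z]
                             [0.4, 0.9]])


def sample_context(n, q_intervene=None):
    """Draw n samples; if q_intervene is set, X_i ~ Bernoulli(q) ignores Z (hard intervention)."""
    z = (rng.random(n) < P_Z1).astype(int)
    if q_intervene is None:
        xi = (rng.random(n) < P_XI1_GIVEN_Z[z]).astype(int)
    else:
        xi = (rng.random(n) < q_intervene).astype(int)
    xj = (rng.random(n) < P_XJ1_GIVEN_XI_Z[xi, z]).astype(int)
    return xi, xj


def cond_prob_xj1(xi, xj):
    """Empirical P(X_j = 1 | X_i = a) for a in {0, 1}."""
    return np.array([xj[xi == a].mean() for a in (0, 1)])


n = 200_000
# Observational context: P(X_j | X_i) is biased by the hidden Z.
obs = cond_prob_xj1(*sample_context(n))
# Contexts that change only X_i's mechanism: average the per-context conditionals.
do_est = np.mean([cond_prob_xj1(*sample_context(n, q)) for q in (0.3, 0.5, 0.7)], axis=0)

# Ground truth for comparison: sum_z P(z) P(X_j = 1 | X_i = a, z).
true_do = (1 - P_Z1) * P_XJ1_GIVEN_XI_Z[:, 0] + P_Z1 * P_XJ1_GIVEN_XI_Z[:, 1]
print("P(X_j=1 | X_i)     ~", obs.round(3))
print("P(X_j=1 | do(X_i)) ~", do_est.round(3))
print("analytic do        :", true_do)
```

The observational conditional is pulled away from the interventional one by $Z$, whereas averaging the per-context conditionals recovers the analytic value $\sum_z \mathbb{P}(z)\,\mathbb{P}(X_j|X_i, z)$ up to sampling error.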

Proposition 4.1.

(Identifiability of $\mathbb{P}(X_j|do(X_i))$) $\mathbb{P}(X_j|do(X_i))$ is identifiable from the set of contexts $\mathbf{C}_{\{i\}\wedge\neg P_{ij}}$. To detect and measure confounding between a pair of nodes $X_i, X_j$, it is enough to observe two sets of contexts, $\mathbf{C}_{\{i\}\wedge\neg P_{ij}}$ and $\mathbf{C}_{\{j\}\wedge\neg P_{ji}}$. Thus, $n$ sets of contexts are needed to detect and measure confounding between the $\binom{n}{2}$ distinct pairs of nodes in a causal DAG with $n$ nodes.

When a confounding variable $X_o$ between $X_i$ and $X_j$ is observed but an additional unobserved confounding variable $Z$ may exist, it is crucial to detect and measure the unobserved confounding effect [29]. We utilize conditional directed information to define the measure of unobserved confounding.

Definition 4.5.

(Conditional Directed Information [48]). The conditional directed information $I(X_i \rightarrow X_j | X_o)$ from $X_i$ to $X_j$ conditioned on $X_o$ is defined as the conditional Kullback-Leibler divergence between the distributions $\mathbb{P}(X_i|X_j, X_o)$ and $\mathbb{P}(X_i|do(X_j), X_o)$ as follows.

$$I(X_i \rightarrow X_j | X_o) \coloneqq D_{KL}\big(\mathbb{P}(X_i|X_j, X_o)\,\|\,\mathbb{P}(X_i|do(X_j), X_o)\big) \coloneqq \mathbb{E}_{\mathbb{P}(X_i, X_j, X_o)}\log\frac{\mathbb{P}(X_i|X_j, X_o)}{\mathbb{P}(X_i|do(X_j), X_o)} \tag{3}$$

This measure can be trivially extended to the case where there exist multiple observed and unobserved confounding variables. The expression $\mathbb{P}(X_i|do(X_j), X_o)$ denotes conditioning on $X_o$ in the interventional distribution $\mathbb{P}(X_i|do(X_j))$. Now, the conditional confounding can be measured as:

$$CNF\text{-}1(X_i, X_j | X_o) := 1 - e^{-\min\left(I(X_i \rightarrow X_j | X_o),\; I(X_j \rightarrow X_i | X_o)\right)} \tag{4}$$

Intuitively, by conditioning on an observed confounding variable $X_o$, we control for the association between $X_i$ and $X_j$ flowing via $X_o$ and measure the influence via the unobserved confounding variables.
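A minimal sketch of Eqns. (3)-(4), assuming a hypothetical binary SCM in which an observed $X_o$ and an unobserved $Z$ are independent root causes of both $X_i$ and $X_j$, with an edge $X_i \rightarrow X_j$. For illustration it uses exact enumeration and the true mechanisms (including those of $Z$) to form $\mathbb{P}(X_i|do(X_j), X_o)$ and $\mathbb{P}(X_j|do(X_i), X_o)$; in the setting of this paper the interventional terms would instead come from contexts. The parameter `hidden_strength` is an assumed knob that scales how strongly $Z$ acts on $X_i$ and $X_j$.

```python
import numpy as np


def conditional_cnf1(hidden_strength):
    """Exact CNF-1(X_i, X_j | X_o) for a hypothetical binary SCM.

    Graph: X_o -> X_i, X_o -> X_j, X_i -> X_j, plus an unobserved Z -> X_i, Z -> X_j
    whose influence is scaled by hidden_strength (0 means Z is not a confounder).
    """
    o, z, a, b = np.meshgrid([0, 1], [0, 1], [0, 1], [0, 1], indexing="ij")
    p_xi1 = 0.2 + 0.4 * o + 0.3 * hidden_strength * z            # P(X_i = 1 | o, z)
    p_xj1 = 0.1 + 0.3 * a + 0.3 * o + 0.2 * hidden_strength * z  # P(X_j = 1 | a, o, z)
    p_b = np.where(b == 1, p_xj1, 1 - p_xj1)                     # P(X_j = b | a, o, z)
    # joint4[o, z, a, b] with P(o) = P(z) = 0.5 and Z independent of X_o.
    joint4 = 0.25 * np.where(a == 1, p_xi1, 1 - p_xi1) * p_b

    p_oab = joint4.sum(axis=1)                                   # P(o, a, b); Z marginalised
    p_a_given_bo = p_oab / p_oab.sum(axis=1, keepdims=True)      # P(a | b, o)
    p_b_given_ao = p_oab / p_oab.sum(axis=2, keepdims=True)      # P(b | a, o)
    # do(X_j) leaves X_i and X_o untouched: P(a | do(b), o) = P(a | o).
    p_a_given_o = p_oab.sum(axis=2, keepdims=True) / p_oab.sum(axis=(1, 2), keepdims=True)
    # Under do(X_i), Z stays independent of X_o: P(b | do(a), o) = sum_z P(z) P(b | a, o, z).
    p_b_do_a_o = 0.5 * p_b.sum(axis=1)

    i_ij = np.sum(p_oab * np.log(p_a_given_bo / p_a_given_o))    # I(X_i -> X_j | X_o)
    i_ji = np.sum(p_oab * np.log(p_b_given_ao / p_b_do_a_o))     # I(X_j -> X_i | X_o)
    return float(1.0 - np.exp(-min(i_ij, i_ji)))                 # Eqn. (4)


print(conditional_cnf1(0.0))   # numerically zero: X_o is the only confounder
print(conditional_cnf1(1.0))   # positive: Z also confounds X_i and X_j
```

With `hidden_strength=0` the observed $X_o$ accounts for all confounding and the conditional measure vanishes up to floating-point error; with `hidden_strength=1` the residual confounding through $Z$ makes it positive.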

Beyond Pairwise Confounding: We now study when a set $\mathbf{X}_S$ of variables with $|\mathbf{X}_S| > 2$ is jointly confounded, i.e., shares a common confounding variable, and how to measure the joint confounding among the variables in $\mathbf{X}_S$.

Theorem 4.1.

A set of observed variables $\mathbf{X}_S$ is jointly unconfounded if and only if there exist three variables $X_i, X_j, X_k \in \mathbf{X}_S$ such that $I(X_i \rightarrow X_j | X_k) = I(\{X_i, X_k\} \rightarrow X_j)$.

We now define the measure of confounding among the variables in $\mathbf{X}_S$ as follows.

$$CNF\text{-}1(\mathbf{X}_S) = \sum_{i \in S} CNF\text{-}1(\mathbf{X}_{S \setminus \{i\}}, X_i) \tag{5}$$
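Eqn. (5) is a leave-one-out sum: each variable in $\mathbf{X}_S$ is compared against the set of all remaining variables. A minimal sketch of that bookkeeping follows. `pairwise_set_cnf1` is a hypothetical callable standing in for whatever estimator of $CNF\text{-}1$ between a set of variables and a single variable is used, and variables are identified by name only for illustration.

```python
from typing import Callable, List, Sequence

SetVsOneCNF = Callable[[List[str], str], float]


def joint_cnf1(variables: Sequence[str], pairwise_set_cnf1: SetVsOneCNF) -> float:
    """Leave-one-out sum of Eqn. (5): sum over i in S of CNF-1(X_{S minus i}, X_i)."""
    total = 0.0
    for i, x_i in enumerate(variables):
        rest = [v for k, v in enumerate(variables) if k != i]  # the set X_{S \ {i}}
        total += pairwise_set_cnf1(rest, x_i)
    return total


# Toy stand-in estimator (hypothetical): records its inputs and returns a constant.
calls = []


def toy_estimator(rest: List[str], target: str) -> float:
    calls.append((tuple(rest), target))
    return 0.1


print(joint_cnf1(["A", "B", "C"], toy_estimator))
print(calls)
```

Since each term lies in $[0, 1]$, the joint measure lies in $[0, |S|]$.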

Conditional confounding among a set of variables can be defined similarly to Eqn. 4. We now study some useful properties of the measure $CNF\text{-}1$.

Theorem 4.2.

For any three observed variables $X_i, X_j, X_o$ and an unobserved confounding variable $Z$, the following statements hold for the measure $CNF\text{-}1$.

  1. (Reflexivity and Symmetry.) $CNF\text{-}1(X_i, X_i | X_o) = 0$ and $CNF\text{-}1(X_i, X_j | X_o) = CNF\text{-}1(X_j, X_i | X_o)$.

  2. (Positivity.) $CNF\text{-}1(X_i, X_j) > 0$ if and only if $X_i$ and $X_j$ are confounded. Given an observed confounding variable $X_o$ between $X_i$ and $X_j$, $CNF\text{-}1(X_i, X_j | X_o) > 0$ if and only if there exists an unobserved confounding variable $Z$ between $X_i$ and $X_j$.

  3. (Monotonicity.) $CNF\text{-}1(X_i, X_j) > CNF\text{-}1(X_k, X_l)$ implies that the pair $(X_i, X_j)$ is more strongly confounded than the pair $(X_k, X_l)$ in the sense of Defns. 4.2 and 4.3.

4.2 Setting 2: Detecting and Measuring Confounding Using the Mechanism Shifts of $Z$.

The previous setting utilizes the interventional effect of $X_i$ (resp. $X_j$) on $X_j$ (resp. $X_i$) to define a measure of confounding between $X_i$ and $X_j$. In this setting, we utilize the association between the observed marginal distributions of $X_i$ and $X_j$ under causal mechanism shifts of $Z$ to measure confounding. To this end, similar to [38], we make the following assumption.

Assumption 4.1.

(Shift Faithfulness [38]) Let $Z$ be a common parent of a set of variables $\mathbf{X}_S \subseteq \mathbf{X}$. Then each causal mechanism shift in $Z$ between two contexts $c, c'$ entails a causal mechanism change in each $X_i \in \mathbf{X}_S$ between the same contexts $c, c'$.

One consequence of Assumption 4.1 is that a change in the causal mechanism of $Z$ induces correlations between the expectations of $X_i$ and $X_j$ across contexts. To see this, consider the following structural equations.

$$Z \sim \mathcal{N}(\mu(c), \sigma^2(c)) \qquad X_i \coloneqq \alpha Z + \epsilon_i \qquad X_j \coloneqq \beta X_i + \gamma Z + \epsilon_j \tag{6}$$

where $c$ denotes the context, and $\epsilon_i$ and $\epsilon_j$ are zero-mean noise variables with no further restriction on their distributions. The causal graph corresponding to this model has nodes $X_i, X_j, Z$ and edges $Z \rightarrow X_i$, $Z \rightarrow X_j$, $X_i \rightarrow X_j$. It is easy to see that $\mathbb{E}(X_i) = \alpha\mu(c)$ and $\mathbb{E}(X_j) = \beta\,\mathbb{E}(X_i) + \gamma\mu(c) = (\alpha\beta + \gamma)\mu(c)$. Following Assumption 4.1, whenever there is a change in the causal mechanism of $Z$ that alters $\mu(c)$ (e.g., $c$ changes to $\tilde{c}$ in Eqn. 6), both $\mathbb{E}(X_i)$ and $\mathbb{E}(X_j)$ change. 
Additionally, since $Z$ is a common cause of $X_i$ and $X_j$, there is a spurious association between $\mathbb{E}(X_i)$ and $\mathbb{E}(X_j)$. Consequently, in the set of contexts $\mathbf{C}_{\{i\}\wedge\{j\}}$, the values $\mathbb{E}(X_i)$ and $\mathbb{E}(X_j)$ are spuriously associated. Under Assumptions 3.2 and 4.1, restricting our analysis to $\mathbf{C}_{\{i\}\wedge\{j\}}$ ensures that, with high probability, the association between $\mathbb{E}(X_i)$ and $\mathbb{E}(X_j)$ is due to the confounding variable $Z$. In this example, the association between $\mathbb{E}(X_i)$ and $\mathbb{E}(X_j)$ exists even if $\beta = 0$, i.e., $X_i \not\rightarrow X_j$.
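The effect described above can be checked numerically. The following is a minimal Python sketch (not the authors' code) that simulates Eqn. 6 in many contexts and treats the per-context sample means as draws of $E^C_i, E^C_j$. The context distributions $\mu(c) \sim \mathrm{Uniform}(-3, 3)$ and $\sigma(c) \sim \mathrm{Uniform}(0.5, 2)$, the standard normal noise, and the function name `context_means` are illustrative assumptions; every simulated context shifts the mechanism of $Z$.

```python
import numpy as np


def context_means(alpha, beta, gamma, n_contexts=300, n_samples=2000, seed=0):
    """Simulate Eqn. 6 once per context and return per-context sample means.

    Assumed (illustrative) context model: mu(c) ~ U(-3, 3), sigma(c) ~ U(0.5, 2);
    the noise terms eps_i, eps_j are standard normal.
    """
    rng = np.random.default_rng(seed)
    e_i = np.empty(n_contexts)
    e_j = np.empty(n_contexts)
    for c in range(n_contexts):
        mu, sigma = rng.uniform(-3.0, 3.0), rng.uniform(0.5, 2.0)
        z = rng.normal(mu, sigma, n_samples)                        # Z ~ N(mu(c), sigma^2(c))
        x_i = alpha * z + rng.normal(size=n_samples)                # X_i := alpha Z + eps_i
        x_j = beta * x_i + gamma * z + rng.normal(size=n_samples)   # X_j := beta X_i + gamma Z + eps_j
        e_i[c], e_j[c] = x_i.mean(), x_j.mean()                     # estimates of E(X_i), E(X_j) in context c
    return e_i, e_j


# No direct edge X_i -> X_j (beta = 0), but Z confounds both variables.
e_i, e_j = context_means(alpha=1.0, beta=0.0, gamma=1.5)
print("confounded:", np.corrcoef(e_i, e_j)[0, 1])

# Control: Z affects only X_i (gamma = 0, beta = 0).
e_i0, e_j0 = context_means(alpha=1.0, beta=0.0, gamma=0.0)
print("unconfounded:", np.corrcoef(e_i0, e_j0)[0, 1])
```

With $\beta = 0$ the per-context means should still be strongly correlated through $Z$, while in the control run their correlation should be near 0.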
To define a confounding measure, we construct two random variables $E^C_i$ and $E^C_j$, defined as $E^C_i = \mathbb{E}_{X_i \sim \mathbb{P}^c(X_i)}(X_i)$ and $E^C_j = \mathbb{E}_{X_j \sim \mathbb{P}^c(X_j)}(X_j)$, respectively, where $c \in \mathbf{C}_{\{i\}\wedge\{j\}}$.
Relying on the context information $\mathbf{C}_{\{i\}\wedge\{j\}}$ and utilizing the association between $E^C_i$ and $E^C_j$, we define a confounding measure as follows.

Proposition 4.2.

(Confounding Based on Mutual Information) If two variables $X_i, X_j$ are confounded by a variable $Z$, the induced random variables $E^C_i, E^C_j$ described above have non-zero mutual information $I(E^C_i; E^C_j)$.

Definition 4.6.

(Confounding Measure 2) When causal mechanism shifts are observed for $X_i, X_j$ in different contexts and the contexts $\mathbf{C}_{\{i\}\wedge\{j\}}$ are known, under Assumptions 3.2-4.1, the measure of confounding $\text{CNF-2}(X_i, X_j)$ between $X_i$ and $X_j$ is defined as

$$\text{CNF-2}(X_i, X_j) := 1 - e^{-I(E^C_i;\, E^C_j)} \qquad (7)$$
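Eqn. 7 leaves the mutual-information estimator open. As one hedged illustration, if $(E^C_i, E^C_j)$ are approximately jointly Gaussian across contexts, then $I(E^C_i; E^C_j) = -\tfrac{1}{2}\log(1-\rho^2)$, where $\rho$ is their correlation, so that $\text{CNF-2} = 1 - \sqrt{1-\rho^2}$. The Gaussian assumption, the helper names, and the synthetic per-context means below are ours, not the paper's; a nonparametric MI estimator may be more appropriate in practice.

```python
import numpy as np


def gaussian_mi(a, b):
    """I(A; B) in nats, assuming (A, B) is bivariate Gaussian."""
    rho = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)


def cnf2(e_i, e_j):
    """CNF-2(X_i, X_j) = 1 - exp(-I(E_i^C; E_j^C)) (Eqn. 7); one array entry per context."""
    return 1.0 - np.exp(-gaussian_mi(e_i, e_j))


rng = np.random.default_rng(1)
mu = rng.uniform(-3.0, 3.0, 500)                 # context-level shift of a shared confounder
e_i = mu + 0.1 * rng.normal(size=500)            # per-context means of X_i
e_j = 1.5 * mu + 0.1 * rng.normal(size=500)      # per-context means of X_j
print("confounded:", cnf2(e_i, e_j))
print("independent:", cnf2(rng.normal(size=500), rng.normal(size=500)))
```

The mapping $1 - e^{-I}$ keeps the measure in $[0, 1)$: it is 0 when the per-context means are independent and approaches 1 as their dependence grows.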

To measure the strength of unobserved confounding when a confounding variable $X_o$ is already observed, we condition on $X_o$ and define $\text{CNF-2}(X_i, X_j \mid X_o)$ as follows.

$$\text{CNF-2}(X_i, X_j \mid X_o) := 1 - e^{-I(E^C_i;\, E^C_j \mid X_o)} \qquad (8)$$
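A Gaussian sketch of Eqn. 8 replaces the conditional mutual information by $-\tfrac{1}{2}\log(1-\rho_{ij\cdot o}^2)$, where $\rho_{ij\cdot o}$ is the partial correlation of $E^C_i$ and $E^C_j$. Two simplifications are assumptions of this example, not of the paper: we condition on the per-context mean of $X_o$ rather than on $X_o$ itself, and we assume linear-Gaussian relationships between the per-context quantities. The hypothetical simulation compares a case where $X_o$ is the only confounder with one that also has an unobserved $Z$.

```python
import numpy as np


def partial_corr(a, b, cond):
    """Correlation of a and b after regressing out cond (with an intercept) by least squares."""
    design = np.column_stack([np.ones_like(cond), cond])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


def cnf2_given(e_i, e_j, e_o):
    """Gaussian version of Eqn. 8: 1 - exp(-I(E_i^C; E_j^C | E_o^C))."""
    rho = partial_corr(e_i, e_j, e_o)
    cmi = -0.5 * np.log(1.0 - rho ** 2)
    return 1.0 - np.exp(-cmi)


def simulate(z_strength, n_contexts=300, n_samples=2000, seed=0):
    """Per-context means when X_o (observed) and, optionally, Z (unobserved) confound X_i, X_j."""
    rng = np.random.default_rng(seed)
    e_i, e_j, e_o = (np.empty(n_contexts) for _ in range(3))
    for c in range(n_contexts):
        mu_o, mu_z = rng.uniform(-3.0, 3.0, size=2)   # both mechanisms shift in every context
        x_o = rng.normal(mu_o, 1.0, n_samples)
        z = rng.normal(mu_z, 1.0, n_samples)
        x_i = x_o + z_strength * z + rng.normal(size=n_samples)
        x_j = x_o + z_strength * z + rng.normal(size=n_samples)
        e_i[c], e_j[c], e_o[c] = x_i.mean(), x_j.mean(), x_o.mean()
    return e_i, e_j, e_o


only_observed = cnf2_given(*simulate(z_strength=0.0))
with_hidden_z = cnf2_given(*simulate(z_strength=1.0))
print(only_observed, with_hidden_z)
```

When $X_o$ explains all of the shared variation, the conditional measure should stay near 0; adding the hidden $Z$ should push it close to 1.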

Beyond Pairwise Confounding: Following earlier work [38], we use the total correlation among triplets $(E^C_i, E^C_j, E^C_k)$ of random variables in $\{E^C_i\}_{i \in S}$ to verify whether a set of variables $\mathbf{X}_S$ is jointly confounded. By Assumption 4.1, the variables in $\mathbf{X}_S$ are jointly confounded only if each pair $X_i, X_j$ with $i, j \in S$ is pairwise confounded.
If all three variables share the same latent confounding variable $Z$, then knowing one of $E^C_i, E^C_j, E^C_k$ explains away some of the association between the other two, so that $I(E^C_i; E^C_j \mid E^C_k) < I(E^C_i; E^C_j)$.
However, for a triplet $(X_i, X_j, X_k)$, rather than being jointly confounded, the variables may be confounded by three distinct confounding variables $Z_{ij}, Z_{jk}, Z_{ki}$, one for each pair $(X_i, X_j)$, $(X_j, X_k)$, $(X_k, X_i)$. In general, for a set of $s$ variables to permit such an equivalent explanation, we would need a total of $\binom{s}{2}$ confounding variables with $s(s-1)$ outgoing edges to obtain the same structure of pairwise confounding [38]. While this may plausibly occur for small sets of variables that appear to be pairwise correlated, we assume the true graph $\mathcal{G}$ to be causally minimal in the following sense.

Assumption 4.2.

(Confounder Minimality [38]) For every subset $\mathbf{X}_S$ of $|S| \geq 4$ variables, there are at most $2|S|$ edges incoming into $\mathbf{X}_S$ from latent confounding variables with at least three children in $\mathbf{X}_S$.

Assumption 4.2 ensures that variables that appear to be jointly confounded are indeed jointly confounded. In other words, when a small number of latent variables suffices to explain the observed correlations, only a few confounding variables should in fact exist. With this assumption, we can guarantee that joint confounding can be identified from the total correlation.
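The edge count used above to motivate Assumption 4.2 is simple arithmetic: mimicking joint confounding of $s$ variables with one latent confounder per pair needs $\binom{s}{2}$ latent variables and $2\binom{s}{2} = s(s-1)$ edges, compared with a single latent variable and $s$ edges for one joint confounder. A small Python helper (the name `confounder_budget` is hypothetical) makes the comparison explicit.

```python
from math import comb


def confounder_budget(s):
    """Latent variables and latent->observed edges for two explanations of
    s mutually confounded variables."""
    pairwise = comb(s, 2)                      # one latent confounder per pair
    return {
        "joint": (1, s),                       # a single confounder with s children
        "pairwise": (pairwise, 2 * pairwise),  # each pairwise confounder has 2 children
    }


for s in (3, 4, 6):
    print(s, confounder_budget(s))
```

The pairwise explanation grows quadratically in $s$ while the joint one grows linearly, which is why a minimality assumption favors the joint explanation for larger sets.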

Theorem 4.3.

Let $\mathbf{X}_S$ be a set of variables such that all $X_i, X_j \in \mathbf{X}_S$ are pairwise confounded. Then $\mathbf{X}_S$ is jointly confounded if and only if for each triple $X_i, X_j, X_k \in \mathbf{X}_S$ we have $I(E^C_i; E^C_j \mid E^C_k) < I(E^C_i; E^C_j)$.

Now, the measure of joint confounding among a set of variables $\mathbf{X}_S$ can be defined using the total correlation $T(E^C_1, \dots, E^C_{|S|})$ as follows. To evaluate this expression, we use the contexts $\mathbf{C}_{\{1\}\cup\dots\cup\{|S|\}}$ to ensure that, with high probability, the association among the variables in $\mathbf{X}_S$ is due to the joint confounding variable $Z$.

$$\text{CNF-2}(\mathbf{X}_S) := 1 - e^{-T(E^C_1, \dots, E^C_{|S|})} \qquad (9)$$
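For jointly Gaussian variables, the total correlation in Eqn. 9 has the closed form $T = \sum_k H(E^C_k) - H(E^C_1, \dots, E^C_{|S|}) = -\tfrac{1}{2}\log\det R$, where $R$ is the correlation matrix of the per-context quantities. The following sketch uses that closed form; the Gaussian assumption, the helper names, and the synthetic data are ours, not the paper's. For two variables it reduces to the pairwise Gaussian mutual information.

```python
import numpy as np


def gaussian_total_correlation(samples):
    """Total correlation (nats) of the columns of `samples` (rows are contexts),
    assuming joint Gaussianity: T = -0.5 * log det(R)."""
    corr = np.corrcoef(samples, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    return -0.5 * logdet


def cnf2_joint(samples):
    """CNF-2(X_S) = 1 - exp(-T(E_1^C, ..., E_|S|^C)) as in Eqn. 9."""
    return 1.0 - np.exp(-gaussian_total_correlation(samples))


rng = np.random.default_rng(2)
n = 500
mu = rng.uniform(-3.0, 3.0, n)   # shift of one shared confounder per context
shared = np.column_stack([mu, 2.0 * mu, -mu]) + 0.3 * rng.normal(size=(n, 3))
independent = rng.normal(size=(n, 3))
print(cnf2_joint(shared), cnf2_joint(independent))
```

Using `slogdet` rather than `log(det(...))` avoids underflow when the correlation matrix is nearly singular, which is exactly the strongly confounded case.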
Theorem 4.4.

For any three observed variables $X_i, X_j, X_o$ and an unobserved confounding variable $Z$, the following statements hold for the measure CNF-2.

  1. (Reflexivity and Symmetry.) $\text{CNF-2}(X_i, X_i \mid X_o) = 1 - e^{-H(E^C_i \mid X_o)}$ for all $i$, where $H(\cdot \mid \cdot)$ denotes conditional entropy, and $\text{CNF-2}(X_i, X_j \mid X_o) = \text{CNF-2}(X_j, X_i \mid X_o)$.

  2. (Positivity.) $\text{CNF-2}(X_i, X_j) > 0$ if and only if $X_i, X_j$ are confounded. Given an observed confounding variable $X_o$ between $X_i, X_j$, $\text{CNF-2}(X_i, X_j \mid X_o) > 0$ if and only if there exists an unobserved confounding variable $Z$ between $X_i, X_j$.

  3. (Monotonicity.) $\text{CNF-2}(X_i, X_j) > \text{CNF-2}(X_k, X_l)$ implies that the pair $X_i, X_j$ is more strongly confounded than the pair $X_k, X_l$ in the sense of Defn. 4.2.
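Reflexivity follows from $I(E; E) = H(E)$, and symmetry from the symmetry of mutual information. Both can be checked with a plug-in estimator on discrete data. The sketch below covers only the unconditional case (no $X_o$) and uses made-up, discretized per-context values purely for illustration; the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy (nats) of a sequence of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum((k / n) * math.log(k / n) for k in counts.values())


def mutual_information(xs, ys):
    """Plug-in I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def cnf2_discrete(xs, ys):
    return 1.0 - math.exp(-mutual_information(xs, ys))


# Hypothetical discretized per-context summaries of two variables.
e_i = [0, 1, 1, 2, 2, 2, 0, 1]
e_j = [0, 1, 1, 2, 1, 2, 0, 0]
print(cnf2_discrete(e_i, e_i), 1.0 - math.exp(-entropy(e_i)))  # reflexivity
print(cnf2_discrete(e_i, e_j), cnf2_discrete(e_j, e_i))         # symmetry
```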

4.3 Setting 3: Observing the Causal Mechanism Shifts in $Z$ and Known Causal Path Direction Between $X_i$ and $X_j$

Similar to the previous settings, we use the marginal and conditional distributions of $X_i, X_j$ to define a measure of confounding. If we know the direction of the causal path between $X_i$ and $X_j$ from prior knowledge, we can use this direction to measure confounding as explained below. In addition to the notation $E^C_i, E^C_j$ introduced in the previous setting, for each $c \in \mathbf{C}_{\{i\}\wedge\{j\}}$ we denote $\mathbb{E}_{X_i \sim \mathbb{P}^c(X_i \mid X_j)}(X_i \mid X_j)$ and $\mathbb{E}_{X_j \sim \mathbb{P}^c(X_j \mid X_i)}(X_j \mid X_i)$ by $E^C_{ij}$ and $E^C_{ji}$, respectively. We now leverage the dependence among these variables to define the measure of confounding. Intuitively, if $X_i \rightarrow X_j$ and we observe a change in the causal mechanisms of both $X_i$ and $X_j$ due to causal mechanism changes in $Z$, we also observe a change in the causal mechanism $\mathbb{P}(X_j \mid X_i)$.

Definition 4.7.

(Confounding Measure 3) When causal mechanism shifts are observed for $X_i, X_j$ and the causal direction between $X_i$ and $X_j$ is known, under Assumptions 3.2-4.1, the measure of confounding $\text{CNF-3}(X_i, X_j)$ between $X_i \in \mathbf{X}$ and $X_j \in \mathbf{X}$ is defined as

$$\text{CNF-3}(X_i, X_j) := \begin{cases} 1 - e^{-I(E^C_{ji};\, E^C_j)} & \text{if } X_i \rightarrow \dots \rightarrow X_j \\ 1 - e^{-I(E^C_{ij};\, E^C_i)} & \text{if } X_j \rightarrow \dots \rightarrow X_i \\ \text{CNF-2}(X_i, X_j) & \text{otherwise} \end{cases} \qquad (10)$$
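A minimal sketch of Eqn. 10 follows, again assuming Gaussian mutual information. Since $E^C_{ji}$ is a conditional expectation rather than a single number, we summarize it per context by the intercept of a least-squares fit of $X_j$ on $X_i$ (and symmetrically for $E^C_{ij}$). This scalar summary, the fixed $\alpha = 1$ and $\sigma(c) = 1$ in the simulation of Eqn. 6, and all helper names are our assumptions. In this linear-Gaussian model the intercept depends on $\mu(c)$ only through the $\gamma Z$ term, so the measure should be close to 0 when $\gamma = 0$ and clearly positive otherwise.

```python
import numpy as np


def gaussian_cnf(a, b):
    """1 - exp(-I(A; B)) with Gaussian I(A; B) = -0.5 * log(1 - rho^2)."""
    rho = np.corrcoef(a, b)[0, 1]
    return 1.0 - np.exp(0.5 * np.log(1.0 - rho ** 2))


def context_summaries(gamma, beta=1.0, n_contexts=200, n_samples=2000, seed=0):
    """Simulate Eqn. 6 with alpha = 1 and sigma(c) = 1; per context, return the means of
    X_i, X_j and least-squares intercepts used as stand-ins for E_ij^C and E_ji^C."""
    rng = np.random.default_rng(seed)
    keys = ("e_i", "e_j", "e_ij", "e_ji")
    out = {k: np.empty(n_contexts) for k in keys}
    for c in range(n_contexts):
        z = rng.normal(rng.uniform(-3.0, 3.0), 1.0, n_samples)
        x_i = z + rng.normal(size=n_samples)
        x_j = beta * x_i + gamma * z + rng.normal(size=n_samples)
        out["e_i"][c], out["e_j"][c] = x_i.mean(), x_j.mean()
        out["e_ji"][c] = np.polyfit(x_i, x_j, 1)[1]   # intercept of the fit of X_j on X_i
        out["e_ij"][c] = np.polyfit(x_j, x_i, 1)[1]   # intercept of the fit of X_i on X_j
    return out


def cnf3(s, direction):
    """Eqn. 10; direction is 'i->j', 'j->i', or None when no directed path is known."""
    if direction == "i->j":
        return gaussian_cnf(s["e_ji"], s["e_j"])
    if direction == "j->i":
        return gaussian_cnf(s["e_ij"], s["e_i"])
    return gaussian_cnf(s["e_i"], s["e_j"])           # fall back to CNF-2


no_confounding = cnf3(context_summaries(gamma=0.0), "i->j")
confounded = cnf3(context_summaries(gamma=2.0), "i->j")
print(no_confounding, confounded)
```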

To measure the strength of unobserved confounding in the presence of an observed confounding variable $X_o$, we can, as in Setting 2, modify Eqn. 10 to condition on $X_o$.

Beyond Pairwise Confounding: Using Assumption 4.2, we obtain the following.

Theorem 4.5.

Let $\mathbf{X}_S$ be a set of variables such that all $X_i, X_j \in \mathbf{X}_S$ are pairwise confounded and the causal relationship between each pair $X_i, X_j$ is known. Then $\mathbf{X}_S$ is jointly confounded if and only if for each triple $X_i, X_j, X_k \in \mathbf{X}_S$ we have $I(E^C_{ij}; E^C_{jk} \mid E^C_j) < I(E^C_{ij}; E^C_{jk})$.

Since we now have the random variables $E^C_{ij}$ in addition to $E^C_i, E^C_j$, it is not straightforward to combine all of them into a single measure of joint confounding. To keep the measure simple, we set the measure of joint confounding among the variables $\mathbf{X}_S$ equal to $\text{CNF-2}(\mathbf{X}_S)$, i.e., $\text{CNF-3}(\mathbf{X}_S) = \text{CNF-2}(\mathbf{X}_S)$. Setting 3 is an alternative to Setting 2 when the direction of the causal path between $X_i$ and $X_j$ is known. Settings 2 and 3 complement each other in validating the correctness of our analysis.

Theorem 4.6.

For any three observed variables $X_i, X_j, X_o$ and an unobserved confounding variable $Z$, the following statements hold for the measure CNF-3.

  1. (Reflexivity and Symmetry.) $CNF\text{-}3(X_i, X_i \mid X_o) = 1 - e^{-H(E^C_i \mid X_o)}\ \forall i$, where $H(\cdot \mid \cdot)$ denotes conditional entropy, and $CNF\text{-}3(X_i, X_j \mid X_o) = CNF\text{-}3(X_j, X_i \mid X_o)$.

  2. (Positivity.) $CNF\text{-}3(X_i, X_j) > 0$ if and only if $X_i$ and $X_j$ are confounded. Given an observed confounding variable $X_o$ between $X_i$ and $X_j$, $CNF\text{-}3(X_i, X_j \mid X_o) > 0$ if and only if there exists an unobserved confounding variable $Z$ between $X_i$ and $X_j$.

  3. (Monotonicity.) $CNF\text{-}3(X_i, X_j) > CNF\text{-}3(X_k, X_l)$ implies that the pair of variables $X_i, X_j$ is more strongly confounded than the pair of variables $X_k, X_l$ in the sense of Defn. 4.2.
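The reflexivity, symmetry, and positivity statements can be traced to standard information-theoretic identities. The short derivation below is a sketch, not the paper's proof: it assumes, for illustration, that the conditional measure takes the form $CNF\text{-}3(X_i, X_j \mid X_o) = 1 - e^{-I(E^C_i; E^C_j \mid X_o)}$ (the conditional analogue of Step 7 in Algorithm 1) and that the $E^C$ variables are discrete.

```latex
\begin{align*}
I(E^C_i; E^C_i \mid X_o) &= H(E^C_i \mid X_o) - H(E^C_i \mid E^C_i, X_o) = H(E^C_i \mid X_o)\\
\Rightarrow\quad CNF\text{-}3(X_i, X_i \mid X_o) &= 1 - e^{-H(E^C_i \mid X_o)} && \text{(reflexivity)}\\
I(E^C_i; E^C_j \mid X_o) &= I(E^C_j; E^C_i \mid X_o) && \text{(symmetry)}\\
1 - e^{-x} > 0 &\iff x > 0 \quad \text{for } x \ge 0\\
\Rightarrow\quad CNF\text{-}3(X_i, X_j \mid X_o) > 0 &\iff E^C_i \not\perp E^C_j \mid X_o && \text{(since } I \ge 0\text{)}
\end{align*}
```

The last line only reduces positivity to a statement about dependence between the $E^C$ variables; the correspondence between that dependence and the presence of (unobserved) confounding is what Theorem 4.6 asserts.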

5 Algorithm

Algorithm 1 outlines the procedure for measuring confounding in all three settings; it can be extended to conditional confounding and to confounding among multiple variables. Appendix § B presents two real-world examples where our methods can be applied.

Algorithm 1: Evaluating pairwise $CNF\text{-}1$, $CNF\text{-}2$, and $CNF\text{-}3$
Data: Context information $\mathbf{C}_{\{i\}\wedge\neg P_{ij}}, \mathbf{C}_{\{j\}\wedge\neg P_{ji}}, \mathbf{C}_{\{i\}\wedge\{j\}}$; contextual datasets $\{\mathcal{D}^c\}_{c\in\mathbf{C}}$.
Result: $CNF\text{-}1(X_i, X_j)$, $CNF\text{-}2(X_i, X_j)$, $CNF\text{-}3(X_i, X_j)$
Step 1:   Evaluate $\mathbb{P}(X_i \mid X_j)$ and $\mathbb{P}(X_j \mid X_i)$ using observational data;
Step 2:   Evaluate $\mathbb{P}(X_i \mid do(X_j))$ using $\{\mathcal{D}^c\}_{c\in\mathbf{C}_{\{j\}\wedge\neg P_{ji}}}$;
Step 3:   Evaluate $\mathbb{P}(X_j \mid do(X_i))$ using $\{\mathcal{D}^c\}_{c\in\mathbf{C}_{\{i\}\wedge\neg P_{ij}}}$;
Step 4:   Evaluate $I(X_i \rightarrow X_j)$ and $I(X_j \rightarrow X_i)$;
Step 5:   $CNF\text{-}1(X_i, X_j) = 1 - e^{-\min(I(X_i \rightarrow X_j),\, I(X_j \rightarrow X_i))}$;
Step 6:   Evaluate $E^C_i, E^C_j$ using $\{\mathcal{D}^c\}_{c\in\mathbf{C}_{\{i\}\wedge\{j\}}}$;
Step 7:   $CNF\text{-}2(X_i, X_j) = 1 - e^{-I(E^C_i;\, E^C_j)}$;
Step 8:   Evaluate $E^C_{ij}, E^C_{ji}$ using $\{\mathcal{D}^c\}_{c\in\mathbf{C}_{\{i\}\wedge\{j\}}}$;
Step 9:   Compute $CNF\text{-}3(X_i, X_j)$ according to Defn. 4.7;
return $CNF\text{-}1(X_i, X_j)$, $CNF\text{-}2(X_i, X_j)$, $CNF\text{-}3(X_i, X_j)$
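Steps 4 and 5 can be sketched for discrete variables as follows. The sketch assumes, following the directed-information formulation of Raginsky [48], that $I(X \rightarrow Y)$ is the KL divergence between the observational conditional $\mathbb{P}(Y \mid X)$ and the interventional conditional $\mathbb{P}(Y \mid do(X))$, averaged over $\mathbb{P}(X)$; the conditional tables are taken as given (in Algorithm 1 they come from Steps 1-3). The function names `directed_information` and `cnf1` and the table layout are hypothetical.

```python
import math
from typing import Sequence


def directed_information(p_x: Sequence[float],
                         p_y_given_x: Sequence[Sequence[float]],
                         p_y_given_do_x: Sequence[Sequence[float]]) -> float:
    """Assumed form of I(X -> Y), in nats:
        sum_a P(X=a) * KL( P(Y | X=a) || P(Y | do(X=a)) )

    p_x[a]               = P(X = a)
    p_y_given_x[a][b]    = P(Y = b | X = a)       (observational, Step 1)
    p_y_given_do_x[a][b] = P(Y = b | do(X = a))   (interventional, Steps 2-3)
    """
    total = 0.0
    for px, obs_row, int_row in zip(p_x, p_y_given_x, p_y_given_do_x):
        if px == 0.0:
            continue  # values of X that never occur contribute nothing
        kl = 0.0
        for p_obs, p_int in zip(obs_row, int_row):
            if p_obs == 0.0:
                continue  # 0 * log(0 / q) = 0 by convention
            if p_int == 0.0:
                return math.inf  # observational mass outside interventional support
            kl += p_obs * math.log(p_obs / p_int)
        total += px * kl
    return total


def cnf1(di_ij: float, di_ji: float) -> float:
    """Step 5 of Algorithm 1: 1 - exp(-min(I(Xi -> Xj), I(Xj -> Xi)))."""
    return 1.0 - math.exp(-min(di_ij, di_ji))


if __name__ == "__main__":
    p_xi = [0.5, 0.5]
    obs = [[0.9, 0.1], [0.1, 0.9]]  # P(Xj | Xi): strong observed association
    do = [[0.5, 0.5], [0.5, 0.5]]   # P(Xj | do(Xi)): no causal effect
    di_ij = directed_information(p_xi, obs, do)
    di_ji = directed_information(p_xi, obs, do)  # toy tables reused for the reverse direction
    print(di_ij, cnf1(di_ij, di_ji))
```

In this toy table the observational association arises entirely from confounding (the interventional conditional is flat), so the directed information is about 0.368 nats and $CNF\text{-}1$ is about 0.31. When the observational and interventional tables coincide, both quantities are zero. Taking the minimum over the two directions means that a single direction with no observational/interventional discrepancy is enough to drive $CNF\text{-}1$ to zero.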

6 Experiments and Results

We perform simulation studies to verify the correctness of the proposed measures. All experiments are run on a CPU. We report the mean and standard deviation of results over five random seeds. Code to reproduce the results is provided in the supplementary material and is available at https://github.com/gautam0707/CD_CNF.

Measuring Confounding: In this set of experiments, we consider the following four causal structures over three nodes $X_i, X_j, Z$: $\mathcal{G}_1$: the empty graph over $Z, X_i, X_j$ (i.e., all nodes are isolated); $\mathcal{G}_2$: $X_i \rightarrow X_j$; $\mathcal{G}_3$: $Z \rightarrow X_i, Z \rightarrow X_j$; and $\mathcal{G}_4$: $Z \rightarrow X_i, Z \rightarrow X_j, X_i \rightarrow X_j$.
In $\mathcal{G}_1$ and $\mathcal{G}_2$, there is no confounding between $X_i$ and $X_j$, whereas in $\mathcal{G}_3$ and $\mathcal{G}_4$, $Z$ has a confounding effect on $X_i$ and $X_j$. Results in Fig. 2 show that our measures output zero when there is no confounding between $X_i$ and $X_j$, and output positive values when $X_i$ and $X_j$ are confounded by a confounding variable $Z$.
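To see why comparing observational and interventional conditionals separates $\mathcal{G}_1, \mathcal{G}_2$ from $\mathcal{G}_3, \mathcal{G}_4$, the sketch below enumerates a small binary structural model for each graph and reports the largest gap between $\mathbb{P}(X_j = 1 \mid X_i = x)$ and $\mathbb{P}(X_j = 1 \mid do(X_i = x))$. The mechanism probabilities are hypothetical choices for illustration, not the paper's simulation settings, and exact enumeration replaces sampling; the helper name `obs_do_gap` is likewise hypothetical.

```python
def obs_do_gap(confounded: bool, direct_edge: bool) -> float:
    """Max over x of |P(Xj=1 | Xi=x) - P(Xj=1 | do(Xi=x))| in a toy binary SCM.

    Hypothetical mechanisms (not the paper's simulation settings):
      Z  ~ Bernoulli(0.5)
      Xi ~ Bernoulli(0.8 if Z = 1 else 0.2) when Z -> Xi, else Bernoulli(0.5)
      Xj ~ Bernoulli(0.1 + 0.4 * Z * [Z -> Xj] + 0.4 * Xi * [Xi -> Xj])
    """
    def p_xi(x: int, z: int) -> float:
        p1 = (0.8 if z == 1 else 0.2) if confounded else 0.5
        return p1 if x == 1 else 1.0 - p1

    def p_xj1(x: int, z: int) -> float:
        return 0.1 + 0.4 * z * confounded + 0.4 * x * direct_edge

    gap = 0.0
    for x in (0, 1):
        # Observational: sum_z P(z) P(x|z) P(Xj=1|x,z) / sum_z P(z) P(x|z)
        num = sum(0.5 * p_xi(x, z) * p_xj1(x, z) for z in (0, 1))
        den = sum(0.5 * p_xi(x, z) for z in (0, 1))
        # Interventional (truncated factorization drops P(x|z)): sum_z P(z) P(Xj=1|x,z)
        interventional = sum(0.5 * p_xj1(x, z) for z in (0, 1))
        gap = max(gap, abs(num / den - interventional))
    return gap


# (Z -> Xi and Z -> Xj present?, Xi -> Xj present?) for the four graphs in the text
GRAPHS = {"G1": (False, False), "G2": (False, True),
          "G3": (True, False), "G4": (True, True)}

if __name__ == "__main__":
    for name, (confounded, edge) in GRAPHS.items():
        print(name, round(obs_do_gap(confounded, edge), 4))
```

Under these mechanism choices the gap is 0 (up to floating-point rounding) for $\mathcal{G}_1$ and $\mathcal{G}_2$ and 0.12 for both $\mathcal{G}_3$ and $\mathcal{G}_4$: the direct edge in $\mathcal{G}_2$ does not create a discrepancy, whereas the common cause $Z$ does.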

Figure 2: Measure of confounding between a pair of variables $X_i, X_j$. Our measures output zero when there is no confounding between $X_i$ and $X_j$, and output positive values when $X_i$ and $X_j$ are confounded.
Figure 3: Left: In $\mathcal{G}_5$, conditioning on any one of $\emptyset$, $Z_1$, or $Z_2$ does not remove the confounding between $X_i$ and $X_j$; hence $CNF\text{-}2$ returns positive values. Right: In $\mathcal{G}_6$, conditioning on $\emptyset$ does not remove the confounding effect of $Z$ on $X_i, X_j$; hence we observe a positive value for $CNF\text{-}2(X_i, X_j \mid \emptyset)$. Conditioning on $Z$ blocks the confounding between $X_i$ and $X_j$; hence $CNF\text{-}2$ is closer to zero.

Measuring Conditional Confounding: We consider the following two causal structures: $\mathcal{G}_5$: $Z_1 \rightarrow X_i, Z_1 \rightarrow X_j, Z_2 \rightarrow X_i, Z_2 \rightarrow X_j, X_i \rightarrow X_j$; and $\mathcal{G}_6$: $Z \rightarrow X_i, Z \rightarrow X_j, X_i \rightarrow X_j$. In $\mathcal{G}_5$, $X_i$ and $X_j$ are confounded by two variables, $Z_1$ and $Z_2$.
We measure conditional confounding between $X_i$ and $X_j$ conditioned on $\emptyset$, $Z_1$, and $Z_2$, respectively. Since confounding persists under each of these conditioning sets (conditioning on one confounder leaves the path through the other open), $CNF\text{-}2$ correctly returns positive values in all three cases (Fig. 3, left). In $\mathcal{G}_6$, by contrast, we measure conditional confounding between $X_i$ and $X_j$ conditioned on the empty set and on $Z$. Since conditioning on $Z$ blocks the confounding association between $X_i$ and $X_j$, $CNF\text{-}2$ returns a value closer to zero, whereas the unconditional value (conditioning on the empty set) remains large (Fig. 3, right). These results empirically validate the correctness of the proposed measures.
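Why conditioning on $Z$ removes confounding in $\mathcal{G}_6$, while conditioning on only one of $Z_1, Z_2$ does not in $\mathcal{G}_5$, can be made explicit with the standard back-door adjustment formula [44]. The derivation below is at the level of the observed variables and is included for intuition only; it is not the computation that $CNF\text{-}2$ performs, which operates on $E^C_i$ and $E^C_j$. It uses the fact that $Z_1$ is not a descendant of $X_i$ and that $\{Z_1, Z_2\}$ blocks every back-door path in $\mathcal{G}_5$.

```latex
\begin{align*}
\mathcal{G}_6:\quad
  \mathbb{P}(X_j \mid do(X_i), Z)
    &= \mathbb{P}(X_j \mid X_i, Z)
    && \text{($Z$ blocks the only back-door path $X_i \leftarrow Z \rightarrow X_j$)}\\[4pt]
\mathcal{G}_5:\quad
  \mathbb{P}(X_j \mid do(X_i), Z_1)
    &= \sum_{z_2} \mathbb{P}(z_2 \mid Z_1)\, \mathbb{P}(X_j \mid X_i, Z_1, z_2)\\
  \mathbb{P}(X_j \mid X_i, Z_1)
    &= \sum_{z_2} \mathbb{P}(z_2 \mid X_i, Z_1)\, \mathbb{P}(X_j \mid X_i, Z_1, z_2)
\end{align*}
```

The two $\mathcal{G}_5$ expressions differ whenever $\mathbb{P}(z_2 \mid X_i, Z_1) \neq \mathbb{P}(z_2 \mid Z_1)$, which the edge $Z_2 \rightarrow X_i$ generally induces. Swapping the roles of $Z_1$ and $Z_2$ covers conditioning on $Z_2$, and conditioning on $\emptyset$ leaves both back-door paths open.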

| Causal graph | Controlling confounding | 1000 | 2000 | 3000 | 4000 | 5000 |
|---|---|---|---|---|---|---|
| $\mathcal{G}_3$ | No | 0.55 | 0.57 | 0.55 | 0.52 | 0.52 |
| $\mathcal{G}_3$ | Yes | 0.06 | 0.02 | 0.007 | 0.03 | 0.009 |
| $\mathcal{G}_4$ | No | 0.24 | 0.26 | 0.23 | 0.24 | 0.23 |
| $\mathcal{G}_4$ | Yes | 0.04 | 0.05 | 0.06 | 0.02 | 0.05 |
Table 3: Downstream application of causal effect estimation. Entries are absolute differences between the true and estimated causal effects of $X_i$ on $X_j$, without and with controlling for the detected confounding variable.

Downstream Causal Effect Estimation: For the causal graphs $\mathcal{G}_3$ and $\mathcal{G}_4$, we examine the impact of controlling for the nodes identified by our method. We estimate the causal effect of $X_i$ on $X_j$ with and without controlling for the detected confounding variable and report the absolute difference between the true and estimated causal effects in Tab. 3. Controlling for the variables identified by our method reduces the bias in the estimated causal effects: in $\mathcal{G}_3$, for example, the error falls from at least 0.52 without control to at most 0.06 with control.
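The effect of controlling for a detected confounder can be illustrated with the back-door adjustment formula $\mathbb{P}(X_j \mid do(X_i)) = \sum_z \mathbb{P}(z)\, \mathbb{P}(X_j \mid X_i, z)$ [44]. The sketch below builds, by exact enumeration, the joint distribution of a hypothetical binary model of $\mathcal{G}_4$, then compares the naive contrast $\mathbb{P}(X_j{=}1 \mid X_i{=}1) - \mathbb{P}(X_j{=}1 \mid X_i{=}0)$ and the $Z$-adjusted contrast against the true effect. The mechanisms, the risk-difference effect scale, and all function names are assumptions for illustration; the paper's estimator and data-generating process may differ, and these numbers are unrelated to Tab. 3.

```python
from itertools import product

P_Z1 = 0.5  # hypothetical P(Z = 1)


def p_xi1(z: int) -> float:
    """Hypothetical P(Xi = 1 | Z = z)."""
    return 0.8 if z == 1 else 0.2


def p_xj1(xi: int, z: int) -> float:
    """Hypothetical P(Xj = 1 | Xi = xi, Z = z); Xi adds 0.4 to the probability."""
    return 0.1 + 0.4 * z + 0.4 * xi


def joint() -> dict:
    """Exact joint distribution P(z, xi, xj) of the binary model for G4."""
    table = {}
    for z, xi, xj in product((0, 1), repeat=3):
        pz = P_Z1 if z == 1 else 1.0 - P_Z1
        pxi = p_xi1(z) if xi == 1 else 1.0 - p_xi1(z)
        pxj = p_xj1(xi, z) if xj == 1 else 1.0 - p_xj1(xi, z)
        table[(z, xi, xj)] = pz * pxi * pxj
    return table


def prob(table: dict, **fixed: int) -> float:
    """Probability of the event fixing some of z, xi, xj (e.g. prob(t, xi=1))."""
    index = {"z": 0, "xi": 1, "xj": 2}
    return sum(p for key, p in table.items()
               if all(key[index[name]] == value for name, value in fixed.items()))


def naive_effect(table: dict) -> float:
    """P(Xj=1 | Xi=1) - P(Xj=1 | Xi=0), ignoring the confounder Z."""
    return (prob(table, xi=1, xj=1) / prob(table, xi=1)
            - prob(table, xi=0, xj=1) / prob(table, xi=0))


def adjusted_effect(table: dict) -> float:
    """Back-door adjustment: sum_z P(z) [P(Xj=1|Xi=1,z) - P(Xj=1|Xi=0,z)]."""
    total = 0.0
    for z in (0, 1):
        treated = prob(table, z=z, xi=1, xj=1) / prob(table, z=z, xi=1)
        control = prob(table, z=z, xi=0, xj=1) / prob(table, z=z, xi=0)
        total += prob(table, z=z) * (treated - control)
    return total


def true_effect() -> float:
    """P(Xj=1 | do(Xi=1)) - P(Xj=1 | do(Xi=0)), read off the mechanisms directly."""
    return sum((P_Z1 if z == 1 else 1.0 - P_Z1) * (p_xj1(1, z) - p_xj1(0, z))
               for z in (0, 1))


if __name__ == "__main__":
    t = joint()
    print("naive error:   ", abs(naive_effect(t) - true_effect()))
    print("adjusted error:", abs(adjusted_effect(t) - true_effect()))
```

Here the true effect is 0.4, the naive contrast is 0.64 (an error of 0.24), and the adjusted contrast recovers 0.4 up to floating-point rounding, mirroring the qualitative pattern in Tab. 3.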

Binary Data (Erdős-Rényi Causal Graphs): To verify the performance of our method at a larger scale, similar to [38], we generate causal graphs with varying numbers of nodes using the Erdős-Rényi model. In these experiments, each context results from an intervention on a single node, so the number of nodes $N$ equals the number of contexts $|C|$. Sample size denotes the number of data points used in each context. For each pair of nodes, we detect whether the pair is confounded and measure the strength of confounding, and we then compute Precision, Recall, and F1 scores. As Table 4 shows, our confounding measures obtain F1 scores between 0.68 and 0.91 across all settings.

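| N (nodes = contexts) | Sample size | S1 Precision | S1 Recall | S1 F1 | S2 Precision | S2 Recall | S2 F1 | S3 Precision | S3 Recall | S3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 100 | 0.64 | 0.97 | 0.77 | 0.67 | 0.83 | 0.74 | 0.64 | 0.72 | 0.68 |
| 10 | 200 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.70 | 0.79 | 0.74 |
| 10 | 300 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.65 | 0.76 | 0.70 |
| 10 | 400 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.67 | 0.83 | 0.74 |
| 10 | 500 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.67 | 0.83 | 0.74 |
| 15 | 100 | 0.81 | 0.95 | 0.88 | 0.80 | 0.85 | 0.82 | 0.80 | 0.79 | 0.80 |
| 15 | 200 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.85 | 0.82 |
| 15 | 300 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.85 | 0.82 |
| 15 | 400 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.85 | 0.82 |
| 15 | 500 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.84 | 0.82 |
| 20 | 100 | 0.68 | 0.95 | 0.80 | 0.68 | 0.88 | 0.77 | 0.69 | 0.84 | 0.76 |
| 20 | 200 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.68 | 0.87 | 0.76 |
| 20 | 300 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.67 | 0.86 | 0.75 |
| 20 | 400 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.68 | 0.87 | 0.76 |
| 20 | 500 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.68 | 0.87 | 0.76 |
| 25 | 100 | 0.83 | 0.96 | 0.89 | 0.83 | 0.91 | 0.87 | 0.83 | 0.89 | 0.86 |
| 25 | 200 | 0.83 | 1.0 | 0.91 | 0.83 | 0.91 | 0.87 | 0.82 | 0.90 | 0.86 |
| 25 | 300 | 0.83 | 1.0 | 0.91 | 0.83 | 0.91 | 0.87 | 0.83 | 0.91 | 0.87 |
| 25 | 400 | 0.83 | 1.0 | 0.91 | 0.83 | 0.92 | 0.87 | 0.83 | 0.91 | 0.87 |
| 25 | 500 | 0.83 | 1.0 | 0.91 | 0.83 | 0.91 | 0.87 | 0.83 | 0.91 | 0.87 |
Table 4: Results on synthetic datasets for Settings 1, 2, and 3 (S1, S2, and S3 denote Settings 1, 2, and 3; $N$ is the number of nodes, which equals the number of contexts $|C|$).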
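The Precision, Recall, and F1 scores in Table 4 are standard pairwise detection metrics. As a minimal sketch, assuming the detected and true confounded pairs are given as sets of unordered node pairs, they can be computed as below. The thresholding rule in `detect_pairs` is a hypothetical way of turning confounding scores into detections; the decision rule behind Table 4 is not specified here.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[int, int]


def _unordered(pairs: Iterable[Pair]) -> Set[frozenset]:
    """Treat (i, j) and (j, i) as the same node pair."""
    return {frozenset(p) for p in pairs}


def detect_pairs(scores: Dict[Pair, float], threshold: float) -> Set[Pair]:
    """Hypothetical decision rule: flag a pair if its confounding score exceeds threshold."""
    return {pair for pair, score in scores.items() if score > threshold}


def precision_recall_f1(predicted: Iterable[Pair], truth: Iterable[Pair]):
    pred, true = _unordered(predicted), _unordered(truth)
    tp = len(pred & true)  # confounded pairs that were correctly flagged
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    scores = {(0, 1): 0.42, (1, 2): 0.03, (0, 2): 0.10}
    flagged = detect_pairs(scores, threshold=0.05)  # {(0, 1), (0, 2)}
    print(precision_recall_f1(flagged, truth=[(1, 0), (1, 2)]))
```

As a consistency check on the table, the F1 formula applied to the first row of Setting 1 (Precision 0.64, Recall 0.97) gives approximately 0.771, matching the reported 0.77.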

7 Conclusions, Limitations, and Future Work

In this paper, based on known causal mechanism shifts of observed variables, we propose three measures of confounding along with their conditional and multivariate variants, and we study key properties of these measures. The measures complement each other depending on the available context information. We propose algorithms to compute the proposed measures and empirically verify their correctness. However, different measures may yield different values for the same confounded pair of variables. As discussed in the introduction, the measures are intended to assess the relative strengths of confounding rather than for point-to-point comparison. The number of contexts required to evaluate a measure can be large because many contexts without changes in the relevant mechanisms are discarded. Future directions include identifying appropriate real-world datasets and applying the proposed measures to them, developing measures that use context information more efficiently, and devising new definitions of confounding together with corresponding confounding measures. We aim to pursue these ideas.

Acknowledgments

This work was partly supported by the Prime Minister’s Research Fellowship (PMRF) program and an Adobe Research Gift. We are grateful to the anonymous reviewers for their valuable feedback, which improved the presentation of the paper.

References

  • Aldrich [1995] John Aldrich. Correlations genuine and spurious in Pearson and Yule. Statistical Science, pages 364–376, 1995.
  • Bhattacharya et al. [2021] Rohit Bhattacharya, Tushar Nagarajan, Daniel Malinsky, and Ilya Shpitser. Differentiable causal discovery under unmeasured confounding. In International Conference on Artificial Intelligence and Statistics, pages 2314–2322, 2021.
  • Breslow et al. [1980] Norman E Breslow, Nicholas E Day, and Elisabeth Heseltine. Statistical methods in cancer research. 1980.
  • Brouillard et al. [2020] Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, and Alexandre Drouin. Differentiable causal discovery from interventional data. In Advances in Neural Information Processing Systems, volume 33, pages 21865–21877, 2020.
  • Budtz–Jørgensen et al. [2007] Esben Budtz–Jørgensen, Niels Keiding, Philippe Grandjean, and Pal Weihe. Confounder selection in environmental epidemiology: Assessment of health effects of prenatal mercury exposure. Annals of Epidemiology, 17(1):27–35, 2007.
  • Carey and Stiles [2016] Timothy A Carey and William B Stiles. Some problems with randomized controlled trials and some viable alternatives. Clinical Psychology & Psychotherapy, 23(1):87–95, 2016.
  • Chandrasekaran et al. [2010] Venkat Chandrasekaran, Pablo A Parrilo, and Alan S Willsky. Latent variable graphical model selection via convex optimization. In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1610–1613. IEEE, 2010.
  • Chickering [2002] David Maxwell Chickering. Learning equivalence classes of Bayesian-network structures. The Journal of Machine Learning Research, 2:445–498, 2002.
  • Colombo et al. [2012] Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, pages 294–321, 2012.
  • David et al. [2010] Shai Ben David, Tyler Lu, Teresa Luu, and Dávid Pál. Impossibility theorems for domain adaptation. In AISTATS, pages 129–136, 2010.
  • Diepen et al. [2023] Mirthe Maria Van Diepen, Ioan Gabriel Bucur, Tom Heskes, and Tom Claassen. Beyond the Markov equivalence class: Extending causal discovery under latent confounding. In 2nd Conference on Causal Learning and Reasoning, 2023.
  • Eberhardt and Scheines [2007] Frederick Eberhardt and Richard Scheines. Interventions and causal inference. Philosophy of Science, 74(5):981–995, 2007.
  • Eberhardt et al. [2012] Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. arXiv preprint arXiv:1207.1389, 2012.
  • Evans and Richardson [2019] Robin J Evans and Thomas S Richardson. Smooth, identifiable supermodels of discrete dag models with latent variables. Bernoulli, 25:848–876, 2019.
  • Greenland and Morgenstern [2001] Sander Greenland and Hal Morgenstern. Confounding in health research. Annual Review of Public Health, 22(1):189–212, 2001.
  • Groenwold et al. [2009] R.H.H. Groenwold, E. Hak, and A.W. Hoes. Quantitative assessment of unobserved confounding is mandatory in nonrandomized intervention studies. Journal of Clinical Epidemiology, 62(1):22–28, 2009.
  • Guo et al. [2023] Siyuan Guo, Viktor Tóth, Bernhard Schölkopf, and Ferenc Huszár. Causal de Finetti: On the identification of invariant causal structure in exchangeable data. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Hammerton and Munafò [2021] Gemma Hammerton and Marcus R Munafò. Causal inference with observational data: the need for triangulation of evidence. Psychological Medicine, 51(4):563–578, 2021.
  • Hauser and Bühlmann [2014] Alain Hauser and Peter Bühlmann. Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55(4):926–939, 2014.
  • Hoyer et al. [2008] Patrik O Hoyer, Shohei Shimizu, Antti J Kerminen, and Markus Palviainen. Estimation of causal effects using linear non-gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.
  • Huang et al. [2017] Biwei Huang, Kun Zhang, Jiji Zhang, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. Behind distribution shift: Mining driving forces of changes and causal arrows. In 2017 IEEE International Conference on Data Mining (ICDM), pages 913–918. IEEE, 2017.
  • Huang et al. [2020] Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research, 21(89):1–53, 2020.
  • Jaber et al. [2020] Amin Jaber, Murat Kocaoglu, Karthikeyan Shanmugam, and Elias Bareinboim. Causal discovery from soft interventions with unknown targets: Characterization and learning. Advances in Neural Information Processing Systems, 33:9551–9561, 2020.
  • Janes et al. [2010] Holly Janes, Francesca Dominici, and Scott Zeger. On quantifying the magnitude of confounding. Biostatistics, 11(3):572–582, 2010.
  • Janzing and Schölkopf [2018] Dominik Janzing and Bernhard Schölkopf. Detecting confounding in multivariate linear models via spectral analysis. Journal of Causal Inference, 6(1):20170013, 2018.
  • Jesson et al. [2022] Andrew Jesson, Alyson Rose Douglas, Peter Manshausen, Maëlys Solal, Nicolai Meinshausen, Philip Stier, Yarin Gal, and Uri Shalit. Scalable sensitivity and uncertainty analyses for causal-effect estimates of continuous-valued interventions. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
  • Kaltenpoth and Vreeken [2023a] David Kaltenpoth and Jilles Vreeken. Nonlinear causal discovery with latent confounders. In International Conference on Machine Learning, pages 15639–15654, 2023a.
  • Kaltenpoth and Vreeken [2023b] David Kaltenpoth and Jilles Vreeken. Causal discovery with hidden confounders using the algorithmic Markov condition. In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 1016–1026, 2023b.
  • Karlsson and Krijthe [2023] Rickard Karlsson and Jesse Krijthe. Detecting hidden confounding in observational data using multiple environments. Advances in Neural Information Processing Systems, 36, 2023.
  • Kleinbaum et al. [2007] David G Kleinbaum, Kevin M Sullivan, and Nancy D Barker. A pocket guide to epidemiology. Springer, 2007.
  • Ksir and Hart [2016] Charles Ksir and Carl L Hart. Correlation still does not imply causation. The Lancet Psychiatry, 3(5):401, 2016.
  • Lee [2014] Paul H Lee. Is a cutoff of 10% appropriate for the change-in-estimate criterion of confounder identification? Journal of Epidemiology, 24(2):161–167, 2014.
  • Li et al. [2023] Adam Li, Amin Jaber, and Elias Bareinboim. Causal discovery from observational and interventional data across multiple environments. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Maldonado and Greenland [1993] George Maldonado and Sander Greenland. Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138(11):923–936, 1993.
  • Maldonado and Greenland [2002] George Maldonado and Sander Greenland. Estimating causal effects. International Journal of Epidemiology, 31(2):422–429, 2002.
  • Mameche et al. [2022] Sarah Mameche, David Kaltenpoth, and Jilles Vreeken. Discovering invariant and changing mechanisms from data. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1242–1252, 2022.
  • Mameche et al. [2023] Sarah Mameche, David Kaltenpoth, and Jilles Vreeken. Learning causal models under independent changes. Advances in Neural Information Processing Systems, 36, 2023.
  • Mameche et al. [2024] Sarah Mameche, Jilles Vreeken, and David Kaltenpoth. Identifying confounding from causal mechanism shifts. In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2024.
  • Miettinen and Cook [1981] Olli S Miettinen and E Francis Cook. Confounding: essence and detection. American Journal of Epidemiology, 114(4):593–603, 1981.
  • Mooij et al. [2020] Joris M Mooij, Sara Magliacane, and Tom Claassen. Joint causal inference from multiple contexts. Journal of Machine Learning Research, 21:1–108, 2020.
  • Nichols [2007] Austin Nichols. Causal inference with observational data. The Stata Journal, 7(4):507–541, 2007.
  • Ogarrio et al. [2016] Juan Miguel Ogarrio, Peter Spirtes, and Joe Ramsey. A hybrid causal search algorithm for latent variable models. In Proceedings of the Eighth International Conference on Probabilistic Graphical Models, pages 368–379, 2016.
  • Pang et al. [2016] Menglan Pang, Jay S Kaufman, and Robert W Platt. Studying noncollapsibility of the odds ratio with marginal structural and logistic regression models. Statistical Methods in Medical Research, 25(5):1925–1937, 2016.
  • Pearl [2009] Judea Pearl. Causality. Cambridge University Press, 2009.
  • Perry et al. [2022] Ronan Perry, Julius Von Kügelgen, and Bernhard Schölkopf. Causal discovery in heterogeneous environments under the sparse mechanism shift hypothesis. In Advances in Neural Information Processing Systems, pages 10904–10917, 2022.
  • Peters et al. [2014] Jonas Peters, Joris M Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. The Journal of Machine Learning Research, 15(1):2009–2053, 2014.
  • Peters et al. [2017] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017.
  • Raginsky [2011] Maxim Raginsky. Directed information and Pearl’s causal calculus. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 958–965, 2011.
  • Richardson et al. [2023] Thomas S Richardson, Robin J Evans, James M Robins, and Ilya Shpitser. Nested markov properties for acyclic directed mixed graphs. The Annals of Statistics, 51(1):334–361, 2023.
  • Sanson-Fisher et al. [2007] Robert William Sanson-Fisher, Billie Bonevski, Lawrence W. Green, and Cate D’Este. Limitations of the randomized controlled trial in evaluating population-based health interventions. American Journal of Preventive Medicine, 33(2):155–161, 2007.
  • Scanagatta et al. [2015] Mauro Scanagatta, Cassio P de Campos, Giorgio Corani, and Marco Zaffalon. Learning bayesian networks with thousands of variables. Advances in neural information processing systems, 28, 2015.
  • Schölkopf et al. [2012] B Schölkopf, D Janzing, J Peters, E Sgouritsa, K Zhang, and J Mooij. On causal and anticausal learning. In 29th International Conference on Machine Learning (ICML 2012), pages 1255–1262. International Machine Learning Society, 2012.
  • Schölkopf et al. [2021] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
  • Schuster et al. [2021] Noah A Schuster, Jos WR Twisk, Gerben Ter Riet, Martijn W Heymans, and Judith JM Rijnhart. Noncollapsibility and its role in quantifying confounding bias in logistic regression. BMC medical research methodology, 21:1–9, 2021.
  • Shanmugam et al. [2015] Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G Dimakis, and Sriram Vishwanath. Learning causal graphs with small interventions. Advances in Neural Information Processing Systems, 28, 2015.
  • Shpitser et al. [2014] Ilya Shpitser, Robin J Evans, Thomas S Richardson, and James M Robins. Introduction to nested markov models. Behaviormetrika, 41:3–39, 2014.
  • Shpitser et al. [2018] Ilya Shpitser, Robin J Evans, and Thomas S Richardson. Acyclic linear sems obey the nested markov property. In Uncertainty in artificial intelligence: proceedings of the… conference. Conference on Uncertainty in Artificial Intelligence, volume 2018. NIH Public Access, 2018.
  • Simpson [1951] Edward H Simpson. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological), 13(2):238–241, 1951.
  • Spirtes and Zhang [2016] Peter Spirtes and Kun Zhang. Causal discovery and inference: concepts and recent methodological advances. In Applied informatics, volume 3, pages 1–28. Springer, 2016.
  • Spirtes et al. [2000] Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. 2000.
  • Tan [2006] Zhiqiang Tan. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101(476):1619–1637, 2006.
  • VanderWeele and Shpitser [2013] Tyler J VanderWeele and Ilya Shpitser. On the definition of a confounder. Annals of statistics, 41(1):196, 2013.
  • Wang and Drton [2023] Y. Samuel Wang and Mathias Drton. Causal discovery with unobserved confounding and non-gaussian data. Journal of Machine Learning Research, 24(271):1–61, 2023.
  • Wieczorek and Roth [2019] Aleksander Wieczorek and Volker Roth. Information theoretic causal effect quantification. Entropy, 21(10), 2019.
  • Zanga et al. [2022] Alessio Zanga, Elif Ozkirimli, and Fabio Stella. A survey on causal discovery: Theory and practice. International Journal of Approximate Reasoning, 151:101–129, 2022.
  • Zheng et al. [2018] Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. Dags with no tears: Continuous optimization for structure learning. Advances in neural information processing systems, 31, 2018.

Appendix

Appendix A Proofs

See 4.1

Proof.

Since the set of contexts $\mathbf{C}_{\{i\}\wedge P_{ij}}$ consists of data with all possible interventions on $X_i$, if a context $c$ is generated by intervening on $X_i$ with the value $x_i$, then $\mathbb{P}(X_j \mid do(X_i = x_i))$ equals $\mathbb{P}(X_j \mid X_i = x_i)$ in that context $c$.

From Defn. 4.4, to detect and measure confounding between the pair of variables $X_i, X_j$, we need to evaluate $\mathbb{P}(X_j \mid do(X_i))$ and $\mathbb{P}(X_i \mid do(X_j))$. By the previous paragraph, this requires the two sets of contexts $\mathbf{C}_{\{i\}\wedge P_{ij}}$ and $\mathbf{C}_{\{j\}\wedge P_{ji}}$. It follows that $n$ sets of contexts suffice to detect and measure confounding between all $\binom{n}{2}$ distinct pairs of nodes. ∎
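The counting argument can be illustrated with a short sketch. As a simplifying assumption for illustration only, each set of contexts is identified solely by the variable that is intervened on (the $P_{ij}$ qualifier is ignored), and the helper name `context_sets_needed` is hypothetical. Each pair $(i, j)$ draws on the sets for $X_i$ and $X_j$, so the union over all pairs contains exactly $n$ sets.

```python
from itertools import combinations
from math import comb


def context_sets_needed(n):
    """Map each unordered pair (i, j) to the context sets it draws on.

    Simplifying assumption (not from the paper): a context set is identified
    only by the intervened variable, so pair (i, j) needs the sets for i and j.
    """
    return {(i, j): {i, j} for i, j in combinations(range(n), 2)}


n = 5
needed = context_sets_needed(n)
distinct_sets = set().union(*needed.values())
print(len(needed), comb(n, 2), len(distinct_sets))  # 10 10 5
```

Reusing one intervention set per variable is what keeps the total at $n$ rather than growing with the number of pairs.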

See 4.1

Proof.

Consider three variables $X_i, X_j, X_k$ in the underlying causal graph, and the conditional directed information between $X_i$ and $X_j$ given $X_k$, which we manipulate as follows.

$$
\begin{aligned}
I(X_i \rightarrow X_j \mid X_k) &\coloneqq \mathbb{E}_{\mathbb{P}(X_i, X_j, X_k)} \log \frac{\mathbb{P}(X_i \mid X_j, X_k)}{\mathbb{P}(X_i \mid do(X_j), X_k)} \\
&= \mathbb{E}_{\mathbb{P}(X_i, X_j, X_k)} \log \left( \frac{\mathbb{P}(X_i, X_k \mid X_j)}{\mathbb{P}(X_i, X_k \mid do(X_j))} \times \frac{\mathbb{P}(X_k \mid do(X_j))}{\mathbb{P}(X_k \mid X_j)} \right) \\
&= \mathbb{E}_{\mathbb{P}(X_i, X_j, X_k)} \log \frac{\mathbb{P}(X_i, X_k \mid X_j)}{\mathbb{P}(X_i, X_k \mid do(X_j))} - \mathbb{E}_{\mathbb{P}(X_j, X_k)} \log \frac{\mathbb{P}(X_k \mid X_j)}{\mathbb{P}(X_k \mid do(X_j))} \\
&= I(\{X_i X_k\} \rightarrow X_j) - I(X_k \rightarrow X_j)
\end{aligned}
$$

Since $I(X_k \rightarrow X_j) = \mathbb{E}_{\mathbb{P}(X_j)} D_{\mathrm{KL}}\big(\mathbb{P}(X_k \mid X_j) \,\|\, \mathbb{P}(X_k \mid do(X_j))\big) \geq 0$, we have $I(X_i \rightarrow X_j \mid X_k) \leq I(\{X_i X_k\} \rightarrow X_j)$. Equality holds only when $I(X_k \rightarrow X_j) = 0$, that is, when $X_k$ and $X_j$ are unconfounded. ∎
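The decomposition and the resulting inequality can be checked numerically on a small discrete model. The sketch below uses a hypothetical binary SCM with a latent confounder $U$ (all probabilities are illustrative, not taken from the paper) and computes every term by exact enumeration with NumPy; `directed_info_terms` is our own helper name. Interventional distributions are obtained by dropping the factor $\mathbb{P}(X_j \mid U)$ from the factorization.

```python
import numpy as np

# Hypothetical binary SCM with latent U (illustrative numbers):
#   U -> Xj, U -> Xi, U -> Xk, Xj -> Xi, Xj -> Xk
P_U = np.array([0.6, 0.4])                          # P(U), indexed [u]
P_XJ_U = np.array([[0.8, 0.2], [0.3, 0.7]])         # P(Xj | U), indexed [u, xj]
P_XI_UXJ = np.array([[[0.9, 0.1], [0.4, 0.6]],      # P(Xi | U, Xj), indexed [u, xj, xi]
                     [[0.5, 0.5], [0.2, 0.8]]])
P_XK_UXJ = np.array([[[0.7, 0.3], [0.6, 0.4]],      # P(Xk | U, Xj), indexed [u, xj, xk]
                     [[0.2, 0.8], [0.1, 0.9]]])


def directed_info_terms(p_u, p_xj_u, p_xi_uxj, p_xk_uxj):
    """Return I(Xi -> Xj | Xk), I({Xi Xk} -> Xj) and I(Xk -> Xj) in nats.

    Follows the appendix convention I(A -> B) = E log P(A | B) / P(A | do(B)).
    """
    # Observational P(Xj, Xi, Xk) with U summed out, indexed [xj, xi, xk].
    p_obs = np.einsum("u,uj,uji,ujk->jik", p_u, p_xj_u, p_xi_uxj, p_xk_uxj)
    # Interventional P(Xi, Xk | do(Xj)): the factor P(Xj | U) is removed.
    p_do = np.einsum("u,uji,ujk->jik", p_u, p_xi_uxj, p_xk_uxj)

    p_ik_given_j = p_obs / p_obs.sum(axis=(1, 2))[:, None, None]  # P(Xi, Xk | Xj)
    p_k_given_j = p_ik_given_j.sum(axis=1)                         # P(Xk | Xj)
    p_k_do_j = p_do.sum(axis=1)                                     # P(Xk | do(Xj))
    p_i_given_jk = p_ik_given_j / p_k_given_j[:, None, :]           # P(Xi | Xj, Xk)
    p_i_do_j_k = p_do / p_k_do_j[:, None, :]                        # P(Xi | do(Xj), Xk)

    lhs = np.sum(p_obs * np.log(p_i_given_jk / p_i_do_j_k))
    i_joint = np.sum(p_obs * np.log(p_ik_given_j / p_do))
    # Expectation over P(Xj, Xk) for the second term.
    i_k = np.sum(p_obs.sum(axis=1) * np.log(p_k_given_j / p_k_do_j))
    return lhs, i_joint, i_k


lhs, i_joint, i_k = directed_info_terms(P_U, P_XJ_U, P_XI_UXJ, P_XK_UXJ)
print(np.isclose(lhs, i_joint - i_k), i_k > 0, lhs < i_joint)  # True True True
```

Because $U$ also affects $X_k$ in this model, $I(X_k \rightarrow X_j) > 0$ and the inequality is strict, matching the equality condition stated above.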

See 4.2

Proof.

Reflexivity: From the definition of directed information, $I(X_i \rightarrow X_i \mid X_o) = \mathbb{E}_{\mathbb{P}(X_i, X_o)} \log \frac{\mathbb{P}(X_i \mid X_o)}{\mathbb{P}(X_i \mid X_o)} = 0$, and hence $\text{CNF-1}(X_i, X_i \mid X_o) = 1 - e^{0} = 0$.

Symmetry: Although $I(X_i \rightarrow X_j \mid X_o)$ is not symmetric in general, $\min\big(I(X_i \rightarrow X_j \mid X_o), I(X_j \rightarrow X_i \mid X_o)\big)$ is, and hence $\text{CNF-1}(X_i, X_j \mid X_o)$ is symmetric.

Positivity: If $X_i, X_j$ are confounded, then, irrespective of the direction of the causal path between $X_i$ and $X_j$, we have $\mathbb{P}(X_i \mid X_j) \neq \mathbb{P}(X_i \mid do(X_j))$ and $\mathbb{P}(X_j \mid X_i) \neq \mathbb{P}(X_j \mid do(X_i))$. Since a KL divergence is positive whenever its two arguments differ, $I(X_i \rightarrow X_j) > 0$ and $I(X_j \rightarrow X_i) > 0$, and therefore $\text{CNF-1}(X_i, X_j) > 0$.
This holds even if there is no causal path between $X_i$ and $X_j$. The same statements remain valid after conditioning on an observed confounding variable $X_o$, provided that there is unobserved confounding between $X_i$ and $X_j$.

Monotonicity: Without loss of generality, assume that the inequality $\text{CNF-1}(X_i, X_j) > \text{CNF-1}(X_k, X_l)$ is a result of $I(X_i \rightarrow X_j) > I(X_k \rightarrow X_l)$. That is, the expected KL divergence between $\mathbb{P}(X_i \mid X_j)$ and $\mathbb{P}(X_i \mid do(X_j))$ is greater than the expected KL divergence between $\mathbb{P}(X_k \mid X_l)$ and $\mathbb{P}(X_k \mid do(X_l))$.
In other words, the distributions $\mathbb{P}(X_k \mid X_l)$ and $\mathbb{P}(X_k \mid do(X_l))$ are closer to each other than $\mathbb{P}(X_i \mid X_j)$ and $\mathbb{P}(X_i \mid do(X_j))$ are. As a result, $X_k, X_l$ are closer to being unconfounded than $X_i, X_j$ in the sense of Defns. 4.2 and 4.3. ∎
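To make these properties concrete, the sketch below evaluates a CNF-1-style score on a two-variable example. It assumes, consistently with the reflexivity and symmetry steps above but not confirmed by this section, the form $\text{CNF-1}(X_i, X_j) = 1 - \exp\big(-\min(I(X_i \rightarrow X_j), I(X_j \rightarrow X_i))\big)$. The model $U \rightarrow X_i$, $U \rightarrow X_j$, $X_i \rightarrow X_j$ with latent $U$, its probabilities, and the helper names `directed_infos` and `cnf1` are all hypothetical.

```python
import numpy as np


def directed_infos(p_u, p_xi_u, p_xj_uxi):
    """Return (I(Xi -> Xj), I(Xj -> Xi)) in nats for binary Xi, Xj.

    Hypothetical model: U -> Xi, U -> Xj, Xi -> Xj, with U unobserved.
    Convention from the appendix: I(A -> B) = E log P(A | B) / P(A | do(B)).
    """
    p_joint = np.einsum("u,ui,uij->ij", p_u, p_xi_u, p_xj_uxi)  # P(Xi, Xj), [xi, xj]
    p_i = p_joint.sum(axis=1)
    p_j = p_joint.sum(axis=0)
    # Xj is not an ancestor of Xi here, so P(Xi | do(Xj)) = P(Xi).
    p_i_given_j = p_joint / p_j[None, :]
    i_ij = np.sum(p_joint * np.log(p_i_given_j / p_i[:, None]))
    # P(Xj | do(Xi)) averages over P(U) instead of P(U | Xi).
    p_j_do_i = np.einsum("u,uij->ij", p_u, p_xj_uxi)
    p_j_given_i = p_joint / p_i[:, None]
    i_ji = np.sum(p_joint * np.log(p_j_given_i / p_j_do_i))
    return i_ij, i_ji


def cnf1(i_ij, i_ji):
    """Assumed form: 1 - exp(-min of the two directed informations)."""
    return 1.0 - np.exp(-min(i_ij, i_ji))


P_U = np.array([0.5, 0.5])
P_XJ_UXI = np.array([[[0.9, 0.1], [0.6, 0.4]],   # P(Xj | U, Xi), indexed [u, xi, xj]
                     [[0.4, 0.6], [0.1, 0.9]]])
P_XI_CONF = np.array([[0.8, 0.2], [0.3, 0.7]])  # Xi depends on U: confounded pair
P_XI_FREE = np.array([[0.6, 0.4], [0.6, 0.4]])  # Xi ignores U: no confounding

conf = cnf1(*directed_infos(P_U, P_XI_CONF, P_XJ_UXI))
free = cnf1(*directed_infos(P_U, P_XI_FREE, P_XJ_UXI))
print(conf > 0, abs(free) < 1e-12)  # True True
```

In the unconfounded variant $X_i$ still causes $X_j$, so $I(X_i \rightarrow X_j) > 0$; the score nevertheless vanishes because the minimum picks up $I(X_j \rightarrow X_i) = 0$.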

See 4.2

Proof.

There are two sources of dependency between $E^C_i$ and $E^C_j$. First, if $X_i, X_j$ are causally related in the underlying causal model generating the data, $E^C_i$ and $E^C_j$ are dependent in the context $C_{\{i\}\wedge\{j\}}$ because the interventions are soft. Second, by Assumption 4.1, any shift in the causal mechanism of $Z$ changes the mechanisms of both $X_i$ and $X_j$, which also induces a dependency. Hence the random variables $E^C_i, E^C_j$ have non-zero mutual information. ∎
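The second source of dependency can be illustrated with an exact calculation. In this sketch we model $E^C_i$ and $E^C_j$ as binary indicators of whether the mechanisms of $X_i$ and $X_j$ shift in a context (an assumption made for illustration); the latent $Z$ shifts with probability `p_z`, each variable also shifts independently with probability `q_i` or `q_j`, and all names are hypothetical. The joint distribution is enumerated exactly, so no sampling noise is involved.

```python
import itertools
import math


def shift_indicator_mi(p_z, q_i, q_j, confounded=True):
    """Exact mutual information (nats) between binary shift indicators E_i, E_j.

    Hypothetical model: in a context, the latent Z shifts with probability p_z,
    and X_i, X_j shift on their own with probabilities q_i, q_j. When the pair
    is confounded, a shift of Z also shifts both X_i and X_j.
    """
    joint = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for s_z, s_i, s_j in itertools.product((0, 1), repeat=3):
        prob = ((p_z if s_z else 1 - p_z)
                * (q_i if s_i else 1 - q_i)
                * (q_j if s_j else 1 - q_j))
        e_i = max(s_i, s_z) if confounded else s_i
        e_j = max(s_j, s_z) if confounded else s_j
        joint[(e_i, e_j)] += prob
    p_i = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    p_j = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
    return sum(p * math.log(p / (p_i[a] * p_j[b]))
               for (a, b), p in joint.items() if p > 0)


print(shift_indicator_mi(0.5, 0.1, 0.1) > 0)                             # True
print(abs(shift_indicator_mi(0.5, 0.1, 0.1, confounded=False)) < 1e-12)  # True
```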

See 4.3

Proof.

Following Assumption 4.2, when three variables $X_i, X_j, X_k$ are confounded by a single confounding variable $Z$, conditioning on one of $E^C_i, E^C_j, E^C_k$ accounts for part of the dependency between the other two. Hence $I(E^C_i; E^C_j \mid E^C_k) < I(E^C_i; E^C_j)$ for all triples $i, j, k$. ∎
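The inequality can be checked on a toy model in which a single latent $Z$ confounds three variables. As before, the shift variables are modeled as binary indicators, $Z$ shifts with probability `p_z`, every variable also shifts independently with probability `q`, and a shift of $Z$ shifts all three; these modeling choices and the helper names are assumptions for illustration. The sketch computes $I(E^C_i; E^C_j)$ and $I(E^C_i; E^C_j \mid E^C_k)$ exactly from the enumerated joint distribution.

```python
import itertools
import math
from collections import defaultdict


def shift_joint(p_z, q):
    """Exact joint of (E_i, E_j, E_k) when a single latent Z confounds all three.

    Hypothetical model: Z shifts with probability p_z; each X shifts on its own
    with probability q; a shift of Z shifts every one of X_i, X_j, X_k.
    """
    joint = defaultdict(float)
    for s_z, *own in itertools.product((0, 1), repeat=4):
        prob = p_z if s_z else 1 - p_z
        for s in own:
            prob *= q if s else 1 - q
        joint[tuple(max(s, s_z) for s in own)] += prob
    return joint


def mutual_info(joint):
    """I(E_i; E_j) in nats, with E_k marginalized out."""
    p_ij, p_i, p_j = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, _), p in joint.items():
        p_ij[(a, b)] += p
        p_i[a] += p
        p_j[b] += p
    return sum(p * math.log(p / (p_i[a] * p_j[b]))
               for (a, b), p in p_ij.items() if p > 0)


def cond_mutual_info(joint):
    """I(E_i; E_j | E_k) in nats."""
    p_k, p_ik, p_jk = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        p_k[c] += p
        p_ik[(a, c)] += p
        p_jk[(b, c)] += p
    return sum(p * math.log(p * p_k[c] / (p_ik[(a, c)] * p_jk[(b, c)]))
               for (a, b, c), p in joint.items() if p > 0)


joint = shift_joint(p_z=0.5, q=0.1)
print(cond_mutual_info(joint) < mutual_info(joint))  # True
```

Observing $E^C_k = 0$ rules out a shift of $Z$ in this model, which is why conditioning removes much of the dependency between the other two indicators.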

See 4.4

Proof.

Reflexivity: From the definition of mutual information, $I(E^C_i; E^C_i \mid X_o) = H(E^C_i \mid X_o) - H(E^C_i \mid E^C_i, X_o) = H(E^C_i \mid X_o)$. Substituting this into the definition of $\text{CNF-2}(X_i, X_j)$ with $j = i$ yields the result.

Symmetry: The result follows from the symmetry of mutual information.

Positivity: If $X_i, X_j$ are confounded, then by Assumption 4.1, $E^C_i$ and $E^C_j$ are dependent random variables, so their mutual information is positive. The result follows by substituting this positive value of $I(E^C_i; E^C_j)$ into the definition of $\text{CNF-2}(X_i, X_j)$. The same argument applies to conditional confounding.

Monotonicity: From the definition of $\text{CNF-2}(X_i, X_j)$, $\text{CNF-2}(X_i, X_j) > \text{CNF-2}(X_k, X_l)$ implies $I(E^C_i; E^C_j) > I(E^C_k; E^C_l)$. Hence, by Defn. 4.2, the pair $X_i, X_j$ has higher mutual information between its mechanism-shift variables than the pair $X_k, X_l$, and $X_i, X_j$ are therefore more strongly confounded than $X_k, X_l$. ∎

See 4.5

Proof.

Following Assumption 4.2, when three variables $X_i, X_j, X_k$ are confounded by a single confounding variable $Z$, conditioning on $E^C_j$ accounts for part of the dependency between $E^C_{ij}$ and $E^C_{jk}$. Hence $I(E^C_{ij}; E^C_{jk} \mid E^C_j) < I(E^C_{ij}; E^C_{jk})$ for all triples $i, j, k$. ∎

See 4.6

Proof.

Reflexivity: From the definition of mutual information, $I(E^C_{ii}; E^C_i \mid X_o) = I(E^C_i; E^C_i \mid X_o) = H(E^C_i \mid X_o) - H(E^C_i \mid E^C_i, X_o) = H(E^C_i \mid X_o)$, since $H(E^C_i \mid E^C_i, X_o) = 0$.
Substituting this into the definition of $\mathrm{CNF}\text{-}3(X_i, X_j)$, the result follows.
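The reflexivity step uses the identity $I(E; E \mid X_o) = H(E \mid X_o)$ for a discrete variable. A quick numerical check, assuming a made-up joint table `P_EX` over a binary E (standing in for $E^C_i$) and a ternary X (standing in for $X_o$), feeds the diagonal joint of $(E, E, X)$ into a generic conditional mutual information routine and compares it with the conditional entropy.

```python
import math
from collections import defaultdict

# Hypothetical joint P(e, x): binary E, ternary conditioning variable X.
P_EX = {(0, 0): 0.10, (1, 0): 0.20, (0, 1): 0.25,
        (1, 1): 0.05, (0, 2): 0.15, (1, 2): 0.25}


def conditional_entropy(p_ex):
    """H(E | X) = -sum p(e, x) log2 p(e | x), in bits."""
    px = defaultdict(float)
    for (_, x), p in p_ex.items():
        px[x] += p
    return -sum(p * math.log2(p / px[x]) for (_, x), p in p_ex.items() if p > 0)


def conditional_mutual_information(p_abx):
    """I(A; B | X) in bits from a dict {(a, b, x): prob}."""
    px, pax, pbx = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, x), p in p_abx.items():
        px[x] += p
        pax[(a, x)] += p
        pbx[(b, x)] += p
    return sum(p * math.log2(p * px[x] / (pax[(a, x)] * pbx[(b, x)]))
               for (a, b, x), p in p_abx.items() if p > 0)


# Pairing E with itself puts all mass on the diagonal e == e'.
diagonal = {(e, e, x): p for (e, x), p in P_EX.items()}
print(conditional_mutual_information(diagonal), conditional_entropy(P_EX))
```

The two printed values agree up to floating-point rounding, which is the identity used above.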

Symmetry: Since $\mathrm{CNF}\text{-}3$ relies on the direction of the causal path between $X_i$ and $X_j$ rather than on the order of its arguments, for a given pair of nodes we have $\mathrm{CNF}\text{-}3(X_i, X_j) = \mathrm{CNF}\text{-}3(X_j, X_i)$ from Defn. 4.7.
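The symmetry argument amounts to orienting the pair by causal direction before evaluating a directional quantity. The sketch below illustrates only that orientation step: the DAG `EDGES` and the deliberately asymmetric `toy_score` are hypothetical placeholders, not the quantity in Defn. 4.7.

```python
from typing import Callable, Hashable, Iterable, Tuple

Node = Hashable
Edge = Tuple[Node, Node]  # (parent, child)


def has_directed_path(src: Node, dst: Node, edges: Iterable[Edge]) -> bool:
    """Depth-first search for a directed path src -> ... -> dst."""
    edges = list(edges)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def oriented_measure(xi: Node, xj: Node, edges: Iterable[Edge],
                     score: Callable[[Node, Node], float]) -> float:
    """Apply a directional score to the pair ordered as (cause, effect).

    Reordering by causal direction first makes the result independent of
    the order in which xi and xj are passed in.
    """
    edges = list(edges)
    if has_directed_path(xi, xj, edges):
        return score(xi, xj)
    if has_directed_path(xj, xi, edges):
        return score(xj, xi)
    raise ValueError(f"no directed path between {xi!r} and {xj!r}")


# Hypothetical DAG X1 -> X3 -> X2 and a deliberately asymmetric toy score.
EDGES = [("X1", "X3"), ("X3", "X2")]
TOY_SCORES = {("X1", "X2"): 0.4, ("X2", "X1"): 0.9}


def toy_score(cause: Node, effect: Node) -> float:
    return TOY_SCORES[(cause, effect)]


print(oriented_measure("X1", "X2", EDGES, toy_score))  # 0.4
print(oriented_measure("X2", "X1", EDGES, toy_score))  # 0.4
```

Even though `toy_score` itself is asymmetric, both argument orders return the same value because the pair is always scored as (cause, effect).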

Positivity: If $X_i, X_j$ are confounded and $X_i \rightarrow X_j$, then by Assumption 4.1, $E^C_{ji}$ and $E^C_j$ are dependent random variables. Hence the mutual information $I(E^C_{ji}; E^C_j)$ is positive. The result follows by substituting this positive value of $I(E^C_{ji}; E^C_j)$ into the definition of $\mathrm{CNF}\text{-}3(X_i, X_j)$. The same argument applies to conditional confounding.
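The positivity step relies on the standard fact that mutual information is zero exactly when two variables are independent and strictly positive otherwise. A minimal check on two made-up $2 \times 2$ joint tables (one a product of its marginals, one with dependence) illustrates this; the tables are hypothetical and unrelated to any data in the paper.

```python
import math


def mutual_information(table):
    """I(A; B) in bits for a joint probability table given as rows table[a][b]."""
    pa = [sum(row) for row in table]
    pb = [sum(col) for col in zip(*table)]
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for a, row in enumerate(table)
               for b, p in enumerate(row) if p > 0)


# Independent: outer product of marginals [0.3, 0.7] and [0.6, 0.4].
independent = [[0.18, 0.12], [0.42, 0.28]]
# Dependent: mass concentrated on the diagonal.
dependent = [[0.4, 0.1], [0.1, 0.4]]

print(mutual_information(independent))  # zero up to rounding
print(mutual_information(dependent))     # strictly positive
```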

Monotonicity: Without loss of generality, let $X_i \rightarrow X_j$ and $X_k \rightarrow X_l$. From the definition of $\mathrm{CNF}\text{-}3$, $\mathrm{CNF}\text{-}3(X_i, X_j) > \mathrm{CNF}\text{-}3(X_k, X_l)$ implies $I(E^C_{ji}; E^C_j) > I(E^C_{lk}; E^C_l)$. By Defn. 4.2, $X_i, X_j$ have higher mutual information and hence are more strongly confounded than $X_k, X_l$. ∎
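To see why reading "higher mutual information" as "stronger confounding" is natural, consider a toy parametric model of our own choosing: two variables A and B each flip a shared fair hidden bit $Z$ with probability `p_flip`, so a smaller `p_flip` means a stronger shared confounder. Then $P(A \neq B) = 2p(1-p)$ and $I(A; B) = 1 - h(2p(1-p))$, where $h$ is the binary entropy; this decreases as $p$ grows toward $0.5$. The pair names and flip probabilities below are hypothetical.

```python
import math


def binary_entropy(q):
    """h(q) in bits, with h(0) = h(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def copies_mi(p_flip):
    """I(A; B) in bits when A and B each flip a fair hidden bit Z with prob. p_flip."""
    disagree = 2 * p_flip * (1 - p_flip)  # P(A != B)
    return 1.0 - binary_entropy(disagree)


# Hypothetical pairs: a smaller flip probability means a stronger shared confounder.
pairs = {("Xi", "Xj"): 0.05, ("Xk", "Xl"): 0.20}
ranking = sorted(pairs, key=lambda pair: copies_mi(pairs[pair]), reverse=True)
for pair in ranking:
    print(pair, round(copies_mi(pairs[pair]), 3))
```

Ranking pairs by this mutual information reproduces the ordering by confounding strength in this model; it does not by itself establish the general claim, which rests on Defn. 4.2.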

Appendix B Real-world Examples

Figure 4: Two real-world examples where our method can be applied. Here Pro: Production Volume, Exp: Exports, Lab: Total Labor Required, Edu: Education, Wag: Wages, Inv: Investments. Interventions on any of these variables, or on any combination of them, yield context-specific data, which our methods can use to identify and measure confounding.