Detecting and Measuring Confounding Using Causal Mechanism Shifts
Abstract
Detecting and measuring confounding effects from data is a key challenge in causal inference. Existing methods frequently assume causal sufficiency, disregarding the presence of unobserved confounding variables. Causal sufficiency is both unrealistic and empirically untestable. Additionally, existing methods make strong parametric assumptions about the underlying causal generative process to guarantee the identifiability of confounding variables. Relaxing the causal sufficiency and parametric assumptions and leveraging recent advancements in causal discovery and confounding analysis with non-i.i.d. data, we propose a comprehensive approach for detecting and measuring confounding. We consider various definitions of confounding and introduce tailored methodologies to achieve three objectives: (i) detecting and measuring confounding among a set of variables, (ii) separating observed and unobserved confounding effects, and (iii) understanding the relative strengths of confounding bias between different sets of variables. We identify useful properties of a confounding measure and propose measures that satisfy those properties. Empirical results support the theoretical analysis.
1 Introduction
Understanding the underlying causal generative process of a set of variables is crucial in many scientific studies for applications in treatment and policy designs [44]. While randomized controlled trials (RCTs) and causal inference through active interventions are ideal choices for understanding the underlying causal model [19, 12, 13, 55], RCTs and/or active interventions are often impossible or infeasible, and sometimes unethical [50, 6]. Research efforts in causal inference hence rely on observational data to study causal relationships [44, 59, 65, 18, 41]. However, recovering the underlying causal model purely from observational data is challenging without further assumptions; this challenge is further exacerbated in the presence of unmeasured confounding variables.
A confounding variable is a variable that causes two other variables, resulting in a spurious association between those two variables. As exemplified by Simpson’s paradox [58] and many other studies [20, 1, 31], the presence of confounding variables is an important quantitative explanation for why correlation does not imply causation. It is challenging to observe and measure all confounding variables in a scientific study [60, 44]. Identifying latent or unobserved confounding variables is even more challenging, and misidentifying them causes problems in downstream applications, such as discovering causal structures from observational data. Numerous methods operate under the assumption of causal sufficiency [45, 4, 60, 8, 51, 65], i.e., that there are no unobserved confounding variables. Causal sufficiency presupposes that all pertinent variables required for causal inference have been observed. However, this assumption may be neither practical nor testable.
The study of confounding has various applications, chief among them causal discovery, i.e., identifying the causal relationships among variables [38, 40, 63]. It is also useful for determining whether a set of observed confounding variables is sufficient for adjustment when estimating causal effects [29], measuring the extent to which statistical correlation between variables can be attributed to confounding [24, 25, 62], and verifying the comparability of treatment and control groups in non-randomized interventional studies [16].
A fundamental problem in causal inference lies in detecting hidden confounding variables from observational data alone, which is non-trivial and poses various challenges. For example, given a marginal distribution over observed variables, there are infinitely many joint distributions corresponding to causal graphs involving unobserved variables [56]. To tackle such challenges, recent work shows that using data from different environments improves causal discovery [40, 38, 33, 45, 23] and helps detect causal mechanism shifts [36] and unobserved confounding [29, 38]. However, such efforts often subsume confounding detection under causal discovery, focusing primarily on identifying confounding factors while overlooking other useful information, such as the relative strength of confounding between variable sets and the distinction between observed and unobserved confounding within a variable set. We seek to address these gaps in this work.
We focus exclusively on the problem of studying confounding from multiple perspectives, including (i) detecting and measuring confounding among a set of variables, (ii) assessing the relative strengths of confounding among different sets of observed variables, and (iii) distinguishing between observed and unobserved confounding among a set of variables. The primary focus of causal inference often lies in verifying the presence or absence of confounding rather than determining the exact value of the measured confounding. However, we leverage the measured confounding to assess the relative strengths of confounding between sets of variables. To achieve the above objectives, we utilize data from various contexts, where each context results from shifts in the causal mechanisms of a set of variables [38, 45]. This allows us to propose different measures of confounding based on the available context information. Our contributions can be summarized as follows.
- For various definitions of confounding, we propose corresponding measures of confounding and present useful properties of the proposed measures. To our knowledge, this is the first comprehensive study that examines various aspects of observed and unobserved confounding using data from multiple contexts without making parametric or causal sufficiency assumptions.
- We study pairwise confounding and confounding among multiple variables, show how to separate unobserved confounding from overall confounding, and present ways to assess relative confounding.
- We present an algorithm for detecting and measuring confounding using data from multiple contexts, and we perform experiments to verify the theoretical analysis.
2 Related Work
In most existing work, the study of confounding has been embedded in causal discovery algorithms. Causal discovery methods can be categorized by several criteria, including the type of data used (observational versus interventional/experimental), whether they are parametric or non-parametric, and whether they relax the causal sufficiency assumption [65, 59]. Since our focus is on studying confounding comprehensively, beyond observed confounding variables, we discuss the literature on methods that relax the causal sufficiency assumption and on methods that rely on experimental data.
Causal Discovery via Observational Data, Relaxing Causal Sufficiency: Constraint-based causal discovery algorithms produce an equivalence class of graphs that satisfy a set of conditional independence constraints [60, 11, 9, 42]. Other methods such as [2, 28, 27] reduce the problem complexity by assuming a parametric form of the underlying causal model (e.g., variables are jointly Gaussian in Chandrasekaran et al. [7]), thereby returning unique causal graphs. Nested Markov Models (NMMs) [56, 57, 49, 14] allow identifiability of causal models with latent factors by using (pairwise) Verma constraints. A recent differentiable causal discovery approach [2] combines NMMs with the differentiable constraint of [66] to discover a partially directed causal network and likely confounded nodes. Unlike these methods, we focus on detecting and measuring confounding under various settings rather than on recovering the entire causal graph or its equivalence class.
Causal Discovery Using Data From Multiple Environments: Given access to a set of observed confounding variables, very recent work [29] presented testable conditional independence constraints that are violated only when there is unobserved confounding. However, their analysis is geared towards downstream causal effect estimation. We aim to provide a unified framework for studying and measuring confounding under different types of available contextual information. Other methods [33, 23] learn an equivalence class of graphs when data from observational and interventional distributions are available. Confounding has also been shown to be detectable in linear models with non-Gaussian variables [20]. In linear models, a spectral analysis method was proposed in [25] to quantify the extent to which the statistical correlation between a set of variables and a target variable can be attributed to confounding. See Tab. 4 of [40] for an overview of causal discovery methods that use data from multiple environments or contexts. Under the specific assumptions of causal sufficiency and sparse mechanism shift, a method was proposed in [45] to reduce the size of a given Markov equivalence class using a mechanism shift score. A differentiable causal discovery method was proposed in [4] that uses interventional data to recover the interventional Markov equivalence class. While these methods use data from different contexts, they assume the absence of unobserved confounding variables; we instead focus on capturing both observed and unobserved confounding.
Measuring and Interpreting Confounding: Earlier efforts in the field have studied different measures of observed confounding, each tailored to address specific challenges [44, 15, 35, 3, 39, 30, 43, 34]. Such measures have also been refined to address specific issues [24, 54]; e.g., [24] proposed a method to correct for the non-linearity effect present in confounding estimates obtained via the exposure–outcome association with and without adjustment for confounding. In contrast, we measure the effects of both observed and unobserved confounding. Motivated by the ignorability property in the potential outcomes framework [61, 26], the divergence between the nominal and complete propensity densities has been considered an indicator of hidden confounding [26]. To the best of our knowledge, the efforts closest to ours are [38, 40], which study confounding using data from multiple contexts without the causal sufficiency assumption. However, they do not measure confounding, and they detect confounding only as a step towards discovering the causal graph. Ours is a more general framework for studying and measuring confounding from multiple perspectives.
In regression models, a difference exceeding a certain threshold between the coefficients of the treatment variable before and after adjusting for possible confounders is considered an indication of confounding. This procedure is known as the change-in-estimate criterion. See [54, 32, 5] for thresholds typically used in the literature.
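The change-in-estimate criterion can be sketched as follows, assuming a simulated linear model and ordinary least squares. The helper names `ols_coef` and `change_in_estimate` and the relative-change threshold of 0.1 are illustrative choices, not values taken from [54, 32, 5].

```python
import numpy as np


def ols_coef(y, *columns):
    """Return OLS coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def change_in_estimate(x, y, z, threshold=0.1):
    """Relative change in the treatment coefficient after adjusting for z.

    Returns (relative_change, flagged); flagged is True when the relative
    change exceeds the (illustrative) threshold.
    """
    crude = ols_coef(y, x)[1]        # coefficient of x without adjustment
    adjusted = ols_coef(y, x, z)[1]  # coefficient of x after adjusting for z
    rel_change = abs(crude - adjusted) / abs(adjusted)
    return rel_change, rel_change > threshold


rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                      # candidate confounder
x = z + rng.normal(size=n)                  # treatment depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # outcome depends on x and z
z_irrelevant = rng.normal(size=n)           # unrelated to both x and y

print(change_in_estimate(x, y, z))             # relative change roughly 0.75, flagged
print(change_in_estimate(x, y, z_irrelevant))  # tiny relative change, not flagged
```

Here the unadjusted slope is about 3.5 while the adjusted slope is about 2.0, so adjusting for `z` moves the estimate far more than the threshold.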
3 Background and Problem Setup
Let $\mathbf{X} = \{X_1, \dots, X_n\}$ be a set of observed variables and $\mathbf{Z} = \{Z_1, \dots, Z_m\}$ be a set of unobserved or latent variables. The values of $\mathbf{X}$ can be real, discrete, or mixed. Let $\mathcal{G}$ be the underlying directed acyclic graph (DAG) among the variables $\mathbf{X} \cup \mathbf{Z}$. Directed edges among the variables in $\mathcal{G}$ indicate direct causal influences. Assume that the unobserved variables $\mathbf{Z}$ are jointly independent and exogenous (i.e., $\mathrm{pa}(Z_j) = \emptyset$ for every $Z_j \in \mathbf{Z}$). In this setting, any two nodes sharing a common parent $V$ are said to be confounded, and $V$ is said to be a confounding variable. For a node $V$, $\mathrm{pa}(V)$ denotes the set of parents of $V$ in $\mathcal{G}$.
For a node $X_i \in \mathbf{X}$, $P(X_i \mid \mathrm{pa}(X_i))$ is called the causal mechanism of $X_i$. The causal mechanism encodes how the variable $X_i$ is influenced by its parents $\mathrm{pa}(X_i)$. Following earlier work [22, 38, 45, 21, 46, 52], we make the following general assumption about the causal mechanisms underlying the data.
Assumption 3.1.
Identifying confounding from observational data alone is challenging without further assumptions [28]. Hence, following earlier work [38, 29, 40], we assume that the data over the variables $\mathbf{X}$ are observed over multiple contexts or environments. While there are various ways of formulating or constructing contexts, in this paper we assume that each context is created as a result of either hard (a.k.a. structural) interventions or soft (a.k.a. parametric) interventions on a subset of variables $\{X_k\}_{k \in K}$, where $K$ is a set of indices. Performing a hard intervention on a variable $X_k$ is the same as setting $X_k$ to a value $x_k$; a hard intervention removes the influence of the parents $\mathrm{pa}(X_k)$ on $X_k$. Performing a soft intervention on a variable $X_k$ is the same as replacing its causal mechanism $P(X_k \mid \mathrm{pa}(X_k))$ with a new causal mechanism $\tilde{P}(X_k \mid \mathrm{pa}(X_k))$; a soft intervention does not remove the influence of the parents on $X_k$. The idea of explicitly considering context information and using contexts as context variables to create extended causal graphs has been studied in the literature. Context variables are also called policy variables, decision variables, regime variables, domain variables, or environment variables [40, 45, 17, 22].
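To make the distinction between the two intervention types concrete, here is a minimal simulation under an assumed linear-Gaussian model with a single edge $X_1 \to X_2$ (the model, coefficients, and function names are hypothetical). A hard intervention fixes $X_2$, so its dependence on the parent vanishes; a soft intervention changes the mechanism but keeps the parent's influence.

```python
import numpy as np


def sample_context(n, rng, intervention=None):
    """Sample (x1, x2) from an assumed linear SCM X1 -> X2.

    intervention:
      None         -> observational mechanism x2 = 2 * x1 + noise
      ("hard", v)  -> x2 is set to the constant v, cutting the edge X1 -> X2
      ("soft", w)  -> mechanism replaced by x2 = w * x1 + noise; X1 still matters
    """
    x1 = rng.normal(size=n)
    noise = rng.normal(size=n)
    if intervention is None:
        x2 = 2.0 * x1 + noise
    elif intervention[0] == "hard":
        x2 = np.full(n, float(intervention[1]))
    elif intervention[0] == "soft":
        x2 = intervention[1] * x1 + noise
    else:
        raise ValueError(f"unknown intervention {intervention!r}")
    return x1, x2


def parent_slope(x1, x2):
    """Least-squares slope of x2 on x1: how strongly the child tracks its parent."""
    return np.cov(x1, x2, bias=True)[0, 1] / np.var(x1)


rng = np.random.default_rng(1)
for label, iv in [("observational", None), ("hard", ("hard", 3.0)), ("soft", ("soft", 0.5))]:
    x1, x2 = sample_context(10_000, rng, iv)
    # Slopes come out near 2.0, exactly 0.0, and near 0.5, respectively.
    print(label, round(parent_slope(x1, x2), 2))
```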
Let be the set of contexts and let , denotes the probability distribution of the observed variables in the context . Let , where are sets of indices, be the set of contexts in which we observe mechanism changes for the set of variables . Similarly, let be the set of contexts in which we observe mechanism changes for the set of variables but not for the variables . We say that the causal mechanism of a variable changes between two contexts if . Given the data over observed variables in each context, there exist methods for detecting mechanism shifts of each variable between the contexts [36, 38, 45, 37]. For example, the where is the set of observed parents of can be used to detect mechanism change for between the contexts [38, 36]. Hence, we focus on detecting and measuring confounding among a set of variables, assuming that the causal mechanism shifts are observed among that set of variables.
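As a rough illustration of mechanism-shift detection (not one of the methods in [36, 38, 45, 37]), the sketch below assumes that each mechanism is linear-Gaussian in its observed parents, fits it separately in two contexts, and scores the shift by the average KL divergence between the two fitted conditionals. The model class, the function names, and any tolerance used to flag a shift are hypothetical.

```python
import numpy as np


def fit_linear_gaussian(parents, child):
    """Fit child = [1, parents] @ beta + N(0, sigma^2) by least squares.

    parents: 2-D array of shape (n_samples, n_parents); child: 1-D array.
    """
    design = np.column_stack([np.ones(len(child)), parents])
    beta, *_ = np.linalg.lstsq(design, child, rcond=None)
    resid = child - design @ beta
    return beta, resid.var()


def mechanism_shift_score(parents_a, child_a, parents_b, child_b):
    """Average KL(P_a(child | parents) || P_b(child | parents)) between two
    fitted linear-Gaussian mechanisms, evaluated at the pooled parent values."""
    beta_a, var_a = fit_linear_gaussian(parents_a, child_a)
    beta_b, var_b = fit_linear_gaussian(parents_b, child_b)
    pooled = np.vstack([parents_a, parents_b])
    design = np.column_stack([np.ones(len(pooled)), pooled])
    mean_gap = design @ beta_a - design @ beta_b
    kl = 0.5 * (np.log(var_b / var_a) + (var_a + mean_gap**2) / var_b - 1.0)
    return kl.mean()


def sample_mechanism(slope, n, rng):
    """Child = slope * parent + N(0, 1) with a single observed parent."""
    parents = rng.normal(size=(n, 1))
    child = slope * parents[:, 0] + rng.normal(size=n)
    return parents, child


rng = np.random.default_rng(3)
ctx_1 = sample_mechanism(1.0, 5000, rng)
ctx_2 = sample_mechanism(1.0, 5000, rng)   # same mechanism as ctx_1
ctx_3 = sample_mechanism(-1.0, 5000, rng)  # shifted mechanism
score_same = mechanism_shift_score(*ctx_1, *ctx_2)
score_shift = mechanism_shift_score(*ctx_1, *ctx_3)
print(score_same)   # close to 0
print(score_shift)  # about 2
```

A shift would be flagged when the score exceeds a chosen tolerance; with nonlinear mechanisms, a nonparametric conditional two-sample test would be needed instead.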
Context information is not very useful if there is no restriction on how causal mechanisms change between contexts [45, 38]. For example, the causal mechanisms of two variables both differing across all (or no) contexts would trivially satisfy Assumption 3.1 but reveal no information about the underlying causal mechanisms [10, 38]. Hence, following earlier work [45, 38, 17], we make the following assumptions.
Assumption 3.2.
(Sparse Causal Mechanism Shift [53]) Causal mechanisms of variables change sparsely across contexts, i.e., if , then .
Assumption 3.2 implies that causal mechanisms change infrequently across contexts. This assumption is realistic because, in many scientific studies, the interventions in any given context typically affect only a few variables [53].
Assumption 3.3.
(Markov Property under Mechanism Shifts [17]) The distribution is given by . In other words, variables are assumed to be conditionally exchangeable, so that the same graph applies in every context .
Assumption 3.4.
(Causal Sufficiency Over $\mathbf{X} \cup \mathbf{Z}$) All common parents of any pair of observed nodes belong to the set $\mathbf{X} \cup \mathbf{Z}$. In other words, all variables relevant for detecting confounding, including the unobserved confounding variables, are already present in $\mathbf{X} \cup \mathbf{Z}$.
Problem Statement: Given data over the observed variables in multiple contexts, each context resulting from a sparse causal mechanism shift of variables in , (i) can we identify which pairs or sets of variables are confounded and can we measure the confounding strength? (ii) can we isolate the confounding effects of observed and unobserved confounding variables? and (iii) can we study the relative strengths of confounding among different sets of variables?
To address the above problem, in the next section, we consider various definitions of confounding and present appropriate confounding measures depending on the context information available.
4 Detecting and Measuring Confounding
Settings | Confounding Definition Based On | Required Context Information | Type of Intervention |
---|---|---|---|
1 | Directed Information [48] & Noncollapsibility [15, 43, 54] | | Hard / Structural |
2 & 3 | Mutual Information | | Soft / Parametric |
In this section, we present methods for detecting and measuring confounding for various scenarios in which shifts in causal mechanisms are observed. Considering any three observed variables and an unobserved confounding variable , we present measures of confounding depending on the information about mechanism shifts of . Each of the following subsections includes: (i) a definition of confounding, (ii) a corresponding definition of the confounding measure, (iii) a method for isolating the unobserved confounding measure from the overall confounding, (iv) an extension of the confounding measure to more than two variables, and (v) key properties of the proposed confounding measures. See Tab. 1 and Fig. 1 for an overview.
4.1 Setting 1: Measuring Confounding Using Directed Information Between .
In this setting, we use the fact that directed information does not vanish in the presence of a confounding variable [64, 48]. To this end, we leverage the interventional effects of on each other to define a measure of confounding.
Definition 4.1.
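(Directed Information [48]). The directed information from $X_i$ to $X_j$ is defined as the conditional Kullback-Leibler divergence between the distributions $P(X_j \mid X_i)$ and $P(X_j \mid do(X_i))$. That is:
$$I(X_i \to X_j) = D_{\mathrm{KL}}\big(P(X_j \mid X_i) \,\big\|\, P(X_j \mid do(X_i)) \,\big|\, P(X_i)\big) = \mathbb{E}_{P(X_i, X_j)}\left[\log \frac{P(X_j \mid X_i)}{P(X_j \mid do(X_i))}\right] \tag{1}$$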
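For discrete variables, Eqn. 1 can be evaluated in closed form when the causal model is known. The sketch below assumes a hypothetical binary model $Z \to X$, $Z \to Y$, $X \to Y$ in which $Z$ is the only confounder, so that $P(Y \mid do(X))$ follows from back-door adjustment over $Z$. Function and variable names are illustrative.

```python
import numpy as np


def directed_information(p_z, p_x_given_z, p_y_given_xz):
    """I(X -> Y) in nats for the assumed discrete model Z -> X, Z -> Y, X -> Y.

    p_z[z], p_x_given_z[z, x], and p_y_given_xz[x, z, y] are probability tables.
    Returns the KL divergence between P(Y | X) and P(Y | do(X)), averaged over P(X).
    """
    p_zx = p_z[:, None] * p_x_given_z  # joint P(z, x)
    p_x = p_zx.sum(axis=0)             # marginal P(x)
    p_z_given_x = p_zx / p_x           # P(z | x), indexed [z, x]
    # Observational: P(y | x) = sum_z P(z | x) P(y | x, z)
    p_y_given_x = np.einsum("zx,xzy->xy", p_z_given_x, p_y_given_xz)
    # Interventional (back-door over Z): P(y | do(x)) = sum_z P(z) P(y | x, z)
    p_y_do_x = np.einsum("z,xzy->xy", p_z, p_y_given_xz)
    p_xy = p_x[:, None] * p_y_given_x
    return float(np.sum(p_xy * np.log(p_y_given_x / p_y_do_x)))


p_z = np.array([0.5, 0.5])
p_y_given_xz = np.array([[[0.9, 0.1], [0.4, 0.6]],   # x = 0; rows are z = 0, 1
                         [[0.6, 0.4], [0.1, 0.9]]])  # x = 1
confounded = np.array([[0.8, 0.2], [0.2, 0.8]])      # P(x | z) depends on z
unconfounded = np.array([[0.5, 0.5], [0.5, 0.5]])    # X ignores Z

di_conf = directed_information(p_z, confounded, p_y_given_xz)
di_unconf = directed_information(p_z, unconfounded, p_y_given_xz)
print(di_conf)    # about 0.054 nats
print(di_unconf)  # 0.0 up to floating-point rounding
```

In the confounded table, $P(Y \mid X=0) = (0.8, 0.2)$ differs from $P(Y \mid do(X=0)) = (0.65, 0.35)$, which is where the positive value comes from.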
Definition 4.2.
(No Confounding [44]) When measuring the causal effect of a (treatment) variable $X_i$ on a (target) variable $X_j$, the ordered pair $(X_i, X_j)$ is unconfounded if and only if the directed information from $X_i$ to $X_j$, $I(X_i \to X_j)$, is zero. Equivalently, $P(X_j \mid X_i) = P(X_j \mid do(X_i))$.
A similar definition of confounding that relates the conditional distribution and interventional distribution is defined as follows.
Definition 4.3.
From Defns. 4.1 and 4.2, for a pair of variables , observing and implies that and hence the presence of confounding (see Tab. 2). Using the above properties of directed information, we measure confounding as follows.
Definition 4.4.
Graph | |||
---|---|---|---|
Uncnf. | |||
Confounded |
For all the confounding measures, we use an exponential transformation to limit the range of the measure between and . Note that in a DAG, one of is zero under no confounding (see Tab. 2 for a simple example with two- and three-node graphs). Hence outputs zero when there is no confounding between . Similarly, outputs a positive real value in the range when there is confounding. We leverage data from multiple contexts to evaluate and as follows. In this setting, we assume each context is generated as a result of hard interventions on a subset of variables. Let be the set of node indices that belong to a path from to , including ; we use the contexts to evaluate as . Intuitively, to compute the interventional effects of on , we need to observe mechanism changes only for to account for the potential causal influence from to . In addition, none of the nodes in a causal path from to should be intervened on. We use observational data to evaluate .
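Since Eqn. 2 is not reproduced here, the following sketch uses one transformation consistent with the description above, namely $1 - \exp(-\min\{I(X_i \to X_j), I(X_j \to X_i)\})$; this exact form is our assumption. It is zero whenever either directed information vanishes and lies strictly between 0 and 1 when both directions are positive.

```python
import math


def confounding_score(di_ij, di_ji):
    """Map a pair of directed-information values (nats) to [0, 1).

    Hypothetical form: 1 - exp(-min(I(Xi -> Xj), I(Xj -> Xi))).
    It is 0 whenever either direction has zero directed information
    (the unconfounded DAG case) and grows toward 1 as both grow.
    """
    if di_ij < 0 or di_ji < 0:
        raise ValueError("directed information is non-negative")
    return 1.0 - math.exp(-min(di_ij, di_ji))


print(confounding_score(0.0, 0.19))    # 0.0: e.g., unconfounded X_i -> X_j
print(confounding_score(0.054, 0.19))  # about 0.053: both directions positive
```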
Proposition 4.1.
(Identifiability of ) is identifiable from the set of contexts . To detect and measure confounding between a pair of nodes , it is enough to observe two sets of contexts and . Thus, sets of contexts are needed to detect and measure confounding between distinct pairs of nodes in a causal DAG with nodes.
When a confounding variable between is observed but there may also exist an unobserved confounding variable , it is crucial to detect and measure the unobserved confounding effect [29]. We use conditional directed information to define the measure of unobserved confounding.
Definition 4.5.
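(Conditional Directed Information [48]). The conditional directed information from $X_i$ to $X_j$ conditioned on $X_k$ is defined as the conditional Kullback-Leibler divergence between the distributions $P(X_j \mid X_i, X_k)$ and $P(X_j \mid do(X_i), X_k)$ as follows.
$$I(X_i \to X_j \mid X_k) = D_{\mathrm{KL}}\big(P(X_j \mid X_i, X_k) \,\big\|\, P(X_j \mid do(X_i), X_k) \,\big|\, P(X_i, X_k)\big) \tag{3}$$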
This measure can trivially be extended to the case where there exist multiple observed and unobserved confounding variables. The expression $P(X_j \mid do(X_i), X_k)$ means conditioning on $X_k$ in the interventional distribution $P(\cdot \mid do(X_i))$. Now, the conditional confounding can be measured as:
(4) |
Intuitively, by conditioning on an observed confounding variable , we control the association between flowing via and measure the influence via the unobserved confounding variables.
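The effect of conditioning can be checked numerically. The sketch below assumes a hypothetical binary model with an observed confounder $Z_1$ and an unobserved confounder $Z_2$, both exogenous and independent as in Section 3, and evaluates the conditional directed information of Eqn. 3 with $Z_1$ as the conditioning variable. When $X$ does not depend on $Z_2$, conditioning on $Z_1$ removes all confounding and the value is zero; otherwise it stays positive.

```python
import numpy as np


def conditional_directed_information(p_z1, p_z2, p_x_given_z, p_y_given_xz):
    """I(X -> Y | Z1) in nats for an assumed discrete model with an observed
    confounder Z1 and an unobserved confounder Z2 (exogenous, independent).

    Tables: p_x_given_z[z1, z2, x] and p_y_given_xz[x, z1, z2, y].
    """
    p_z1z2x = p_z1[:, None, None] * p_z2[None, :, None] * p_x_given_z
    p_z1x = p_z1z2x.sum(axis=1)                   # P(z1, x)
    p_z2_given_z1x = p_z1z2x / p_z1x[:, None, :]  # P(z2 | z1, x)
    # Observational: P(y | x, z1) = sum_z2 P(z2 | z1, x) P(y | x, z1, z2)
    p_y_obs = np.einsum("abx,xaby->axy", p_z2_given_z1x, p_y_given_xz)
    # Interventional: P(y | do(x), z1) = sum_z2 P(z2) P(y | x, z1, z2)
    p_y_do = np.einsum("b,xaby->axy", p_z2, p_y_given_xz)
    p_z1xy = p_z1x[:, :, None] * p_y_obs          # P(z1, x, y)
    return float(np.sum(p_z1xy * np.log(p_y_obs / p_y_do)))


def x_table(weight_z2):
    """P(X = 1 | z1, z2) = 0.2 + 0.3 z1 + weight_z2 * z2, indexed [z1, z2, x]."""
    p_x1 = np.array([[0.2 + 0.3 * a + weight_z2 * b for b in (0, 1)] for a in (0, 1)])
    return np.stack([1 - p_x1, p_x1], axis=-1)


p_z1 = np.array([0.5, 0.5])
p_z2 = np.array([0.5, 0.5])
# P(Y = 1 | x, z1, z2) = 0.1 + 0.2 x + 0.3 z1 + 0.3 z2, indexed [x, z1, z2]
p_y1 = np.array([[[0.1 + 0.2 * xv + 0.3 * a + 0.3 * b for b in (0, 1)]
                  for a in (0, 1)] for xv in (0, 1)])
p_y_given_xz = np.stack([1 - p_y1, p_y1], axis=-1)

cdi_blocked = conditional_directed_information(p_z1, p_z2, x_table(0.0), p_y_given_xz)
cdi_hidden = conditional_directed_information(p_z1, p_z2, x_table(0.4), p_y_given_xz)
print(cdi_blocked)  # 0.0 up to rounding: Z1 blocks all confounding
print(cdi_hidden)   # positive: the unobserved Z2 still confounds X and Y
```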
Beyond Pairwise Confounding: We now study when a set of variables where are jointly confounded i.e., share a common confounding variable and how to measure the joint confounding among the variables .
Theorem 4.1.
A set of observed variables are jointly unconfounded if and only if there exist three variables such that .
We now define the measure of confounding among the variables in as follows.
(5) |
Conditional confounding among a set of variables can be defined similarly to Eqn. 4. We now study some useful properties of the measure .
Theorem 4.2.
For any three observed variables and an unobserved confounding variable , the following statements are true for the measure .
1. (Reflexivity and Symmetry.) , .
2. (Positivity.) if and only if are confounded. Given an observed confounding variable between , if and only if there exists an unobserved confounding variable between .
3.
4.2 Setting 2: Detecting and Measuring Confounding Using the Mechanism Shifts of .
The previous setting utilizes the interventional effects of on to define a measure of confounding between . In this setting, we utilize the association between the observed marginal distributions of under causal mechanism shifts of to measure confounding. To this end, similar to [38], we make the following assumption.
Assumption 4.1.
(Shift Faithfulness [38]) Let be a common parent for a set of variables . Then each causal mechanism shift in between two contexts entails a causal mechanism change in each between the same contexts .
One consequence of Assumption 4.1 is that a change in the causal mechanism of induces correlations between the expectations of in different contexts. To see this, consider the following structural equations.
(6) |
where denotes the context and and are zero-mean noise variables whose distributions are otherwise unrestricted. The causal graph corresponding to this model has the nodes and edges: . It is easy to see that and . Following Assumption 4.1, whenever there is a change in the causal mechanism of (e.g., changes to in Eqn. 6), there is a change in both . Additionally, since is a common cause of both , there is a spurious association between . Consequently, in the set of contexts the values , are spuriously associated. Under Assumptions 3.2 and 4.1, restricting our analysis to ensures that, with high probability, the association between , is due to the confounding variable . In this example, the association between exists even if , i.e., . To define a confounding measure, we create two random variables which we define as respectively where . Relying on the context information and the association between and , we define a confounding measure as follows.
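A small simulation illustrates why shifts in a common cause induce an association across contexts. Because the structural equations of Eqn. 6 and the exact induced random variables are not reproduced here, the sketch assumes a linear model in which only the mechanism of the confounder $Z$ shifts between contexts, uses the per-context sample means of the two children as the induced variables, and estimates their mutual information with the Gaussian plug-in formula $-\tfrac{1}{2}\log(1-\rho^2)$. All of these are assumptions for illustration.

```python
import numpy as np


def context_means(n_contexts, n_samples, confounded, rng):
    """Per-context sample means of X_i and X_j under an assumed linear model.

    In every context c only the mechanism of Z shifts: Z ~ N(mu_c, 1), mu_c ~ N(0, 1).
    X_i = Z + noise; X_j = Z + noise if confounded, otherwise X_j = noise.
    """
    mu = rng.normal(size=n_contexts)
    means = np.empty((n_contexts, 2))
    for c in range(n_contexts):
        z = rng.normal(mu[c], 1.0, size=n_samples)
        x_i = z + rng.normal(size=n_samples)
        x_j = (z if confounded else 0.0) + rng.normal(size=n_samples)
        means[c] = x_i.mean(), x_j.mean()
    return means


def gaussian_mutual_information(a, b):
    """Plug-in MI (nats) assuming (a, b) is bivariate Gaussian."""
    rho = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)


rng = np.random.default_rng(4)
m_conf = context_means(200, 500, confounded=True, rng=rng)
m_unconf = context_means(200, 500, confounded=False, rng=rng)
mi_conf = gaussian_mutual_information(m_conf[:, 0], m_conf[:, 1])
mi_unconf = gaussian_mutual_information(m_unconf[:, 0], m_unconf[:, 1])
print(mi_conf)    # large: the shared shifts of Z drive both means
print(mi_unconf)  # near 0: no common cause
```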
Proposition 4.2.
(Confounding Based on Mutual Information) If two variables are confounded by a variable , the induced random variables as described above have nonzero mutual information .
Definition 4.6.
To measure the unobserved confounding strength when we already observe a confounding variable , we condition on the observed confounding variable to define as follows.
(8) |
Beyond Pairwise Confounding: Following earlier work [38], we utilize total correlation among triplets of random variables in to verify whether a set of variables are jointly confounded. By Assumption 4.1, we know that the variables in are jointly confounded only if each pair is pairwise confounded. If all three variables share the same latent confounding variable , then knowing about one of explains away some of the association between the other two, so that we have . However, for a triplet , it is possible that, rather than being jointly confounded, the variables have three disjoint confounding variables confounding each of the individual pairs: . In general, for a set of variables of size to permit such an equivalent explanation, we would need to have a total of confounding variables with outgoing edges to obtain the same structure of pairwise confounding [38]. While this may plausibly occur for small sets of variables that appear to be pairwise correlated, we assume the true graph to be causally minimal in the following sense.
Assumption 4.2.
(Confounder Minimality [38]) For every subset of at least variables, there are at most edges incoming into from latent confounding variables with at least three children in .
Assumption 4.2 ensures that variables that appear to be jointly confounded are indeed confounded. In other words, when a small number of latent variables suffice to explain the observed correlations, there should indeed exist only a few confounding variables. With this assumption, we can guarantee that joint confounding can be identified from the total correlation.
Theorem 4.3.
Let be a set of variables such that all are pairwise confounded. Then is jointly confounded if and only if for each triple we have .
Now, the measure of joint confounding among a set of variables can be defined using total correlation as follows. To evaluate the following expression, we need to use the contexts to ensure that with high probability, the association among the variables in is due to the joint confounding variable .
(9) |
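Total correlation, $\mathrm{TC}(X_1, \dots, X_d) = \sum_i H(X_i) - H(X_1, \dots, X_d)$, is easy to estimate under a Gaussian assumption, where it reduces to $-\tfrac{1}{2}\log\det R$ for the correlation matrix $R$. The sketch below uses this plug-in estimator as a stand-in for whatever estimator is used to evaluate Eqn. 9; the Gaussian assumption and the simulated data are ours.

```python
import numpy as np


def gaussian_total_correlation(samples):
    """Plug-in total correlation (nats) of the columns of `samples`,
    assuming they are jointly Gaussian: TC = -0.5 * log det(R),
    where R is the sample correlation matrix."""
    corr = np.corrcoef(samples, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    return -0.5 * logdet


rng = np.random.default_rng(5)
n = 20_000
z = rng.normal(size=n)  # one latent confounder shared by all three columns
joint = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
indep = rng.normal(size=(n, 3))
tc_joint = gaussian_total_correlation(joint)
tc_indep = gaussian_total_correlation(indep)
print(tc_joint)  # about 0.35 (pairwise correlations of 0.5 give det R = 0.5)
print(tc_indep)  # near 0
```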
Theorem 4.4.
For any three observed variables and an unobserved confounding variable , the following statements are true for the measure .
1. (Reflexivity and Symmetry.) where denotes conditional entropy and .
2. (Positivity.) if and only if are confounded. Given an observed confounding variable between , if and only if there exists an unobserved confounding variable between .
3. (Monotonicity.) implies that the pair of variables are more strongly confounded than the pair of variables in the sense of Defn. 4.2.
4.3 Setting 3: Observing the Causal Mechanism Shifts in and Known Causal Path Direction Between and
Similar to the previous settings, we utilize marginal and conditional distributions of to define a measure of confounding. If the direction of the causal path between is known a priori, we can use this direction to measure confounding as explained below. In addition to the notation introduced in the previous setting, let us denote for each with respectively. We now leverage the dependency among these variables to define the measure of confounding. Intuitively, if and if we observe a change in the causal mechanisms of both due to the causal mechanism changes in , we also observe a change in the causal mechanism .
Definition 4.7.
To measure the unobserved confounding strength in the presence of an observed confounding variable , similar to Setting 2, we can modify Eqn. 10 to condition on the variable .
Beyond Pairwise Confounding: Using the Assumption 4.2, we have the following.
Theorem 4.5.
Let be a set of variables such that all are pairwise confounded and the causal relationships among each pair . Then is jointly confounded if and only if for each triple we have .
Since we have access to random variables in addition to , it is not straightforward to use all of them to measure joint confounding. To keep the measure simple, we let the measure of joint confounding among the variables be the same as . That is, . Setting 3 is an alternative to Setting 2 when we know the direction of the causal path between . Settings 2 and 3 complement each other in validating the correctness of our analysis.
Theorem 4.6.
For any three observed variables and an unobserved confounding variable , the following statements are true for the measure .
1. (Reflexivity and Symmetry.) where denotes conditional entropy and .
2. (Positivity.) if and only if are confounded. Given an observed confounding variable between , if and only if there exists an unobserved confounding variable between .
3. (Monotonicity.) implies that the pair of variables are more strongly confounded than the pair of variables in the sense of Defn. 4.2.
5 Algorithm
6 Experiments and Results
We perform simulation studies to verify the correctness of the proposed measures. All experiments are run on a CPU. We report the mean and standard deviation of results over five random seeds. Code to reproduce the results is provided in the supplementary material and is available at https://github.com/gautam0707/CD_CNF.
Measuring Confounding: In this set of experiments, we consider the following four causal structures made of three nodes : Empty graph over i.e., nodes are isolated in the graph, , , . In , there is no confounding between and in there is confounding effect of on and . Results in Fig. 2 show that our measures output zero when there is no confounding between and output positive values when are confounded by a confounding variable .
Measuring Conditional Confounding: We consider the following two causal structures. . . In , and are confounded by two variables . We measure conditional confounding between conditioned on , , and respectively. Since confounding still exists in all of the above conditioning settings, correctly returns a positive confounding value in all three cases (see Fig. 3 left). On the other hand, in , we measure conditional confounding between conditioning on the empty set and . Since conditioning on blocks the confounding association between , returns a confounding value close to zero. However, the unconditioned confounding value (conditioning on the empty set) remains large. These results empirically validate the correctness of the proposed measures.
Causal Graph | Not Controlling Confounding: 1000 | 2000 | 3000 | 4000 | 5000 | Controlling Confounding: 1000 | 2000 | 3000 | 4000 | 5000 |
---|---|---|---|---|---|---|---|---|---|---|
 | 0.55 | 0.57 | 0.55 | 0.52 | 0.52 | 0.06 | 0.02 | 0.007 | 0.03 | 0.009 |
 | 0.24 | 0.26 | 0.23 | 0.24 | 0.23 | 0.04 | 0.05 | 0.06 | 0.02 | 0.05 |
Downstream Causal Effect Estimation: For the causal graphs , we examine the impact of controlling for nodes identified using our method. We measure the causal effect of on with and without controlling for the detected confounding variable and report the absolute difference between the true and estimated causal effects in Tab. 3. The results show that controlling for the variables identified by our method reduces the bias in the estimated causal effects.
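The effect of adjusting for a detected confounder can be sketched on binary data. Assuming a hypothetical model $Z \to X$, $Z \to Y$, $X \to Y$ with a known true effect, the code compares the unadjusted difference in means with back-door adjustment by stratifying on $Z$, and reports the absolute error of each. This is an illustrative estimator, not necessarily the one used to produce Tab. 3.

```python
import numpy as np


def naive_effect(x, y):
    """Difference in means E[Y | X = 1] - E[Y | X = 0] (no adjustment)."""
    return y[x == 1].mean() - y[x == 0].mean()


def adjusted_effect(x, y, z):
    """Back-door adjustment over a discrete confounder z:
    sum_z P(z) * (E[Y | X = 1, z] - E[Y | X = 0, z])."""
    effect = 0.0
    for value in np.unique(z):
        mask = z == value
        diff = y[mask & (x == 1)].mean() - y[mask & (x == 0)].mean()
        effect += mask.mean() * diff
    return effect


rng = np.random.default_rng(6)
n = 50_000
true_effect = 0.2
z = rng.binomial(1, 0.5, size=n)                      # confounder
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))       # treatment depends on z
y = rng.binomial(1, 0.2 + true_effect * x + 0.4 * z)  # outcome depends on x and z

bias_naive = abs(naive_effect(x, y) - true_effect)
bias_adjusted = abs(adjusted_effect(x, y, z) - true_effect)
print(bias_naive)     # about 0.24: confounding inflates the naive estimate
print(bias_adjusted)  # close to 0
```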
Binary Data - Erdős-Rényi Causal Graphs: To verify the performance of our method at a larger scale, similar to [38], we generate causal graphs with various numbers of nodes using the Erdős-Rényi model. In these experiments, each context results from an intervention on one node; hence the number of nodes and the number of contexts are equal. Sample size denotes the number of data points used in each context. For each pair of nodes, we detect whether it is confounded and measure the confounding, and we then calculate precision, recall, and F1 scores. Across all settings, recall is at least 0.72 and F1 ranges from 0.68 to 0.91; Setting 1 attains the highest recall (0.95 to 1.0).
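The evaluation protocol can be sketched as follows, under stated assumptions: a random DAG is drawn from an Erdős-Rényi model over a fixed node ordering, the ground-truth confounded pairs are those sharing a common parent (the definition in Section 3), and a detector's output is scored with precision, recall, and F1. The edge probability and function names are hypothetical, and no detector from this paper is implemented here.

```python
import itertools

import numpy as np


def random_er_dag(n_nodes, edge_prob, rng):
    """Erdős-Rényi DAG: each edge i -> j with i < j is kept with probability edge_prob.

    adj[i, j] is True when there is an edge i -> j.
    """
    upper = np.triu(np.ones((n_nodes, n_nodes), dtype=bool), k=1)
    return upper & (rng.random((n_nodes, n_nodes)) < edge_prob)


def confounded_pairs(adj):
    """Pairs (i, j), i < j, that share at least one common parent."""
    n = adj.shape[0]
    return {
        (i, j)
        for i, j in itertools.combinations(range(n), 2)
        if np.any(adj[:, i] & adj[:, j])
    }


def precision_recall_f1(predicted, truth):
    """Score a set of predicted confounded pairs against the true set."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


rng = np.random.default_rng(7)
adj = random_er_dag(10, 0.3, rng)
truth = confounded_pairs(adj)
# A trivial hypothetical detector that flags every pair: recall 1.0, low precision.
all_pairs = set(itertools.combinations(range(10), 2))
print(precision_recall_f1(all_pairs, truth))
```

As a consistency check on the table below, precision 0.64 and recall 1.0 give F1 = 2(0.64)(1.0)/1.64 ≈ 0.78, matching the Setting 1 rows with 10 nodes.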
Nodes (= Contexts) | Sample Size | Setting 1 Precision | Setting 1 Recall | Setting 1 F1 | Setting 2 Precision | Setting 2 Recall | Setting 2 F1 | Setting 3 Precision | Setting 3 Recall | Setting 3 F1 |
---|---|---|---|---|---|---|---|---|---|---|
10 | 100 | 0.64 | 0.97 | 0.77 | 0.67 | 0.83 | 0.74 | 0.64 | 0.72 | 0.68 |
10 | 200 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.70 | 0.79 | 0.74 |
10 | 300 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.65 | 0.76 | 0.70 |
10 | 400 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.67 | 0.83 | 0.74 |
10 | 500 | 0.64 | 1.0 | 0.78 | 0.67 | 0.83 | 0.74 | 0.67 | 0.83 | 0.74 |
15 | 100 | 0.81 | 0.95 | 0.88 | 0.80 | 0.85 | 0.82 | 0.80 | 0.79 | 0.80 |
15 | 200 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.85 | 0.82 |
15 | 300 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.85 | 0.82 |
15 | 400 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.85 | 0.82 |
15 | 500 | 0.82 | 1.0 | 0.90 | 0.80 | 0.85 | 0.82 | 0.80 | 0.84 | 0.82 |
20 | 100 | 0.68 | 0.95 | 0.80 | 0.68 | 0.88 | 0.77 | 0.69 | 0.84 | 0.76 |
20 | 200 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.68 | 0.87 | 0.76 |
20 | 300 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.67 | 0.86 | 0.75 |
20 | 400 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.68 | 0.87 | 0.76 |
20 | 500 | 0.69 | 1.0 | 0.82 | 0.68 | 0.88 | 0.77 | 0.68 | 0.87 | 0.76 |
25 | 100 | 0.83 | 0.96 | 0.89 | 0.83 | 0.91 | 0.87 | 0.83 | 0.89 | 0.86 |
25 | 200 | 0.83 | 1.0 | 0.91 | 0.83 | 0.91 | 0.87 | 0.82 | 0.90 | 0.86 |
25 | 300 | 0.83 | 1.0 | 0.91 | 0.83 | 0.91 | 0.87 | 0.83 | 0.91 | 0.87 |
25 | 400 | 0.83 | 1.0 | 0.91 | 0.83 | 0.92 | 0.87 | 0.83 | 0.91 | 0.87 |
25 | 500 | 0.83 | 1.0 | 0.91 | 0.83 | 0.91 | 0.87 | 0.83 | 0.91 | 0.87 |
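The Precision, Recall, and F1 scores in the table are computed over pairs of nodes. The following minimal sketch shows this scoring. It assumes that pairs are unordered, so (i, j) and (j, i) denote the same pair. The helper name `pairwise_scores` is hypothetical.

```python
def pairwise_scores(true_pairs, predicted_pairs):
    """Precision, recall, and F1 over unordered node pairs flagged as confounded."""
    truth = {frozenset(p) for p in true_pairs}
    pred = {frozenset(p) for p in predicted_pairs}
    tp = len(truth & pred)  # pairs that are both truly and predictedly confounded
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and detections on a small graph
precision, recall, f1 = pairwise_scores(
    true_pairs=[(0, 1), (2, 3), (1, 4)],
    predicted_pairs=[(1, 0), (3, 2), (0, 4)],
)
print(precision, recall, f1)
```

F1 is the harmonic mean of precision and recall. This is consistent with the table: for example, a precision of 0.82 and a recall of 1.0 give an F1 of about 0.90.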
7 Conclusions, Limitations, and Future Work
In this paper, based on the known causal mechanism shifts of observed variables, we propose three measures of confounding along with their conditional and multivariate variants. We also study key properties of these measures. Our measures complement each other depending on the available context information. We propose algorithms to compute the proposed measures and empirically verify their correctness. However, for the same confounded pair of variables, our metrics may yield different results depending on the chosen measure. As discussed in the introduction, the measures are intended to assess the relative strengths of confounding rather than for point-to-point comparison. The number of contexts required to evaluate the measure can be large because many contexts without changes in particular mechanisms are discarded. Identifying appropriate real-world datasets and applying the proposed measures to those datasets is an interesting area for future work, as is developing measures that efficiently use context information. Additionally, devising new definitions for confounding and proposing corresponding confounding measures is also an interesting future direction. We aim to pursue these ideas.
Acknowledgments
This work was partly supported by the Prime Minister’s Research Fellowship (PMRF) program and an Adobe Research Gift. We are grateful to the anonymous reviewers for their valuable feedback, which improved the presentation of the paper.
References
- Aldrich [1995] John Aldrich. Correlations genuine and spurious in Pearson and Yule. Statistical Science, pages 364–376, 1995.
- Bhattacharya et al. [2021] Rohit Bhattacharya, Tushar Nagarajan, Daniel Malinsky, and Ilya Shpitser. Differentiable causal discovery under unmeasured confounding. In International Conference on Artificial Intelligence and Statistics, pages 2314–2322, 2021.
- Breslow et al. [1980] Norman E Breslow, Nicholas E Day, and Elisabeth Heseltine. Statistical methods in cancer research. 1980.
- Brouillard et al. [2020] Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, and Alexandre Drouin. Differentiable causal discovery from interventional data. In Advances in Neural Information Processing Systems, volume 33, pages 21865–21877, 2020.
- Budtz–Jørgensen et al. [2007] Esben Budtz–Jørgensen, Niels Keiding, Philippe Grandjean, and Pal Weihe. Confounder selection in environmental epidemiology: Assessment of health effects of prenatal mercury exposure. Annals of Epidemiology, 17(1):27–35, 2007.
- Carey and Stiles [2016] Timothy A Carey and William B Stiles. Some problems with randomized controlled trials and some viable alternatives. Clinical Psychology & Psychotherapy, 23(1):87–95, 2016.
- Chandrasekaran et al. [2010] Venkat Chandrasekaran, Pablo A Parrilo, and Alan S Willsky. Latent variable graphical model selection via convex optimization. In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1610–1613. IEEE, 2010.
- Chickering [2002] David Maxwell Chickering. Learning equivalence classes of Bayesian-network structures. The Journal of Machine Learning Research, 2:445–498, 2002.
- Colombo et al. [2012] Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, pages 294–321, 2012.
- David et al. [2010] Shai Ben David, Tyler Lu, Teresa Luu, and Dávid Pál. Impossibility theorems for domain adaptation. In AISTATS, pages 129–136, 2010.
- Diepen et al. [2023] Mirthe Maria Van Diepen, Ioan Gabriel Bucur, Tom Heskes, and Tom Claassen. Beyond the Markov equivalence class: Extending causal discovery under latent confounding. In 2nd Conference on Causal Learning and Reasoning, 2023.
- Eberhardt and Scheines [2007] Frederick Eberhardt and Richard Scheines. Interventions and causal inference. Philosophy of Science, 74(5):981–995, 2007.
- Eberhardt et al. [2012] Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. arXiv preprint arXiv:1207.1389, 2012.
- Evans and Richardson [2019] Robin J Evans and Thomas S Richardson. Smooth, identifiable supermodels of discrete dag models with latent variables. Bernoulli, 25:848–876, 2019.
- Greenland and Morgenstern [2001] Sander Greenland and Hal Morgenstern. Confounding in health research. Annual Review of Public Health, 22(1):189–212, 2001.
- Groenwold et al. [2009] R.H.H. Groenwold, E. Hak, and A.W. Hoes. Quantitative assessment of unobserved confounding is mandatory in nonrandomized intervention studies. Journal of Clinical Epidemiology, 62(1):22–28, 2009.
- Guo et al. [2023] Siyuan Guo, Viktor Tóth, Bernhard Schölkopf, and Ferenc Huszár. Causal de Finetti: On the identification of invariant causal structure in exchangeable data. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Hammerton and Munafò [2021] Gemma Hammerton and Marcus R Munafò. Causal inference with observational data: the need for triangulation of evidence. Psychological Medicine, 51(4):563–578, 2021.
- Hauser and Bühlmann [2014] Alain Hauser and Peter Bühlmann. Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55(4):926–939, 2014.
- Hoyer et al. [2008] Patrik O Hoyer, Shohei Shimizu, Antti J Kerminen, and Markus Palviainen. Estimation of causal effects using linear non-gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.
- Huang et al. [2017] Biwei Huang, Kun Zhang, Jiji Zhang, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. Behind distribution shift: Mining driving forces of changes and causal arrows. In 2017 IEEE International Conference on Data Mining (ICDM), pages 913–918. IEEE, 2017.
- Huang et al. [2020] Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. Causal discovery from heterogeneous/nonstationary data. Journal of Machine Learning Research, 21(89):1–53, 2020.
- Jaber et al. [2020] Amin Jaber, Murat Kocaoglu, Karthikeyan Shanmugam, and Elias Bareinboim. Causal discovery from soft interventions with unknown targets: Characterization and learning. Advances in Neural Information Processing Systems, 33:9551–9561, 2020.
- Janes et al. [2010] Holly Janes, Francesca Dominici, and Scott Zeger. On quantifying the magnitude of confounding. Biostatistics, 11(3):572–582, 2010.
- Janzing and Schölkopf [2018] Dominik Janzing and Bernhard Schölkopf. Detecting confounding in multivariate linear models via spectral analysis. Journal of Causal Inference, 6(1):20170013, 2018.
- Jesson et al. [2022] Andrew Jesson, Alyson Rose Douglas, Peter Manshausen, Maëlys Solal, Nicolai Meinshausen, Philip Stier, Yarin Gal, and Uri Shalit. Scalable sensitivity and uncertainty analyses for causal-effect estimates of continuous-valued interventions. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
- Kaltenpoth and Vreeken [2023a] David Kaltenpoth and Jilles Vreeken. Nonlinear causal discovery with latent confounders. In International Conference on Machine Learning, pages 15639–15654, 2023a.
- Kaltenpoth and Vreeken [2023b] David Kaltenpoth and Jilles Vreeken. Causal discovery with hidden confounders using the algorithmic Markov condition. In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 1016–1026, 2023b.
- Karlsson and Krijthe [2023] Rickard Karlsson and Jesse Krijthe. Detecting hidden confounding in observational data using multiple environments. Advances in Neural Information Processing Systems, 36, 2023.
- Kleinbaum et al. [2007] David G Kleinbaum, Kevin M Sullivan, and Nancy D Barker. A pocket guide to epidemiology. Springer, 2007.
- Ksir and Hart [2016] Charles Ksir and Carl L Hart. Correlation still does not imply causation. The Lancet Psychiatry, 3(5):401, 2016.
- Lee [2014] Paul H Lee. Is a cutoff of 10% appropriate for the change-in-estimate criterion of confounder identification? Journal of Epidemiology, 24(2):161–167, 2014.
- Li et al. [2023] Adam Li, Amin Jaber, and Elias Bareinboim. Causal discovery from observational and interventional data across multiple environments. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Maldonado and Greenland [1993] George Maldonado and Sander Greenland. Simulation study of confounder-selection strategies. American Journal of Epidemiology, 138(11):923–936, 1993.
- Maldonado and Greenland [2002] George Maldonado and Sander Greenland. Estimating causal effects. International Journal of Epidemiology, 31(2):422–429, 2002.
- Mameche et al. [2022] Sarah Mameche, David Kaltenpoth, and Jilles Vreeken. Discovering invariant and changing mechanisms from data. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1242–1252, 2022.
- Mameche et al. [2023] Sarah Mameche, David Kaltenpoth, and Jilles Vreeken. Learning causal models under independent changes. Advances in Neural Information Processing Systems, 36, 2023.
- Mameche et al. [2024] Sarah Mameche, Jilles Vreeken, and David Kaltenpoth. Identifying confounding from causal mechanism shifts. In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2024.
- Miettinen and Cook [1981] Olli S Miettinen and E Francis Cook. Confounding: essence and detection. American Journal of Epidemiology, 114(4):593–603, 1981.
- Mooij et al. [2020] Joris M Mooij, Sara Magliacane, and Tom Claassen. Joint causal inference from multiple contexts. Journal of Machine Learning Research, 21:1–108, 2020.
- Nichols [2007] Austin Nichols. Causal inference with observational data. The Stata Journal, 7(4):507–541, 2007.
- Ogarrio et al. [2016] Juan Miguel Ogarrio, Peter Spirtes, and Joe Ramsey. A hybrid causal search algorithm for latent variable models. In Proceedings of the Eighth International Conference on Probabilistic Graphical Models, pages 368–379, 2016.
- Pang et al. [2016] Menglan Pang, Jay S Kaufman, and Robert W Platt. Studying noncollapsibility of the odds ratio with marginal structural and logistic regression models. Statistical Methods in Medical Research, 25(5):1925–1937, 2016.
- Pearl [2009] Judea Pearl. Causality. Cambridge University Press, 2009.
- Perry et al. [2022] Ronan Perry, Julius Von Kügelgen, and Bernhard Schölkopf. Causal discovery in heterogeneous environments under the sparse mechanism shift hypothesis. In Advances in Neural Information Processing Systems, pages 10904–10917, 2022.
- Peters et al. [2014] Jonas Peters, Joris M Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. The Journal of Machine Learning Research, 15(1):2009–2053, 2014.
- Peters et al. [2017] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017.
- Raginsky [2011] Maxim Raginsky. Directed information and Pearl’s causal calculus. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 958–965, 2011.
- Richardson et al. [2023] Thomas S Richardson, Robin J Evans, James M Robins, and Ilya Shpitser. Nested Markov properties for acyclic directed mixed graphs. The Annals of Statistics, 51(1):334–361, 2023.
- Sanson-Fisher et al. [2007] Robert William Sanson-Fisher, Billie Bonevski, Lawrence W. Green, and Cate D’Este. Limitations of the randomized controlled trial in evaluating population-based health interventions. American Journal of Preventive Medicine, 33(2):155–161, 2007.
- Scanagatta et al. [2015] Mauro Scanagatta, Cassio P de Campos, Giorgio Corani, and Marco Zaffalon. Learning Bayesian networks with thousands of variables. Advances in Neural Information Processing Systems, 28, 2015.
- Schölkopf et al. [2012] B Schölkopf, D Janzing, J Peters, E Sgouritsa, K Zhang, and J Mooij. On causal and anticausal learning. In 29th International Conference on Machine Learning (ICML 2012), pages 1255–1262. International Machine Learning Society, 2012.
- Schölkopf et al. [2021] Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
- Schuster et al. [2021] Noah A Schuster, Jos WR Twisk, Gerben Ter Riet, Martijn W Heymans, and Judith JM Rijnhart. Noncollapsibility and its role in quantifying confounding bias in logistic regression. BMC Medical Research Methodology, 21:1–9, 2021.
- Shanmugam et al. [2015] Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G Dimakis, and Sriram Vishwanath. Learning causal graphs with small interventions. Advances in Neural Information Processing Systems, 28, 2015.
- Shpitser et al. [2014] Ilya Shpitser, Robin J Evans, Thomas S Richardson, and James M Robins. Introduction to nested Markov models. Behaviormetrika, 41:3–39, 2014.
- Shpitser et al. [2018] Ilya Shpitser, Robin J Evans, and Thomas S Richardson. Acyclic linear SEMs obey the nested Markov property. In Uncertainty in Artificial Intelligence: Proceedings of the… Conference. Conference on Uncertainty in Artificial Intelligence, volume 2018. NIH Public Access, 2018.
- Simpson [1951] Edward H Simpson. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological), 13(2):238–241, 1951.
- Spirtes and Zhang [2016] Peter Spirtes and Kun Zhang. Causal discovery and inference: concepts and recent methodological advances. In Applied Informatics, volume 3, pages 1–28. Springer, 2016.
- Spirtes et al. [2000] Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. 2000.
- Tan [2006] Zhiqiang Tan. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association, 101(476):1619–1637, 2006.
- VanderWeele and Shpitser [2013] Tyler J VanderWeele and Ilya Shpitser. On the definition of a confounder. Annals of Statistics, 41(1):196, 2013.
- Wang and Drton [2023] Y. Samuel Wang and Mathias Drton. Causal discovery with unobserved confounding and non-gaussian data. Journal of Machine Learning Research, 24(271):1–61, 2023.
- Wieczorek and Roth [2019] Aleksander Wieczorek and Volker Roth. Information theoretic causal effect quantification. Entropy, 21(10), 2019.
- Zanga et al. [2022] Alessio Zanga, Elif Ozkirimli, and Fabio Stella. A survey on causal discovery: Theory and practice. International Journal of Approximate Reasoning, 151:101–129, 2022.
- Zheng et al. [2018] Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. DAGs with NO TEARS: Continuous optimization for structure learning. Advances in Neural Information Processing Systems, 31, 2018.
Appendix
Appendix A Proofs
See 4.1
Proof.
Since the set of contexts consists of data with all possible interventions on , if a context is generated by intervening on with the value , then the expression is equal to the expression in that context .
From Defn. 4.4, to detect and measure confounding between the pair of variables , we need to evaluate and . To this end, from the previous paragraph, we need two sets of contexts and . Following these observations, it is enough to have sets of contexts to detect and measure confounding between distinct pairs of nodes. ∎
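The proof relies on one fact. In a context generated by intervening on a variable, the conditional distribution observed in that context coincides with the interventional distribution. Below is a small exact check of this fact on a hypothetical binary model with edges Z → X, Z → Y, and X → Y. It assumes a hard intervention that sets X independently of Z. A soft intervention that preserves the dependence of X on Z would not yield this equality. All parameter values and function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical binary model: Z -> X, Z -> Y, X -> Y, so Z confounds (X, Y).
P_Z1 = 0.5


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v in {0, 1}."""
    return p if v else 1 - p


def p_x1_observational(z):
    return 0.8 if z else 0.2


def p_x1_intervened(z):
    # Hard intervention: X is randomized and no longer depends on Z.
    return 0.5


def p_y1(x, z):
    return 0.2 + 0.3 * x + 0.4 * z


def joint(p_x1):
    """Exact joint P(Z, X, Y) for a given mechanism of X."""
    return {
        (z, x, y): bern(P_Z1, z) * bern(p_x1(z), x) * bern(p_y1(x, z), y)
        for z, x, y in product((0, 1), repeat=3)
    }


def p_y1_given_x(table, x):
    """P(Y = 1 | X = x) read off a joint table (conditioning, not intervening)."""
    num = sum(p for (z, xx, y), p in table.items() if xx == x and y == 1)
    den = sum(p for (z, xx, y), p in table.items() if xx == x)
    return num / den


def p_y1_do_x(x):
    """Ground-truth P(Y = 1 | do(X = x)) via back-door adjustment over Z."""
    return sum(bern(P_Z1, z) * p_y1(x, z) for z in (0, 1))


observational = joint(p_x1_observational)
interventional_context = joint(p_x1_intervened)

print(p_y1_given_x(observational, 1))           # biased by the confounder Z
print(p_y1_given_x(interventional_context, 1))  # equals the interventional value
print(p_y1_do_x(1))
```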
See 4.1
Proof.
Consider three variables in the underlying causal graph. Consider the conditional directed information between given and the subsequent manipulations as follows.
Since we have . Equality holds only when are unconfounded. ∎
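The argument above rests on the nonnegativity of a KL divergence. As a hedged illustration of the unconditional case, we assume that the directed information compares the observational conditional P(Y | X) with the interventional P(Y | do(X)), in the style of Raginsky [48]. The exact definition in the main text may differ. The sketch computes D( P(Y | X) || P(Y | do(X)) | P(X) ) exactly, in bits, for a hypothetical binary model. The quantity is positive when Z drives both X and Y and vanishes once X no longer depends on Z. The model, its parameters, and the function names are assumptions.

```python
import math


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v in {0, 1}."""
    return p if v else 1 - p


def directed_info_bits(p_z1, p_x1_given_z, p_y1_given_xz):
    """Conditional KL D(P(Y|X) || P(Y|do(X)) | P(X)) in bits for a binary model Z->X, Z->Y, X->Y."""
    total = 0.0
    for x in (0, 1):
        # P(X = x) under the observational mechanism of X
        p_x = sum(bern(p_z1, z) * bern(p_x1_given_z(z), x) for z in (0, 1))
        for y in (0, 1):
            # Observational P(y | x) = sum_z P(z) P(x | z) P(y | x, z) / P(x)
            p_obs = sum(
                bern(p_z1, z) * bern(p_x1_given_z(z), x) * bern(p_y1_given_xz(x, z), y)
                for z in (0, 1)
            ) / p_x
            # Interventional P(y | do(x)) = sum_z P(z) P(y | x, z) (back-door over Z)
            p_do = sum(bern(p_z1, z) * bern(p_y1_given_xz(x, z), y) for z in (0, 1))
            if p_obs > 0:
                total += p_x * p_obs * math.log2(p_obs / p_do)
    return total


def p_y1(x, z):
    return 0.2 + 0.3 * x + 0.4 * z


confounded = directed_info_bits(0.5, lambda z: 0.8 if z else 0.2, p_y1)
unconfounded = directed_info_bits(0.5, lambda z: 0.5, p_y1)  # X no longer depends on Z
print(confounded, unconfounded)
```

For these generic parameter values, the divergence is strictly positive in the confounded model and zero otherwise. Specially tuned parameters could make the two conditionals coincide even under confounding.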
See 4.2
Proof.
Reflexivity: From the definition of directed information, and hence .
Symmetry: Even if is not symmetric, the expression is symmetric, and hence is symmetric.
Positivity: If are confounded, irrespective of the direction of the causal path between and , we have and . Hence and . We now have . This holds even if there is no causal path between the nodes . The same statements remain valid after conditioning on an observed confounding variable whenever there is unobserved confounding between .
Monotonicity: Without loss of generality, assume that the inequality is a result of . That is, the KL divergence between and is greater than the KL divergence between and . In other words, the pair of distributions and is closer than the pair and . As a result, are closer to being not confounded in the sense of Defns. 4.2 and 4.3. ∎
See 4.2
Proof.
There are two sources of dependency between . First, if are causally related in the underlying causal model generating the data, there will be a dependency between in the context , since the interventions are soft. Second, by Assumption 4.1, any shift in the causal mechanism of leads to a change in both mechanisms of , which induces a dependency. Hence the random variables have non-zero mutual information. ∎
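To make the dependency concrete, we represent the mechanism shifts of two variables as binary indicator vectors over contexts, where 1 means the mechanism changed in that context. The indicator values below are hypothetical. The sketch computes a plug-in estimate of their mutual information. Indicators that shift together, as Assumption 4.1 implies for a shared confounder, carry positive mutual information. Indicators that shift independently carry none.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in bits from paired discrete observations."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (u, v), count in joint.items():
        p_uv = count / n
        mi += p_uv * math.log2(p_uv / ((pa[u] / n) * (pb[v] / n)))
    return mi


# Hypothetical mechanism-shift indicators over 8 contexts
shifts_x = [1, 1, 0, 0, 1, 1, 0, 0]
shifts_y = [1, 1, 0, 0, 1, 1, 0, 0]  # shifts together with X (e.g., via a shared confounder)
shifts_w = [1, 0, 1, 0, 1, 0, 1, 0]  # shifts independently of X

print(mutual_information(shifts_x, shifts_y))
print(mutual_information(shifts_x, shifts_w))
```

A plug-in estimate like this is biased upward for small numbers of contexts. This is one reason many contexts may be needed in practice.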
See 4.3
Proof.
By Assumption 4.2, when three variables are confounded by a single confounding variable , conditioning on one of explains away some of the dependency between the other two. Hence we have for all triples . ∎
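The explaining-away step can be checked on a small hypothetical example. We enumerate 16 contexts covering every combination of a shared confounder shift and three independent "own" shifts. Each observed variable's mechanism changes if the confounder shifts or if its own independent shift occurs. This is an assumed generative rule, chosen only for illustration. The sketch then compares the plug-in I(A; B) with I(A; B | Z), using the identity I(A; B | Z) = H(A, Z) + H(B, Z) - H(A, B, Z) - H(Z).

```python
import math
from collections import Counter
from itertools import product


def entropy(*cols):
    """Plug-in joint entropy (bits) of one or more aligned discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(a, b)


def conditional_mutual_information(a, b, z):
    # I(A; B | Z) = H(A, Z) + H(B, Z) - H(A, B, Z) - H(Z)
    return entropy(a, z) + entropy(b, z) - entropy(a, b, z) - entropy(z)


# Each row: (confounder shift s, own shifts of A, B, Z); all 16 combinations, equally weighted
rows = list(product((0, 1), repeat=4))
shift_a = [s | na for s, na, nb, nz in rows]
shift_b = [s | nb for s, na, nb, nz in rows]
shift_z = [s | nz for s, na, nb, nz in rows]

mi = mutual_information(shift_a, shift_b)
cmi = conditional_mutual_information(shift_a, shift_b, shift_z)
print(f"I(A;B) = {mi:.4f} bits, I(A;B|Z) = {cmi:.4f} bits")
```

Observing that Z's mechanism did not shift reveals that the confounder did not shift either. This removes part of the shared dependency, so the conditional value is smaller.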
See 4.4
Proof.
Reflexivity: From the definition of mutual information, . Substituting in the definition of , the result follows.
Symmetry: The result follows from the ‘symmetry’ property of mutual information.
Positivity: If are confounded, then by Assumption 4.1, are dependent random variables. Hence the mutual information is positive. The result follows after substituting this positive value for in the definition of . The same argument applies to conditional confounding.
Monotonicity: From the definition of , implies . By Defn. 4.2, have higher mutual information than the pair , and hence are more strongly confounded than . ∎
See 4.5
Proof.
By Assumption 4.2, when three variables are confounded by a single confounding variable , conditioning on explains away some of the dependency between . Hence we have for all triples . ∎
See 4.6
Proof.
Reflexivity: From the definition of mutual information, . Substituting in the definition of , the result follows.
Symmetry: Since we rely on the direction of the causal path between , for a given pair of nodes , we have from Defn. 4.7.
Positivity: If are confounded and , then by Assumption 4.1, are dependent random variables. Hence the mutual information is positive. The result follows after substituting this positive value for in the definition of . The same argument applies to conditional confounding.
Monotonicity: From the definition of , without loss of generality, implies . By Defn. 4.2, have higher mutual information and hence are more strongly confounded than . ∎