Explicit Formula for
Partial Information Decomposition

Aobo Lyu Andrew Clark and Netanel Raviv
Department of Electrical and Systems Engineering
Washington University in St. Louis St. Louis MO USA
Department of Computer Science and Engineering
Washington University in St. Louis St. Louis MO USA
[email protected]
[email protected] [email protected]
Abstract

Mutual information between two random variables is a well-studied notion, whose understanding is fairly complete. Mutual information between one random variable and a pair of other random variables, however, is a far more involved notion. Specifically, Shannon’s mutual information does not capture fine-grained interactions between those three variables, resulting in limited insights in complex systems. To capture these fine-grained interactions, in 2010 Williams and Beer proposed to decompose this mutual information to information atoms, called unique, redundant, and synergistic, and proposed several operational axioms that these atoms must satisfy. In spite of numerous efforts, a general formula which satisfies these axioms has yet to be found. Inspired by Judea Pearl’s do-calculus, we resolve this open problem by introducing the do-operation, an operation over the variable system which sets a certain marginal to a desired value, which is distinct from any existing approaches. Using this operation, we provide the first explicit formula for calculating the information atoms so that Williams and Beer’s axioms are satisfied, as well as additional properties from subsequent studies in the field.

I Introduction

Since its inception by Claude Shannon [1], mutual information has remained a pivotal measure in information theory, which finds extensive applications across multiple other domains. Extending mutual information to multivariate systems has attracted significant academic interest, but no widely agreed upon generalization exists to date. For instance, the so-called interaction information [2] emerged in 1960 as an equivalent notion for mutual information in multivariate systems, and yet, it provides negative values in many common systems, contradicting Shannon’s viewpoint of information measures as nonnegative quantities.

Arguably the simplest multivariate setting in which Shannon’s mutual information fails to capture the full complexity of the system is that of a three variable system, with two source variables, and one target variable. Mutual information between the source variables and the target variable does not provide insights about how the source variables influence the target variable. Specifically, in various points of the probability space the value of the target variable might be computable either:

  (a) exclusively from one source variable (but not the other);

  (b) from either one of the source variables; or

  (c) from both variables jointly (but not separately).

In 2010 Williams and Beer [3] proposed to formalize the above fine-grained interactions in a three variable system using an axiomatic approach they called Partial Information Decomposition (PID). (Williams and Beer formulated their notions for any number of source variables, and yet herein we focus on three variables for simplicity. Extending our methods to multiple source variables will be addressed in a future version of this paper.) They proposed decomposing said mutual information into four constituent ingredients called information atoms, which capture the above possible interactions between the variables:

  (a) two unique information atoms, one for each source variable, which capture the information each source variable implies about the target variable, that cannot be inferred from the other;

  (b) one redundant information atom, which captures the information that can be inferred about the target variable from either one of the source variables; and

  (c) one synergistic information atom, which captures the information that can be inferred about the target variable from both source variables jointly, but not individually.

Ref. [3] proposed a set of axioms that the above information atoms should satisfy in order to provide said insights, and follow-up works in the field identified several additional properties [4, 5, 6, 7]. Yet, in spite of extensive efforts [8, 9, 10, 11, 12], a comprehensive definition of information atoms which satisfies all these axioms and properties is yet to be found.

In spite of the limited understanding of the information atoms, PID has already found multiple applications in various fields. As a simple example [13, Fig. 1], one can imagine the two source variables being education level and gender, and the target variable being annual income. An exact formula for computing the information atoms would shed light on the extent to which annual income is a result of education level, gender, either one, or both.

Beyond this simple example, PID has broad applications in a wide range of fields. In brain network analysis, PID (or similar ideas) has been instrumental in measuring correlations between neurons [14] and understanding complex neuronal interactions in cognitive processes [15]. For privacy and fairness studies, the synergistic concept provides insights about data disclosure mechanisms [16, 17]. In the field of causality, information decomposition can be used to distinguish and quantify the occurrence of causal emergence [18], and more.

In this paper, we propose an explicit PID formula that satisfies all of Williams and Beer’s axioms, as well as several additional desired properties. We do so by introducing the do-operation, which is inspired by similar concepts in the field of causal analysis [19, 20, 21]. Intuitively, based on the understanding that unique information is “ideal conditional mutual information,” our method first adjusts the entire probability distribution by using the do-operation in order to make the target variable identical to its conditional distribution given one source variable, and then calculates the expectation of mutual information between it and the other source variable under different conditions. It is worth noting that our method is not based on any of the point-wise, localized, or optimization approaches that existing methods use.

We begin in Section II by introducing the PID framework, and its axioms and properties. We continue in Section III by introducing our do-operation and the definition of unique information, from which all other definitions follow, and prove that all axioms and properties are satisfied. We discuss the intrinsic meaning of our definition in Section IV, and provide all proofs in the appendix.

II Framework, axioms, and properties

The following notational conventions are observed throughout this article: $X, \mathcal{X}, x$ (similarly $Y, \mathcal{Y}, y$, etc.) denote a random variable, its corresponding (finite) alphabet, and an element of that alphabet, respectively. The distribution of $X$ is denoted by $\mathcal{D}_{X}$, the joint distribution of $X$ and $Y$ is denoted by $\mathcal{D}_{X,Y}$, and the distribution of $X$ given $Y=y$ is denoted by $\mathcal{D}_{X|Y=y}$.

For random variables $X,Y,Z$, the quantity $I((X,Y);Z)$ captures the amount of information that one target variable $Z$ shares with the source variables $(X,Y)$, but provides no further information regarding finer interactions between the three variables. To gain more subtle insights into the interactions between $Z$ and $(X,Y)$, [3] proposed to further decompose $I((X,Y);Z)$ into information atoms. Specifically, the shared information between $Z$ and $(X,Y)$ should contain a redundant information atom, two unique information atoms, and one synergistic information atom (see Figure 1).

The redundant information atom $\operatorname{Red}(X,Y\to Z)$ (also called “shared”) represents the information which either $X$ or $Y$ implies about $Z$. The unique information atom $\operatorname{Un}(X\to Z|Y)$ represents the information individually contributed to $Z$ by $X$, but not by $Y$ (similarly $\operatorname{Un}(Y\to Z|X)$). The synergistic information atom $\operatorname{Syn}(X,Y\to Z)$ (also called “complementary”) represents the information that can only be known about $Z$ through the joint observation of $X$ and $Y$, but cannot be provided by either one of them separately. Together, we must have that

$$
\begin{aligned}
I((X,Y);Z) &= \operatorname{Red}(X,Y\to Z)+\operatorname{Syn}(X,Y\to Z)\\
&\phantom{=}+\operatorname{Un}(X\to Z|Y)+\operatorname{Un}(Y\to Z|X).
\end{aligned}
\tag{1}
$$

We refer to (1) as Partial Information Decomposition (PID).

Moreover, since the redundant atom together with one of the unique atoms constitute all information that one source variable implies about the target variable, it must be the case that their summation equals the mutual information between the two, i.e., that

$$
\begin{aligned}
I(X;Z) &= \operatorname{Red}(X,Y\to Z)+\operatorname{Un}(X\to Z|Y),\text{ and}\\
I(Y;Z) &= \operatorname{Red}(X,Y\to Z)+\operatorname{Un}(Y\to Z|X).
\end{aligned}
\tag{2}
$$

In a similar spirit, the synergistic information atom and one of the unique information atoms measure shared information between the target variable and one of the source variables, while excluding the other source variable. Therefore, the summation of these quantities should coincide with the well-known definition of conditional mutual information, i.e.,

$$
\begin{aligned}
I(Z;X|Y) &= \operatorname{Syn}(X,Y\to Z)+\operatorname{Un}(X\to Z|Y),\text{ and}\\
I(Z;Y|X) &= \operatorname{Syn}(X,Y\to Z)+\operatorname{Un}(Y\to Z|X).
\end{aligned}
\tag{3}
$$

Eqs. (2) and (3) are the foundation of an axiomatic approach towards an operational definition of the information atoms. These equations form the first in a series of axioms, presented next, which were raised in previous works on the topic [3, 22, 23]. Such an axiomatic approach was also taken in the past in order to shed light on Shannon’s mutual information [24].

Axiom 1 (Information atoms relationship).

Partial Information Decomposition (1) satisfies (2) and (3).

Figure 1: A pictorial representation of Partial Information Decomposition (1), where $I((X,Y);Z)$ is decomposed into its finer information atoms: the synergistic $\operatorname{Syn}(X,Y\to Z)$ (also called “complementary”), the redundant $\operatorname{Red}(X,Y\to Z)$ (also called “shared”), and the two directional unique components $\operatorname{Un}(X\to Z|Y)$ and $\operatorname{Un}(Y\to Z|X)$. The diagram also marks the regions $I(X;Z)$ and $I(Y;Z)$ inside $I((X,Y);Z)$. The summation of the redundant atom and one of the unique atoms must be equal to the corresponding mutual information, as described in Eq. (2).

Notice that it suffices to specify the definition of any one of the information atoms, and the definitions for the remaining atoms follow from Axiom 1. Consequently, [3, 25] chose to specify $\operatorname{Red}$, and provided three additional axioms which $\operatorname{Red}$ should satisfy.
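To make this remark concrete, the short derivation below solves (2) and (3) for the remaining atoms once $\operatorname{Red}$ is fixed. It is only a restatement of Axiom 1 combined with the chain rule of mutual information, offered as a worked illustration rather than as an additional result.

```latex
% Solving (2) and (3) for the other atoms, given Red = Red(X,Y -> Z):
\begin{align*}
\operatorname{Un}(X\to Z|Y) &= I(X;Z) - \operatorname{Red}(X,Y\to Z),\\
\operatorname{Un}(Y\to Z|X) &= I(Y;Z) - \operatorname{Red}(X,Y\to Z),\\
\operatorname{Syn}(X,Y\to Z) &= I(Z;X|Y) - I(X;Z) + \operatorname{Red}(X,Y\to Z).
\end{align*}
% Both lines of (3) yield the same Syn, since by the chain rule
%   I(Z;X|Y) - I(X;Z) = I((X,Y);Z) - I(Y;Z) - I(X;Z) = I(Z;Y|X) - I(Y;Z).
% Summing the four atoms recovers (1):
\begin{align*}
\operatorname{Red} + \operatorname{Un}(X\to Z|Y) + \operatorname{Syn} + \operatorname{Un}(Y\to Z|X)
  &= I(X;Z) + \bigl(I(Z;X|Y) - I(X;Z)\bigr) + I(Y;Z)\\
  &= I(Y;Z) + I(Z;X|Y) = I((X,Y);Z).
\end{align*}
```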

The first additional axiom is commutativity of the source variables, which implies that the order of the source variables must not affect the value of the redundant information.

Axiom 2 (Commutativity).

Partial Information Decomposition satisfies $\operatorname{Red}(X,Y\to Z)=\operatorname{Red}(Y,X\to Z)$.

The second is monotonicity, which implies that the redundant information is non-increasing when adding a source variable, since the newly added variable cannot increase the redundancy between the original variables. We sidestep the discussion about monotonicity with more than two variables, which is not our focus in this paper, even though it can be easily obtained by extending our definition to more than two source variables.

The third is self-redundancy, which defines the redundant information from one source variable to the target variable (i.e., $\operatorname{Red}(X\to Z)$) as the mutual information between them. In the case of two source variables considered herein, monotonicity and self-redundancy merge into the following single axiom.

Axiom 3 (Monotonicity and self-redundancy).

Partial Information Decomposition satisfies $\operatorname{Red}(X,Y\to Z)\leq\min\{I(X;Z),I(Y;Z)\}$.

Notice that Axiom 3, alongside Axiom 1 (specifically (2)), implies that $\operatorname{Un}$ is a nonnegative quantity. The nonnegativity of $\operatorname{Red}$ is stated in [3, 25] as a separate axiom, shown next.

Axiom 4 (Nonnegativity).

Partial Information Decomposition satisfies $\operatorname{Red}(X,Y\to Z)\geq 0$.

The nonnegativity of $\operatorname{Syn}$ is normally not listed as an axiom, since it is debatable whether it should be nonnegative; we will show that our method yields nonnegative $\operatorname{Syn}$ under the closed-system assumption (i.e., $H(Z|X,Y)=0$) in Section III, and further discussion is given in Section IV.

In addition, studies subsequent to [3, 25] suggested two further properties, additivity and continuity [12, 26]. Additivity implies that whenever independent variable systems are considered, the joint information measures should be the sum of the information measures of each individual system. This is the case, for instance, for the joint entropy of two independent variables.

Property 1 (Additivity).

Partial Information Decomposition of two independent systems $\mathcal{D}_{X,Y,Z}$ and $\mathcal{D}_{\bar{X},\bar{Y},\bar{Z}}$ satisfies

$$
\begin{aligned}
\operatorname{Un}((X,\bar{X})\to(Z,\bar{Z})|(Y,\bar{Y})) &= \operatorname{Un}(X\to Z|Y)+\operatorname{Un}(\bar{X}\to\bar{Z}|\bar{Y}),\text{ and}\\
\operatorname{F}((X,\bar{X}),(Y,\bar{Y})\to(Z,\bar{Z})) &= \operatorname{F}(X,Y\to Z)+\operatorname{F}(\bar{X},\bar{Y}\to\bar{Z}),
\end{aligned}
$$

for every $\operatorname{F}\in\{\operatorname{Red},\operatorname{Syn}\}$.
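The entropy analogy mentioned before Property 1 can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`entropy`, `product`) and the two toy distributions are illustrative assumptions and do not come from the paper.

```python
from math import log2

def entropy(p):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(pr * log2(pr) for pr in p.values() if pr > 0)

def product(p, q):
    """Joint pmf of two independent variables with pmfs p and q."""
    return {(a, b): pa * qb for a, pa in p.items() for b, qb in q.items()}

# Two arbitrary toy distributions (assumed for illustration only).
p = {0: 0.5, 1: 0.5}
q = {"a": 0.25, "b": 0.25, "c": 0.5}

joint = product(p, q)
h_p, h_q, h_joint = entropy(p), entropy(q), entropy(joint)

# For independent variables the joint entropy is the sum of the entropies.
print(h_p, h_q, h_joint)
```

Property 1 asks for the same behaviour from each information atom when two independent three-variable systems are combined.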

Continuity implies that small changes in the probability distribution lead to small changes in the value of the information measure. It ensures that the measure behaves predictably and is a key property in information theory, particularly for measures like entropy and mutual information.

Property 2 (Continuity).

$\operatorname{Red}$, $\operatorname{Un}$, and $\operatorname{Syn}$ are continuous functions from the underlying joint distributions $\mathcal{D}_{X,Y,Z}$ to $\mathbb{R}$.

In addition, another well-known property is independent identity [9], which asserts that in a system of two independent source variables and a target variable that equals the pair of source variables, the redundant information should be zero.

Property 3 (Independent Identity).

If $I(X;Y)=0$ and $Z=(X,Y)$, then $\operatorname{Red}(X,Y\to Z)=0$.

We mention that several important properties can be inferred from the above. For example, the nonnegativity of $\operatorname{Un}$ can be obtained from Axiom 1 and Axiom 3 as mentioned earlier; the commutativity of $\operatorname{Syn}$ follows from Axiom 1 and Axiom 2; the difference between (2) and (3) is often called consistency [12], etc.

Finally, we emphasize once again that none of the existing operational definitions of the information atoms satisfy all of the above. A comprehensive list of violations is beyond the page limit of this paper, and yet we briefly mention that Axiom 4 (nonnegativity) is violated by [8, 9, 27] (although some sources do not refer to nonnegativity as a requirement); Property 1 (additivity) is violated by all works except [12], [28], [8], and [29] according to [26]; Property 3 (independent identity) is violated by [3]; Property 2 (continuity) is violated by [11, 8], [28], [29], etc.

III Proposed information decomposition definition

In this section, we present our operational definition of $\operatorname{Un}$, from which the definitions of the remaining information atoms follow. Then, we explain the logic behind this definition, and prove that it satisfies all the axioms and properties proposed in Section II.

III-A Definition of Information Atoms

Given a system $X,Y,Z$, define a new random variable $X'$ over the alphabet $\mathcal{X}$ via its conditional joint distribution as follows:

$$
\Pr(X'=x,Z=z|Y=y)\triangleq\Pr(X=x|Z=z)\Pr(Z=z|Y=y),
\tag{4}
$$

meaning, for every $x\in\mathcal{X}$,

$$
\Pr(X'=x)=\sum_{(y,z)\in\mathcal{Y}\times\mathcal{Z}}\Pr(X'=x,Z=z|Y=y)\Pr(Y=y).
$$

The variable $X'$ is well-defined since all probabilities are nonnegative, and since

$$
\begin{aligned}
\sum_{x\in\mathcal{X}}\Pr(X'=x)
&=\sum_{(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}\Pr(X'=x,Z=z|Y=y)\Pr(Y=y)\\
&=\sum_{y\in\mathcal{Y}}\Pr(Y=y)=1.
\end{aligned}
$$

Definition 1 (Unique Information).

The unique information from $X$ to $Z$ given $Y$ is $\operatorname{Un}(X\to Z|Y)=I(X';Z|Y)$.
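Definition 1 can be evaluated directly on small discrete systems. The sketch below is one possible plain-Python rendering, under the assumption that the joint distribution is supplied as a dictionary `{(x, y, z): probability}`; the helper names (`marginal`, `H`, `cond_mi`, `surrogate_joint`, `unique`) and the two toy gates are illustrative choices, not part of the paper.

```python
from collections import defaultdict
from math import log2

def marginal(p, idx):
    """Marginalize a joint pmf {(x, y, z): prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def H(p, idx):
    """Joint entropy, in bits, of the coordinates listed in idx."""
    return -sum(pr * log2(pr) for pr in marginal(p, idx).values() if pr > 0)

def cond_mi(p, a, b, c):
    """Conditional mutual information I(A;B|C), in bits."""
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

def surrogate_joint(p):
    """Joint pmf of (X', Y, Z) implied by (4): Pr(X=x|Z=z) * Pr(Y=y, Z=z)."""
    p_xz, p_yz, p_z = marginal(p, (0, 2)), marginal(p, (1, 2)), marginal(p, (2,))
    q = {}
    for (x, z), pxz in p_xz.items():
        for (y, z2), pyz in p_yz.items():
            if z == z2 and pxz > 0 and pyz > 0:
                q[(x, y, z)] = pxz / p_z[(z,)] * pyz
    return q

def unique(p):
    """Un(X -> Z | Y) = I(X'; Z | Y), following Definition 1."""
    return cond_mi(surrogate_joint(p), (0,), (2,), (1,))

# XOR gate: Z = X xor Y with independent uniform bits.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# Copy gate: Z = (X, Y) with independent uniform bits.
copy = {(x, y, (x, y)): 0.25 for x in (0, 1) for y in (0, 1)}

un_xor, un_copy = unique(xor), unique(copy)
print(un_xor, un_copy)  # 0 bits for XOR, 1 bit for the copy gate
```

In the XOR gate, $X$ alone carries no information about $Z$, so the unique information vanishes; in the copy gate, $X$ contributes one bit that $Y$ cannot supply.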

The definitions for the remaining information atoms are then implied by Axiom 1 as follows.

Definition 2 (Redundant Information).

The redundant information from $X$ and $Y$ to $Z$ is defined as:

$$
\operatorname{Red}(X,Y\to Z)=I(X;Z)-\operatorname{Un}(X\to Z|Y).
$$

Definition 3 (Synergistic Information).

The synergistic information from $X$ and $Y$ to $Z$ is defined as:

$$
\operatorname{Syn}(X,Y\to Z)=I(X;Z|Y)-\operatorname{Un}(X\to Z|Y).
$$

It should be noted that Definition 2 and Definition 3 strictly depend on the order of the source variables; the commutativity of $\operatorname{Red}$ (Axiom 2) will be addressed in the sequel, and the commutativity of $\operatorname{Syn}$ follows from Axiom 1 and Axiom 2 as mentioned earlier.
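Definitions 1 to 3 together give a complete recipe for all four atoms. The following sketch applies them to four standard toy systems. It assumes the joint distribution is a dictionary `{(x, y, z): probability}`, and it computes $\operatorname{Un}(Y\to Z|X)$ by applying Definition 1 with the roles of the sources swapped. All function names and the choice of examples are hypothetical and serve illustration only.

```python
from collections import defaultdict
from math import log2

def marginal(p, idx):
    """Marginalize a joint pmf {(x, y, z): prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def H(p, idx):
    """Joint entropy, in bits, of the coordinates listed in idx."""
    return -sum(pr * log2(pr) for pr in marginal(p, idx).values() if pr > 0)

def mi(p, a, b):
    """Mutual information I(A;B), in bits."""
    return H(p, a) + H(p, b) - H(p, a + b)

def cond_mi(p, a, b, c):
    """Conditional mutual information I(A;B|C), in bits."""
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

def unique(p):
    """Un(X -> Z | Y) = I(X'; Z | Y), with (X', Y, Z) ~ Pr(x|z) Pr(y, z)."""
    p_xz, p_yz, p_z = marginal(p, (0, 2)), marginal(p, (1, 2)), marginal(p, (2,))
    q = {(x, y, z): pxz / p_z[(z,)] * pyz
         for (x, z), pxz in p_xz.items() if pxz > 0
         for (y, z2), pyz in p_yz.items() if z2 == z}
    return cond_mi(q, (0,), (2,), (1,))

def atoms(p):
    """All four information atoms of the system p."""
    un_x = unique(p)
    un_y = unique({(y, x, z): pr for (x, y, z), pr in p.items()})  # sources swapped
    red = mi(p, (0,), (2,)) - un_x               # Definition 2
    syn = cond_mi(p, (0,), (2,), (1,)) - un_x    # Definition 3
    return {"Red": red, "Syn": syn, "Un_X": un_x, "Un_Y": un_y}

bits = (0, 1)
examples = {
    "redundant": {(b, b, b): 0.5 for b in bits},                 # X = Y = Z
    "xor": {(x, y, x ^ y): 0.25 for x in bits for y in bits},    # Z = X xor Y
    "copy": {(x, y, (x, y)): 0.25 for x in bits for y in bits},  # Z = (X, Y)
    "and": {(x, y, x & y): 0.25 for x in bits for y in bits},    # Z = X and Y
}
results = {name: atoms(p) for name, p in examples.items()}
for name, res in results.items():
    total = mi(examples[name], (0, 1), (2,))     # I((X,Y);Z)
    print(name, {k: round(v, 4) for k, v in res.items()}, "total:", round(total, 4))
```

On these examples the fully redundant system yields one bit of $\operatorname{Red}$, XOR yields one bit of $\operatorname{Syn}$, and the copy gate yields one bit of unique information per source with zero redundancy, in line with Property 3. In each case the four atoms sum to $I((X,Y);Z)$, as (1) requires.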

III-B Intuitive Explanation of Definition 1.

Our definition of unique information $\operatorname{Un}$ is derived from a newly defined do-operation.

Definition 4 (Do-operation).

Given $\mathcal{D}_{X,Z}$ and $\mathcal{D}_{C}$ such that the support of $\mathcal{D}_{C}$ is contained in the support of $\mathcal{D}_{Z}$, let $do(\mathcal{D}_{X,Z}|\mathcal{D}_{C})=\mathcal{D}_{A,C}$, where

$$
\Pr(A=x,C=z)=\Pr(X=x|Z=z)\Pr(C=z)
\tag{5}
$$

for all $(x,z)\in\mathcal{X}\times\mathcal{Z}$.

In Lemma 8, which is given and proved in Appendix 8, it is shown that $\mathcal{D}_{A,C}$ in Definition 4 is well-defined in the sense that the right marginal of $\mathcal{D}_{A,C}$ is identical to $\mathcal{D}_{C}$. Therefore, there is no ambiguity in referring to both the input distribution and the right marginal of the output distribution by the same letter $C$. This operation receives $\mathcal{D}_{X,Z}$ and $\mathcal{D}_{C}$, and outputs a joint distribution $\mathcal{D}_{A,C}$ whose right marginal is $\mathcal{D}_{C}$, and $\mathcal{D}_{A|C=z}=\mathcal{D}_{X|Z=z}$ for all $z\in\mathcal{Z}$. Using the do-operation, $\operatorname{Un}$ can be defined equivalently as follows:

Definition 5 (Unique Information, equivalent definition).

For $y\in\mathcal{Y}$ let $C_{y}$ be a random variable with distribution $\mathcal{D}_{C_{y}}=\mathcal{D}_{Z|Y=y}$, and let $\mathcal{D}_{A_{y},C_{y}}=do(\mathcal{D}_{X,Z}|\mathcal{D}_{C_{y}})$. The unique information from $X$ to $Z$ given $Y$ is defined as:

$$
\operatorname{Un}(X\to Z|Y)=\sum_{y\in\mathcal{Y}}\Pr(Y=y)\,I(A_{y};C_{y}).
$$

The proof of equivalence is simple, and is given in Appendix -B. The idea behind Definition 5 is that by setting $C=(Z|Y=y)$ for some $y\in\mathcal{Y}$ in Definition 4, and then by averaging the resulting mutual information values over all $y\in\mathcal{Y}$, we eliminate the effect of $Y$ from the directional dependence between $X$ and $Z$, without changing the directional dependence itself. A similar idea exists in Judea Pearl’s do-calculus [20] (also [21]), where a node in a Bayesian network is set to a certain value while removing all incoming dependencies to that node, thereby distilling the causal relationship between that value and the remainder of the network.
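The do-operation of Definition 4 and the averaging step of Definition 5 translate almost line by line into code. The following is a minimal sketch, assuming distributions are passed as dictionaries (`{(x, z): prob}` for $\mathcal{D}_{X,Z}$, `{z: prob}` for $\mathcal{D}_{C}$, and `{(x, y, z): prob}` for the full system). The names `do`, `mutual_information`, and `unique_via_do` are hypothetical, and the AND and XOR gates are illustrative examples that do not appear in the text.

```python
from collections import defaultdict
from math import log2

def do(p_xz, p_c):
    """do(D_{X,Z} | D_C): joint pmf of (A, C) with Pr(A=x, C=z) = Pr(X=x|Z=z) Pr(C=z)."""
    p_z = defaultdict(float)
    for (x, z), pr in p_xz.items():
        p_z[z] += pr
    # Definition 4 requires supp(D_C) to be contained in supp(D_Z).
    if any(pc > 0 and p_z.get(z, 0.0) == 0 for z, pc in p_c.items()):
        raise ValueError("support of D_C must be contained in the support of D_Z")
    return {(x, z): pr / p_z[z] * p_c[z]
            for (x, z), pr in p_xz.items() if pr > 0 and p_c.get(z, 0.0) > 0}

def mutual_information(p_ac):
    """I(A;C), in bits, for a two-variable joint pmf {(a, c): prob}."""
    p_a, p_c = defaultdict(float), defaultdict(float)
    for (a, c), pr in p_ac.items():
        p_a[a] += pr
        p_c[c] += pr
    return sum(pr * log2(pr / (p_a[a] * p_c[c]))
               for (a, c), pr in p_ac.items() if pr > 0)

def unique_via_do(p):
    """Un(X -> Z | Y) = sum_y Pr(Y=y) I(A_y; C_y), following Definition 5."""
    p_xz, p_y, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), pr in p.items():
        p_xz[(x, z)] += pr
        p_y[y] += pr
        p_yz[(y, z)] += pr
    total = 0.0
    for y, py in p_y.items():
        if py == 0:
            continue
        # D_{C_y} = D_{Z | Y = y}
        p_cy = {z: pyz / py for (y2, z), pyz in p_yz.items() if y2 == y and pyz > 0}
        total += py * mutual_information(do(p_xz, p_cy))
    return total

bits = (0, 1)
and_gate = {(x, y, x & y): 0.25 for x in bits for y in bits}  # Z = X and Y
xor_gate = {(x, y, x ^ y): 0.25 for x in bits for y in bits}  # Z = X xor Y

un_and, un_xor = unique_via_do(and_gate), unique_via_do(xor_gate)
print(round(un_and, 4), round(un_xor, 4))
```

Note that `do` leaves the conditional law of the first coordinate given the second untouched and only replaces the marginal of the second coordinate, which is the behaviour described after Definition 4.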

III-C Satisfaction of axioms and properties

To show that our definition satisfies the axioms and properties mentioned in Section II, we require the following technical lemma. The proof is given in Appendix -C.

Lemma 1.

Following the notations of Definition 1, we have that $H(X|Z)=H(X'|Z)=H(X'|Z,Y)$, and that $H(X')=H(X)$.

Corollary 1.

Unique information (Def. 1) can be written as:

$$
\begin{aligned}
\operatorname{Un}(X\to Z|Y) &= I(X';Z|Y)=H(X'|Y)-H(X'|Z,Y)\\
&= H(X'|Y)-H(X|Z).
\end{aligned}
$$

Corollary 2.

By Lemma 1, and since conditioning reduces entropy, we have that $H(X'|Y)\leq H(X')=H(X)$.
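Lemma 1 and its two corollaries are easy to sanity-check numerically. The sketch below does so on one arbitrary joint distribution; the distribution `p`, the dictionary layout `{(x, y, z): probability}`, and all helper names are assumptions made for illustration, and a numerical check on one example is of course not a substitute for the proof in the appendix.

```python
from collections import defaultdict
from math import log2

def marginal(p, idx):
    """Marginalize a joint pmf {(x, y, z): prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def H(p, idx):
    """Joint entropy, in bits, of the coordinates listed in idx."""
    return -sum(pr * log2(pr) for pr in marginal(p, idx).values() if pr > 0)

def cond_H(p, a, c):
    """Conditional entropy H(A|C) = H(A,C) - H(C)."""
    return H(p, a + c) - H(p, c)

def surrogate_joint(p):
    """Joint pmf of (X', Y, Z) implied by (4): Pr(X=x|Z=z) * Pr(Y=y, Z=z)."""
    p_xz, p_yz, p_z = marginal(p, (0, 2)), marginal(p, (1, 2)), marginal(p, (2,))
    return {(x, y, z): pxz / p_z[(z,)] * pyz
            for (x, z), pxz in p_xz.items() if pxz > 0
            for (y, z2), pyz in p_yz.items() if z2 == z}

# An arbitrary, asymmetric toy distribution over (x, y, z) (illustrative only).
p = {(0, 0, 0): 0.30, (0, 1, 0): 0.10, (0, 1, 1): 0.20,
     (1, 0, 1): 0.15, (1, 1, 1): 0.25}
q = surrogate_joint(p)          # law of (X', Y, Z)
X, Y, Z = (0,), (1,), (2,)

h_x_given_z = cond_H(p, X, Z)           # H(X|Z)
h_xp_given_z = cond_H(q, X, Z)          # H(X'|Z)
h_xp_given_zy = cond_H(q, X, Z + Y)     # H(X'|Z,Y)
h_x, h_xp = H(p, X), H(q, X)            # H(X), H(X')
h_xp_given_y = cond_H(q, X, Y)          # H(X'|Y)

un_def1 = h_xp_given_y - h_xp_given_zy  # I(X';Z|Y), as in Definition 1
un_cor1 = h_xp_given_y - h_x_given_z    # right-hand side of Corollary 1

print(round(un_def1, 6), round(un_cor1, 6))
```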

Based on the above lemmas and corollaries, we are in a position to prove that our definition of UnUn\operatorname{Un}roman_Un satisfies the required axioms.

III-C1 Proof of Axiom 1, Information atoms relationship

Follows immediately from Definition 2 and Definition 3.

III-C2 Proof of Axiom 2, Commutativity

First, Definition 1 and Definition 2 provide the following equivalent way for computing $\operatorname{Red}$, which is proved in Appendix -D.

Lemma 2.

Redundant information (Def. 2) can alternatively be written as $\operatorname{Red}(X,Y\to Z)=I(X';Y)$.

Similarly, by switching between $X$ and $Y$ in Definition 2 we have that $\operatorname{Red}(Y,X\to Z)=I(Y;Z)-\operatorname{Un}(Y\to Z|X)$; based on Lemma 2, this equals $I(X;Y')$, where $Y'$ is defined analogously to $X'$ in (III-A), i.e.,

$$\Pr(Y'=y,Z=z|X=x)=\Pr(Y=y|Z=z)\Pr(Z=z|X=x). \tag{6}$$

Then, we can conclude the commutativity of redundant information through the following lemma, which is proved in Appendix -E.

Lemma 3 (Commutativity of Redundant Information).

For $X'$ and $Y'$ as above, we have that $I(X';Y)=I(X;Y')$.

Combining Lemma 2 and Lemma 3 readily implies the commutativity of $\operatorname{Red}$, i.e., $\operatorname{Red}(X,Y\to Z)=\operatorname{Red}(Y,X\to Z)$.
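A quick numerical sanity check of Lemma 2 and Lemma 3 is sketched below, under the assumption that $X'$ satisfies $\Pr(X'=x,Y=y,Z=z)=\Pr(X=x|Z=z)\Pr(Y=y,Z=z)$ (Appendix -C) and that $Y'$ is obtained by exchanging the roles of $X$ and $Y$. The random test distribution, the fixed seed, and the helper names are hypothetical choices made for this illustration.

```python
import random
from collections import defaultdict
from math import log2


def marg(p, idx):
    """Marginal of a joint pmf {outcome tuple: prob} on the coordinates in idx."""
    out = defaultdict(float)
    for outcome, q in p.items():
        out[tuple(outcome[i] for i in idx)] += q
    return dict(out)


def entropy(p, idx):
    return -sum(q * log2(q) for q in marg(p, idx).values() if q > 0)


def mi(p, a, b, c=()):
    """(Conditional) mutual information I(a; b | c) in bits."""
    a, b, c = list(a), list(b), list(c)
    return entropy(p, a + c) + entropy(p, b + c) - entropy(p, a + b + c) - entropy(p, c)


def primed(p):
    """Replace coordinate 0 by its primed version: Pr(x | z) * Pr(y, z)."""
    p_xz, p_yz, p_z = marg(p, [0, 2]), marg(p, [1, 2]), marg(p, [2])
    return {(x, y, z): q_xz / p_z[(z,)] * q_yz
            for (x, z), q_xz in p_xz.items()
            for (y, z2), q_yz in p_yz.items()
            if z2 == z and q_xz > 0 and q_yz > 0}


rng = random.Random(0)  # fixed seed so the run is reproducible
weights = {(x, y, z): rng.random()
           for x in range(2) for y in range(3) for z in range(2)}
total = sum(weights.values())
p = {k: w / total for k, w in weights.items()}          # coordinates (x, y, z)
swapped = {(y, x, z): q for (x, y, z), q in p.items()}  # coordinates (y, x, z)

red_xy = mi(p, [0], [2]) - mi(primed(p), [0], [2], [1])              # Red(X,Y -> Z)
red_yx = mi(swapped, [0], [2]) - mi(primed(swapped), [0], [2], [1])  # Red(Y,X -> Z)
i_xp_y = mi(primed(p), [0], [1])        # I(X'; Y)
i_x_yp = mi(primed(swapped), [1], [0])  # I(X; Y'), coordinates are (Y', X, Z)
print(red_xy, red_yx, i_xp_y, i_x_yp)
```

All four printed quantities should coincide up to floating-point error, even though the alphabets of $X$ and $Y$ differ in size in this example.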

III-C3 Proof of Axiom 3, Monotonicity and self-redundancy

According to Definition 1, $\operatorname{Un}$ is nonnegative as conditional mutual information. Therefore, $\operatorname{Red}(X,Y\to Z)\leq I(X;Z)$ by Definition 2. Similarly, by Definition 2 we have $\operatorname{Red}(Y,X\to Z)=I(Y;Z)-\operatorname{Un}(Y\to Z|X)$, and hence $\operatorname{Red}(Y,X\to Z)\leq I(Y;Z)$. Since $\operatorname{Red}$ is symmetric by Axiom 2, it follows that

$$\operatorname{Red}(X,Y\to Z)\leq\min\{I(X;Z),I(Y;Z)\}.$$

III-C4 Proof of Axiom 4, Nonnegativity

We begin by showing that $\operatorname{Red}$ is nonnegative, for which we require the following lemma, proved in Appendix -F.

Lemma 4.

Unique information (Def. 1) is bounded from above by mutual information, i.e.,

$$\operatorname{Un}(X\to Z|Y)\leq I(X;Z).$$

Then, nonnegativity of $\operatorname{Red}$ follows from Lemma 4.

Corollary 3 (Nonnegativity of Redundant Information).

Redundant information (Def. 2) is nonnegative, i.e.,

$$\operatorname{Red}(X,Y\to Z)\geq 0.$$
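The two bounds on redundant information, $0\leq\operatorname{Red}(X,Y\to Z)\leq\min\{I(X;Z),I(Y;Z)\}$, can be probed empirically. The following sketch samples random joint distributions and records the worst violation it observes. It is an illustration only: it assumes $\operatorname{Red}=I(X;Z)-I(X';Z|Y)$ with $X'$ built as in Appendix -C, and the sampling scheme, alphabet sizes, and helper names are hypothetical.

```python
import random
from collections import defaultdict
from math import log2


def marg(p, idx):
    """Marginal of a joint pmf {outcome tuple: prob} on the coordinates in idx."""
    out = defaultdict(float)
    for outcome, q in p.items():
        out[tuple(outcome[i] for i in idx)] += q
    return dict(out)


def entropy(p, idx):
    return -sum(q * log2(q) for q in marg(p, idx).values() if q > 0)


def mi(p, a, b, c=()):
    """(Conditional) mutual information I(a; b | c) in bits."""
    a, b, c = list(a), list(b), list(c)
    return entropy(p, a + c) + entropy(p, b + c) - entropy(p, a + b + c) - entropy(p, c)


def primed(p):
    """Joint pmf of (X', Y, Z): Pr(X=x | Z=z) * Pr(Y=y, Z=z)."""
    p_xz, p_yz, p_z = marg(p, [0, 2]), marg(p, [1, 2]), marg(p, [2])
    return {(x, y, z): q_xz / p_z[(z,)] * q_yz
            for (x, z), q_xz in p_xz.items()
            for (y, z2), q_yz in p_yz.items()
            if z2 == z and q_xz > 0 and q_yz > 0}


def random_joint(rng, nx=2, ny=2, nz=3):
    """Random strictly positive joint pmf over (x, y, z); purely illustrative."""
    w = {(x, y, z): rng.random() + 1e-3
         for x in range(nx) for y in range(ny) for z in range(nz)}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


rng = random.Random(1)  # fixed seed so the run is reproducible
min_red, max_excess = float("inf"), float("-inf")
for _ in range(200):
    p = random_joint(rng)
    i_xz, i_yz = mi(p, [0], [2]), mi(p, [1], [2])
    red = i_xz - mi(primed(p), [0], [2], [1])  # Red = I(X;Z) - Un(X -> Z | Y)
    min_red = min(min_red, red)
    max_excess = max(max_excess, red - min(i_xz, i_yz))
print(min_red, max_excess)
```

A random search of this kind cannot replace the proofs, but it is a cheap way to catch implementation errors: `min_red` should not fall below zero and `max_excess` should not rise above zero, beyond rounding.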

Finally, we remark that even though it is not a required axiom, nonnegativity of $\operatorname{Syn}$ can be proved under an additional assumption as follows.

Remark 1 (Nonnegativity of Synergistic Information).

Suppose $H(Z|X,Y)=0$ (closed system assumption). Since $\operatorname{Un}(X\to Z|Y)=I(X';Z|Y)\leq H(Z|Y)$, it follows from Definition 3 that $\operatorname{Syn}(X,Y\to Z)\geq 0$.
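The intermediate step can be spelled out as follows. This short derivation assumes that Definition 3 reads $\operatorname{Syn}(X,Y\to Z)=I(X;Z|Y)-\operatorname{Un}(X\to Z|Y)$, which is how the Discussion describes the partition of $I(X;Z|Y)$ into $\operatorname{Syn}$ and $\operatorname{Un}$.

```latex
\begin{align*}
I(X;Z|Y) &= H(Z|Y) - H(Z|X,Y) = H(Z|Y)
  && \text{(closed system: } H(Z|X,Y)=0\text{)} \\
\operatorname{Syn}(X,Y\to Z) &= I(X;Z|Y) - \operatorname{Un}(X\to Z|Y) \\
  &= H(Z|Y) - I(X';Z|Y) \;\geq\; 0
  && \text{(since } I(X';Z|Y)\leq H(Z|Y)\text{)}
\end{align*}
```

For instance, if $Z=X\oplus Y$ with independent fair bits, then $H(Z|Y)=1$ and $I(X';Z|Y)=0$, so the whole bit is attributed to synergy.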

III-C5 Proof of Property 1, Additivity

The following lemma is proved in Appendix -G.

Lemma 5 (Additivity of Unique Information).

For two independent sets of variables $X,Y,Z$ and $\bar{X},\bar{Y},\bar{Z}$, unique information (Def. 1) is additive:

$$\operatorname{Un}((X,\bar{X})\to(Z,\bar{Z})|(Y,\bar{Y}))=\operatorname{Un}(X\to Z|Y)+\operatorname{Un}(\bar{X}\to\bar{Z}|\bar{Y}). \tag{7}$$

Since mutual information and conditional entropy are additive in the above sense, by Definition 2 and Definition 3, alongside Lemma 5, $\operatorname{Red}$ and $\operatorname{Syn}$ are additive as well.
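Additivity can be illustrated by composing two independent toy systems and comparing the unique and redundant information of the product system with the sum over the parts. The sketch below assumes $\operatorname{Un}=I(X';Z|Y)$ and $\operatorname{Red}=I(X;Z)-\operatorname{Un}$, with $X'$ built as in Appendix -C. The two component systems (an AND gate and a system in which $Z$ copies $X$) and all helper names are hypothetical choices for this example.

```python
from collections import defaultdict
from math import log2


def marg(p, idx):
    """Marginal of a joint pmf {outcome tuple: prob} on the coordinates in idx."""
    out = defaultdict(float)
    for outcome, q in p.items():
        out[tuple(outcome[i] for i in idx)] += q
    return dict(out)


def entropy(p, idx):
    return -sum(q * log2(q) for q in marg(p, idx).values() if q > 0)


def mi(p, a, b, c=()):
    """(Conditional) mutual information I(a; b | c) in bits."""
    a, b, c = list(a), list(b), list(c)
    return entropy(p, a + c) + entropy(p, b + c) - entropy(p, a + b + c) - entropy(p, c)


def un(p):
    """Un(X -> Z | Y) = I(X'; Z | Y) with Pr(X'=x, Y=y, Z=z) = Pr(x | z) * Pr(y, z)."""
    p_xz, p_yz, p_z = marg(p, [0, 2]), marg(p, [1, 2]), marg(p, [2])
    q = {(x, y, z): q_xz / p_z[(z,)] * q_yz
         for (x, z), q_xz in p_xz.items()
         for (y, z2), q_yz in p_yz.items()
         if z2 == z and q_xz > 0 and q_yz > 0}
    return mi(q, [0], [2], [1])


def red(p):
    """Red(X, Y -> Z) = I(X; Z) - Un(X -> Z | Y)."""
    return mi(p, [0], [2]) - un(p)


and_gate = {(x, y, x & y): 0.25 for x in (0, 1) for y in (0, 1)}
copy_x = {(x, y, x): 0.25 for x in (0, 1) for y in (0, 1)}

# Product system: each coordinate is a pair, e.g. X-coordinate is (x, x_bar).
product = {((x, xb), (y, yb), (z, zb)): q * qb
           for (x, y, z), q in and_gate.items()
           for (xb, yb, zb), qb in copy_x.items()}

un_joint, un_sum = un(product), un(and_gate) + un(copy_x)
red_joint, red_sum = red(product), red(and_gate) + red(copy_x)
print(un_joint, un_sum, red_joint, red_sum)
```

The helpers treat each coordinate as an opaque hashable symbol, so pairs such as `(x, xb)` can serve as a single composite variable without further changes.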

III-C6 Proof of Property 2, Continuity

We begin by showing that $\operatorname{Red}$ is continuous, for which we require the following lemma, proved in Appendix -H.

Lemma 6 (Continuity of Redundant Information).

The redundant information (Def. 2) is a continuous function from the input distribution $\mathcal{D}_{X,Y,Z}$ to $\mathbb{R}$.

By Definition 2 and Definition 3, the continuity of $\operatorname{Un}$ and $\operatorname{Syn}$ can also be derived.

III-C7 Proof of Property 3, Independent Identity

The following lemma is proved in Appendix -I.

Lemma 7.

The operator $\operatorname{Red}$ satisfies Property 3.

IV Discussion

In this paper, we proposed an explicit operational formula for PID, which is distinct from any existing approach, and proved that it satisfies all axioms and properties. In this section we provide further intuitive explanation for our approach.

First, we wish to elucidate the role that our do-operation plays in the definition of $\operatorname{Un}$ (Definition 5). In a sense, the do-operation can be understood as adjusting the marginal distribution of the $Z$ variable of $\mathcal{D}_{X,Z}$, while impacting its connections with other variables as little as possible. This understanding can be confirmed by Lemma 1, which shows that the expected value of the conditional entropy after the do-operation retains its original value. This resembles the invariance implied in Shannon’s communication model [1], where the conditional entropy of the output given the input is not affected by the input distribution. From this perspective, $Z$ and $X$ can be regarded as the input and output, respectively, of a channel that captures their “relationship.” The do-operation changes the distribution of the input $Z$, but does not change the channel’s characteristic (i.e., $H(X|Z)$).
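The channel analogy can be made concrete with a small sketch. Assuming the do-operation has the form $\Pr(A=x,C=z)=\Pr(X=x|Z=z)\Pr(C=z)$ (Appendix -A), the code below applies it to an AND gate once for each value $y$, with the input distribution set to that of $Z$ given $Y=y$, and compares the resulting conditional entropies with $H(X|Z)$. The function names and the choice of test system are hypothetical.

```python
from collections import defaultdict
from math import log2


def do(p_xz, p_c):
    """Sketch of the do-operation: Pr(A=x, C=z) = Pr(X=x | Z=z) * Pr(C=z)."""
    p_z = defaultdict(float)
    for (_, z), q in p_xz.items():
        p_z[z] += q
    return {(x, z): q / p_z[z] * p_c[z]
            for (x, z), q in p_xz.items() if q > 0 and p_c.get(z, 0) > 0}


def cond_entropy(p_ac):
    """H(A | C) in bits for a joint pmf {(a, c): prob}."""
    p_c = defaultdict(float)
    for (_, c), q in p_ac.items():
        p_c[c] += q
    return -sum(q * log2(q / p_c[c]) for (_, c), q in p_ac.items() if q > 0)


# Z = X AND Y with fair independent inputs; coordinates are (x, y, z).
and_gate = {(x, y, x & y): 0.25 for x in (0, 1) for y in (0, 1)}
p_xz, p_y, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
for (x, y, z), q in and_gate.items():
    p_xz[(x, z)] += q
    p_y[y] += q
    p_yz[(y, z)] += q

h_original = cond_entropy(p_xz)  # H(X | Z) before any do-operation
h_after = {}
for y, q_y in p_y.items():
    # New input distribution for the channel: Z conditioned on Y=y.
    p_c = {z: q / q_y for (y2, z), q in p_yz.items() if y2 == y}
    h_after[y] = cond_entropy(do(p_xz, p_c))

# Expectation over y of the conditional entropy after the do-operation.
h_expected = sum(p_y[y] * h_after[y] for y in p_y)
print(h_original, h_after, h_expected)
```

In this example the conditional entropy for an individual $y$ differs from $H(X|Z)$, because the input distribution has changed while the transition probabilities $\Pr(X=x|Z=z)$ have not. Only the expectation over $y$ recovers $H(X|Z)$, which is the sense in which Lemma 1 expresses the invariance.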

Based on this, Definition 5 realizes the intuition that unique information should represent the relationship between a source variable and the target variable given the other source variable. So, we use the do-operation to set the marginal distribution of the target variable $Z$ to its conditional distribution given the value $y$ of the other source variable $Y$, and then use the expectation of mutual information $\sum_{y}\Pr(Y=y)I(A_y;C_y)$ to capture the “connection” between the specific source variable $X$ and target variable $Z$ given $Y$ after the do-operation.

The reason this method can partition $I(X;Z|Y)$ into $\operatorname{Syn}$ and $\operatorname{Un}$ (Def. 3) is that the do-operation eliminates high-order relations between $Y$ and $X,Z$, i.e., $\operatorname{Syn}$. Specifically, conditional mutual information relies on the joint conditional probability $\mathcal{D}_{(X,Z)|y}$, in expectation over all $y\in\mathcal{Y}$. This distribution reflects not only the conditional influence of $Y$ on $X,Z$, but also the simultaneous influence of $Y$ on the relationship between $X$ and $Z$.

However, Definition 5 of unique information retains the relationship between $X$ and $Z$ without influence from $Y$ by using the conditional probability $\mathcal{D}_{Z|y}$, in expectation over all $y\in\mathcal{Y}$, to perform the do-operation, which only reflects the conditional influence of $Y$ on $Z$. Therefore, the expectation of mutual information $\sum_{y}\Pr(Y=y)I(A_y;C_y)$ can accurately quantify the unique information, which represents the pure conditional mutual relationship.

In addition to the above analysis of do-operations in unique information, Lemma 2 also brings another perspective worth discussing. Redundant information can be understood as the mutual information $I(X';Y)$ (or $I(X;Y')$) obtained by changing the joint probability distribution $\mathcal{D}_{X,Y}$ according to $\mathcal{D}_{X,Y,Z}$ without changing the marginal distribution $\mathcal{D}_X$ (or $\mathcal{D}_Y$) according to Lemma 1 ($H(X')=H(X)$).

As mentioned earlier, our definition of $\operatorname{Syn}$ might be negative, unless the system is closed (i.e., $H(Z|X,Y)=0$, Remark 1). While $\operatorname{Un}$ and $\operatorname{Red}$ represent the information shared by one or two source variables with the target variable, $\operatorname{Syn}$ represents the information provided to the target variable by the “cooperation” of source variables. It is an accepted aphorism that cooperation does not necessarily increase outcome, and hence it might be the case that negative values of $\operatorname{Syn}$ conform with intuition. However, the reason why this explanation is no longer necessary in a closed system, as well as alternative interpretations of $\operatorname{Syn}$ that are nonnegative, remain to be studied.

Acknowledgments. The authors would like to thank an anonymous reviewer whose suggestions greatly simplified the paper. This research was supported by AFOSR grants FA9550-22-1-0054 and FA9550-23-1-0208.

References

  • [1] Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE mobile computing and communications review, 5(1):3–55, 2001.
  • [2] Satosi Watanabe. Information theoretical analysis of multivariate correlation. IBM Journal of research and development, 4(1):66–82, 1960.
  • [3] Paul L Williams and Randall D Beer. Nonnegative decomposition of multivariate information. arXiv preprint arXiv:1004.2515, 2010.
  • [4] Robin AA Ince. The partial entropy decomposition: Decomposing multivariate entropy and mutual information via pointwise common surprisal. arXiv preprint arXiv:1702.01591, 2017.
  • [5] Pedro AM Mediano, Fernando Rosas, Robin L Carhart-Harris, Anil K Seth, and Adam B Barrett. Beyond integrated information: A taxonomy of information dynamics phenomena. arXiv preprint arXiv:1909.02297, 2019.
  • [6] Aobo Lyu, Bing Yuan, Ou Deng, Mingzhe Yang, Andrew Clark, and Jiang Zhang. System information decomposition. arXiv preprint arXiv:2306.08288, 2023.
  • [7] Thomas F Varley. Generalized decomposition of multivariate information. arXiv preprint arXiv:2309.08003, 2023.
  • [8] Virgil Griffith, Edwin KP Chong, Ryan G James, Christopher J Ellison, and James P Crutchfield. Intersection information based on common randomness. Entropy, 16(4):1985–2000, 2014.
  • [9] Robin AA Ince. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy, 19(7):318, 2017.
  • [10] Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, and Jürgen Jost. Shared information—new insights and problems in decomposing information in complex systems. In Proceedings of the European conference on complex systems 2012, pages 251–269. Springer, 2013.
  • [11] Malte Harder, Christoph Salge, and Daniel Polani. Bivariate measure of redundant information. Physical Review E, 87(1):012130, 2013.
  • [12] Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, and Nihat Ay. Quantifying unique information. Entropy, 16(4):2161–2183, 2014.
  • [13] Pradeep Kr Banerjee, Johannes Rauh, and Guido Montúfar. Computing the unique information. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 141–145. IEEE, 2018.
  • [14] Elad Schneidman, William Bialek, and Michael J Berry. Synergy, redundancy, and independence in population codes. Journal of Neuroscience, 23(37):11539–11553, 2003.
  • [15] Thomas F Varley, Maria Pope, Maria Grazia Puxeddu, Joshua Faskowitz, and Olaf Sporns. Partial entropy decomposition reveals higher-order information structures in human brain activity. Proceedings of the National Academy of Sciences, 120(30):e2300888120, 2023.
  • [16] Borzoo Rassouli, Fernando E Rosas, and Deniz Gündüz. Data disclosure under perfect sample privacy. IEEE Transactions on Information Forensics and Security, 15:2012–2025, 2019.
  • [17] Faisal Hamman and Sanghamitra Dutta. Demystifying local and global fairness trade-offs in federated learning using partial information decomposition. arXiv preprint arXiv:2307.11333, 2023.
  • [18] Fernando E Rosas, Pedro AM Mediano, Henrik J Jensen, Anil K Seth, Adam B Barrett, Robin L Carhart-Harris, and Daniel Bor. Reconciling emergences: An information-theoretic approach to identify causal emergence in multivariate data. PLoS computational biology, 16(12):e1008289, 2020.
  • [19] Judea Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.
  • [20] Judea Pearl. Causality. Cambridge University Press, 2009.
  • [21] Erik P Hoel, Larissa Albantakis, and Giulio Tononi. Quantifying causal emergence shows that macro can beat micro. Proceedings of the National Academy of Sciences, 110(49):19790–19795, 2013.
  • [22] Paul L Williams. Information dynamics: Its theory and application to embodied cognitive systems. PhD thesis, Indiana University, 2011.
  • [23] Virgil Griffith and Christof Koch. Quantifying synergistic mutual information. In Guided self-organization: inception, pages 159–190. Springer, 2014.
  • [24] Imre Csiszár. Axiomatic characterizations of information measures. Entropy, 10(3):261–273, 2008.
  • [25] Joseph T Lizier, Benjamin Flecker, and Paul L Williams. Towards a synergy-based approach to measuring information modification. In 2013 IEEE Symposium on Artificial Life (ALIFE), pages 43–51. IEEE, 2013.
  • [26] Johannes Rauh, Pradeep Kr Banerjee, Eckehard Olbrich, Guido Montúfar, and Jürgen Jost. Continuity and additivity properties of information decompositions. International Journal of Approximate Reasoning, 161:108979, 2023.
  • [27] Conor Finn and Joseph T Lizier. Pointwise partial information decomposition using the specificity and ambiguity lattices. Entropy, 20(4):297, 2018.
  • [28] Virgil Griffith and Tracey Ho. Quantifying redundant information in predicting a target random variable. Entropy, 17(7):4644–4653, 2015.
  • [29] Artemy Kolchinsky. A novel approach to the partial information decomposition. Entropy, 24(3):403, 2022.

-A Proof of the completeness of Definition 4.

In this part, we show that the output of the do-operation is a probability distribution whose marginal distribution on $C$ equals the input distribution $\mathcal{D}_C$.

Lemma 8.

For $\mathcal{D}_{X,Z}$ and $\mathcal{D}_C$ as in Definition 4, the output $\Pr(A=x,C=z)$ of (5) describes a probability distribution, i.e.,

$$\begin{aligned}0\leq\Pr(A=x,C=z)&\leq 1,\text{ and}\\\sum_{(x,z)\in\mathcal{X}\times\mathcal{Z}}\Pr(A=x,C=z)&=1.\end{aligned}$$

Furthermore, the marginal distribution $\mathcal{D}_C$ of the output $\mathcal{D}_{A,C}$ is equal to the input (call it $\mathcal{D}_{C'}$ in this section), i.e.,

$$\sum_{x\in\mathcal{X}}\Pr(A=x,C=z)=\Pr(C'=z).$$

Proof.

We begin by showing $0\leq\Pr(A=x,C=z)\leq 1$. By Definition 4,

$$\Pr(A=x,C=z)=\Pr(X=x|Z=z)\Pr(C'=z). \tag{8}$$

Since both factors on the right-hand side of (8) are between $0$ and $1$, so is $\Pr(A=x,C=z)$.

We continue by showing that

$$\begin{aligned}\sum_{x,z}\Pr(A=x,C=z)&\overset{\text{Def. 4}}{=}\sum_{x,z}\Pr(X=x|Z=z)\Pr(C'=z)\\&=\sum_{z\in\mathcal{Z}}\Pr(C'=z)\sum_{x\in\mathcal{X}}\Pr(X=x|Z=z)\\&=\sum_{z\in\mathcal{Z}}\Pr(C'=z)=1.\end{aligned} \tag{9}$$

From the second line of (9), where the inner sum over $x$ equals $1$ for every fixed $z$, we can also conclude that $\sum_{x\in\mathcal{X}}\Pr(A=x,C=z)=\Pr(C'=z)$, which implies that the input $\mathcal{D}_{C'}$ is equal to the marginal distribution $\mathcal{D}_C$ of the output $\mathcal{D}_{A,C}$. ∎
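The two claims of Lemma 8 are easy to verify on a concrete instance. The following sketch applies the do-operation, in the form $\Pr(A=x,C=z)=\Pr(X=x|Z=z)\Pr(C'=z)$ used in the proof, to a small joint distribution and checks normalization and the marginal on $C$. The numbers chosen for `p_xz` and `p_c`, and the function name `do`, are hypothetical.

```python
from collections import defaultdict


def do(p_xz, p_c):
    """Sketch of the do-operation: Pr(A=x, C=z) = Pr(X=x | Z=z) * Pr(C'=z)."""
    p_z = defaultdict(float)
    for (_, z), q in p_xz.items():
        p_z[z] += q
    return {(x, z): q / p_z[z] * p_c[z]
            for (x, z), q in p_xz.items() if q > 0 and p_c.get(z, 0) > 0}


# Joint pmf of (X, Z), here the one induced by Z = X AND Y with fair inputs.
p_xz = {(0, 0): 0.5, (1, 0): 0.25, (1, 1): 0.25}
# Illustrative target distribution for C; its support lies inside that of Z.
p_c = {0: 0.1, 1: 0.9}

p_ac = do(p_xz, p_c)
total = sum(p_ac.values())

# Marginal of the output on C, to be compared with the input p_c.
marginal_c = defaultdict(float)
for (x, z), q in p_ac.items():
    marginal_c[z] += q
print(p_ac, total, dict(marginal_c))
```

The sketch silently drops any $z$ that has zero probability under $Z$, so it relies on the support of the target distribution being contained in the support of $Z$. That condition holds automatically when the target is the distribution of $Z$ given $Y=y$.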

-B Proof of the equivalence of Definitions 1 and 5.

Lemma 9.

Definition 1 and Definition 5 are equivalent.

Proof.

For the purpose of the proof, let

\begin{align}
\operatorname{Un}(X\to Z|Y)&=\sum_{y\in\mathcal{Y}}\Pr(Y=y)I(A_y;C_y),\text{ and} \tag{10}\\
\operatorname{Un}'(X\to Z|Y)&=I(X';Z|Y), \tag{11}
\end{align}

where $\mathcal{D}_{C_y}=\mathcal{D}_{Z|Y=y}$. By Definition 4 we have $\mathcal{D}_{A_y,C_y}=do(\mathcal{D}_{X,Z}|\mathcal{D}_{C_y})$, i.e.,

$$\begin{aligned}\Pr(A_y=x,C_y=z)&=\Pr(X=x|Z=z)\Pr(Z=z|Y=y)\\&\overset{\text{(III-A)}}{=}\Pr(X'=x,Z=z|Y=y).\end{aligned}$$

Therefore, we have

$$\begin{aligned}(10)&=\sum_{y\in\mathcal{Y}}\Pr(Y=y)I(X';Z|Y=y)\\&=I(X';Z|Y)=\operatorname{Un}'(X\to Z|Y).\end{aligned}$$ ∎
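The equivalence can also be observed numerically by computing both expressions on the same distribution. The sketch below is an illustration under the assumptions used in the proof, namely $\Pr(X'=x,Z=z|Y=y)=\Pr(X=x|Z=z)\Pr(Z=z|Y=y)$ for the first expression and $\Pr(A_y=x,C_y=z)=\Pr(X=x|Z=z)\Pr(Z=z|Y=y)$ for the second. The function names `un_def1` and `un_def5`, the random test distribution, and the seed are hypothetical.

```python
import random
from collections import defaultdict
from math import log2


def marg(p, idx):
    """Marginal of a joint pmf {outcome tuple: prob} on the coordinates in idx."""
    out = defaultdict(float)
    for outcome, q in p.items():
        out[tuple(outcome[i] for i in idx)] += q
    return dict(out)


def entropy(p, idx):
    return -sum(q * log2(q) for q in marg(p, idx).values() if q > 0)


def mi(p, a, b, c=()):
    """(Conditional) mutual information I(a; b | c) in bits."""
    a, b, c = list(a), list(b), list(c)
    return entropy(p, a + c) + entropy(p, b + c) - entropy(p, a + b + c) - entropy(p, c)


def un_def1(p):
    """I(X'; Z | Y), computed from the joint pmf of (X', Y, Z)."""
    p_xz, p_yz, p_z = marg(p, [0, 2]), marg(p, [1, 2]), marg(p, [2])
    q = {(x, y, z): q_xz / p_z[(z,)] * q_yz
         for (x, z), q_xz in p_xz.items()
         for (y, z2), q_yz in p_yz.items()
         if z2 == z and q_xz > 0 and q_yz > 0}
    return mi(q, [0], [2], [1])


def un_def5(p):
    """sum_y Pr(Y=y) I(A_y; C_y), one do-operation per value y."""
    p_xz, p_yz = marg(p, [0, 2]), marg(p, [1, 2])
    p_z, p_y = marg(p, [2]), marg(p, [1])
    total = 0.0
    for (y,), q_y in p_y.items():
        if q_y == 0:
            continue
        # Pr(A_y=x, C_y=z) = Pr(X=x | Z=z) * Pr(Z=z | Y=y)
        p_ac = {(x, z): q_xz / p_z[(z,)] * p_yz.get((y, z), 0.0) / q_y
                for (x, z), q_xz in p_xz.items() if q_xz > 0}
        total += q_y * mi(p_ac, [0], [1])
    return total


rng = random.Random(2)  # fixed seed so the run is reproducible
weights = {(x, y, z): rng.random()
           for x in range(3) for y in range(2) for z in range(3)}
norm = sum(weights.values())
p = {k: w / norm for k, w in weights.items()}

value_def1, value_def5 = un_def1(p), un_def5(p)
print(value_def1, value_def5)
```

The second form is the more direct transcription of the do-operation, while the first needs only a single conditional mutual information evaluation and may be more convenient in practice.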

-C Proof of Lemma 1.

Proof.

By Definition 1, for every $(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$, we have

$$\Pr(X'=x,Z=z|Y=y)=\Pr(X=x|Z=z)\Pr(Z=z|Y=y), \tag{12}$$

and therefore

$$\frac{\Pr(X'=x,Z=z|Y=y)}{\Pr(Z=z|Y=y)}=\Pr(X=x|Z=z),$$

which can be simplified to

$$\Pr(X'=x|Z=z,Y=y)=\Pr(X=x|Z=z).$$

Therefore, we have that $H(X'|Z=z,Y=y)=H(X|Z=z)$ for every $(y,z)\in\mathcal{Y}\times\mathcal{Z}$, which implies that $H(X'|Z,Y)=H(X|Z)$.

Also, from (12) we have

$$\Pr(X'=x,Z=z,Y=y)=\Pr(X=x|Z=z)\Pr(Z=z,Y=y),$$

and summation over all $y\in\mathcal{Y}$ yields

$$\begin{aligned}\Pr(X'=x,Z=z)&=\Pr(X=x|Z=z)\Pr(Z=z)\\&=\Pr(X=x,Z=z),\end{aligned} \tag{13}$$

which readily implies that $H(X'|Z=z)=H(X|Z=z)$ for every $z\in\mathcal{Z}$, and hence $H(X'|Z)=H(X|Z)$. Also, by (13), we have

$$\begin{aligned}\Pr(X'=x)&=\sum_{z\in\mathcal{Z}}\Pr(X'=x,Z=z)\\&=\sum_{z\in\mathcal{Z}}\Pr(X=x,Z=z)\\&=\Pr(X=x),\end{aligned}$$

which implies that $H(X')=H(X)$. ∎

-D Proof of Lemma 2.

Proof.

By Definition 2, we have:

RedRed\displaystyle\operatorname{Red}roman_Red (X,YZ)=I(X,Z)Un(XZ|Y)𝑋𝑌𝑍𝐼𝑋𝑍Un𝑋conditional𝑍𝑌\displaystyle(X,Y\to Z)=I(X,Z)-\operatorname{Un}(X\to Z|Y)( italic_X , italic_Y → italic_Z ) = italic_I ( italic_X , italic_Z ) - roman_Un ( italic_X → italic_Z | italic_Y )
=(I(X,Z)+H(X|Z))(Un(XZ|Y)+H(X|Z))absent𝐼𝑋𝑍𝐻conditional𝑋𝑍Un𝑋conditional𝑍𝑌𝐻conditional𝑋𝑍\displaystyle=(I(X,Z)+H(X|Z))-(\operatorname{Un}(X\to Z|Y)+H(X|Z))= ( italic_I ( italic_X , italic_Z ) + italic_H ( italic_X | italic_Z ) ) - ( roman_Un ( italic_X → italic_Z | italic_Y ) + italic_H ( italic_X | italic_Z ) )
=H(X)(Un(XZ|Y)+H(X|Z))absent𝐻𝑋Un𝑋conditional𝑍𝑌𝐻conditional𝑋𝑍\displaystyle=H(X)-(\operatorname{Un}(X\to Z|Y)+H(X|Z))= italic_H ( italic_X ) - ( roman_Un ( italic_X → italic_Z | italic_Y ) + italic_H ( italic_X | italic_Z ) ) (14)

Since Corollary 1 states that

Un(XZ|Y)Un𝑋conditional𝑍𝑌\displaystyle\operatorname{Un}(X\to Z|Y)roman_Un ( italic_X → italic_Z | italic_Y ) =H(X|Y)H(X|Z),absent𝐻conditionalsuperscript𝑋𝑌𝐻conditional𝑋𝑍\displaystyle=H(X^{\prime}|Y)-H(X|Z),= italic_H ( italic_X start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | italic_Y ) - italic_H ( italic_X | italic_Z ) ,

it follows that

\begin{align*}
\text{(14)} &= H(X) - H(X'|Y) \\
&\overset{\text{Lem. 1}}{=} H(X') - H(X'|Y) = I(X';Y). \qed
\end{align*}
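The identity $\operatorname{Red}(X,Y\to Z)=I(X;Z)-\operatorname{Un}(X\to Z|Y)=I(X';Y)$ can also be checked numerically. The sketch below assumes $\operatorname{Un}(X\to Z|Y)=I(X';Z|Y)$ with $X'$ built from $\Pr(X=x|Z=z)\Pr(Z=z|Y=y)$, as in the proofs of this appendix. The example pmf and the helper names (`marginal`, `H`, `mi`, `cmi`, `do_x`) are hypothetical and serve only as an illustration.

```python
import math
from collections import defaultdict

def marginal(p, idx):
    """Marginalize a joint pmf {outcome tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        if pr > 0:
            out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def H(p, idx):
    """Entropy (bits) of the coordinates idx under the joint pmf p."""
    return -sum(pr * math.log2(pr) for pr in marginal(p, idx).values())

def mi(p, a, b):
    """Mutual information I(A;B); a and b are tuples of coordinate indices."""
    return H(p, a) + H(p, b) - H(p, a + b)

def cmi(p, a, b, c):
    """Conditional mutual information I(A;B|C)."""
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

def do_x(p):
    """Joint pmf of (X', Y, Z) with Pr(x, z | y) = Pr(X=x|Z=z) Pr(Z=z|Y=y)."""
    pxz, pyz, pz = marginal(p, (0, 2)), marginal(p, (1, 2)), marginal(p, (2,))
    return {(x, y, z): a * b / pz[(z,)]
            for (x, z), a in pxz.items()
            for (y, z2), b in pyz.items() if z == z2}

# Hypothetical joint pmf over (x, y, z), chosen only for illustration.
p = {(0, 0, 0): 0.3, (0, 1, 1): 0.1, (1, 0, 1): 0.2,
     (1, 1, 1): 0.3, (1, 1, 0): 0.1}
q = do_x(p)

un = cmi(q, (0,), (2,), (1,))        # Un(X -> Z | Y) = I(X'; Z | Y)
red = mi(p, (0,), (2,)) - un         # Red(X, Y -> Z) = I(X; Z) - Un(X -> Z | Y)
red_via_lemma = mi(q, (0,), (1,))    # I(X'; Y)

print("gap:", abs(red - red_via_lemma))
```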

-E Proof of Lemma 3.

Proof.

To prove that $I(X';Y) = I(X;Y')$, it suffices to show that $\Pr(X'=x, Y=y, Z=z) = \Pr(X=x, Y'=y, Z=z)$ for all $(x,y,z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$. We have

\begin{align}
&\Pr(X'=x, Y=y, Z=z) \notag\\
&= \Pr(X'=x, Z=z|Y=y)\Pr(Y=y) \notag\\
&\overset{\text{(III-A)}}{=} \Pr(X=x|Z=z)\Pr(Z=z|Y=y)\Pr(Y=y) \notag\\
&= \Pr(X=x|Z=z)\Pr(Z=z,Y=y) \notag\\
&= \Pr(X=x|Z=z)\Pr(Y=y|Z=z)\Pr(Z=z), \tag{15}
\end{align}

in which $X$ and $Y$ play symmetric roles. The proof is concluded by following similar steps in reverse order, i.e.,

\begin{align*}
\text{(15)} &= \Pr(X=x,Z=z)\Pr(Y=y|Z=z) \\
&= \Pr(Z=z|X=x)\Pr(Y=y|Z=z)\Pr(X=x) \\
&\overset{\text{(III-C2)}}{=} \Pr(Y'=y, Z=z|X=x)\Pr(X=x) \\
&= \Pr(X=x, Y'=y, Z=z). \qed
\end{align*}
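The symmetry argument can be illustrated by computing both joint distributions through their own factorizations and comparing them entry by entry. In this sketch `do_x` follows $\Pr(X=x|Z=z)\Pr(Z=z|Y=y)\Pr(Y=y)$ and `do_y` follows $\Pr(Y=y|Z=z)\Pr(Z=z|X=x)\Pr(X=x)$, mirroring the two chains of equalities in the proof. The function names and the example pmf are assumptions made for illustration.

```python
from collections import defaultdict

def marginal(p, idx):
    """Marginalize a joint pmf {outcome tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        if pr > 0:
            out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def do_x(p):
    """Pmf of (X', Y, Z): Pr(X=x|Z=z) * Pr(Z=z|Y=y) * Pr(Y=y)."""
    pxz, pyz = marginal(p, (0, 2)), marginal(p, (1, 2))
    pz, py = marginal(p, (2,)), marginal(p, (1,))
    q = {}
    for (x, z), a in pxz.items():
        for (y, z2), b in pyz.items():
            if z == z2:
                q[(x, y, z)] = (a / pz[(z,)]) * (b / py[(y,)]) * py[(y,)]
    return q

def do_y(p):
    """Pmf of (X, Y', Z): Pr(Y=y|Z=z) * Pr(Z=z|X=x) * Pr(X=x)."""
    pxz, pyz = marginal(p, (0, 2)), marginal(p, (1, 2))
    pz, px = marginal(p, (2,)), marginal(p, (0,))
    q = {}
    for (x, z), a in pxz.items():
        for (y, z2), b in pyz.items():
            if z == z2:
                q[(x, y, z)] = (b / pz[(z,)]) * (a / px[(x,)]) * px[(x,)]
    return q

# Hypothetical joint pmf over (x, y, z), chosen only for illustration.
p = {(0, 0, 0): 0.3, (0, 1, 1): 0.1, (1, 0, 1): 0.2,
     (1, 1, 1): 0.3, (1, 1, 0): 0.1}

qx, qy = do_x(p), do_y(p)
max_gap = max(abs(qx[k] - qy[k]) for k in qx)
print("same support:", set(qx) == set(qy), "max gap:", max_gap)
```

Since the two pmfs coincide, any functional of them agrees as well, including $I(X';Y)$ and $I(X;Y')$.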

-F Proof of Lemma 4.

Proof.

By Corollary 1, Lemma 4 is equivalent to

\begin{align*}
H(X'|Y) - H(X|Z) &\le I(X;Z) \\
H(X'|Y) &\le H(X),
\end{align*}

which coincides with Corollary 2, and the proof follows. ∎

-G Proof of Lemma 5.

Proof.

By Definition 1, we have:

\begin{align}
&\operatorname{Un}((X,\bar{X})\to(Z,\bar{Z})|(Y,\bar{Y})) \notag\\
&= I((X,\bar{X})';(Z,\bar{Z})|(Y,\bar{Y})), \tag{16}
\end{align}

where

\begin{align}
&\Pr((X,\bar{X})',(Z,\bar{Z})=(x,\bar{x}),(z,\bar{z})|(Y,\bar{Y})=(y,\bar{y})) \notag\\
&= \Pr((X,\bar{X})=(x,\bar{x})|(Z,\bar{Z})=(z,\bar{z})) \notag\\
&\phantom{=}\cdot\Pr((Z,\bar{Z})=(z,\bar{z})|(Y,\bar{Y})=(y,\bar{y})). \tag{17}
\end{align}

Since $X,Y,Z$ are independent of $\bar{X},\bar{Y},\bar{Z}$, we have

\begin{align*}
\text{(17)} &= \Pr(X=x|Z=z)\Pr(\bar{X}=\bar{x}|\bar{Z}=\bar{z}) \\
&\phantom{=}\cdot\Pr(Z=z|Y=y)\Pr(\bar{Z}=\bar{z}|\bar{Y}=\bar{y}) \\
&\overset{\text{(III-A)}}{=} \Pr(X'=x,Z=z|Y=y)\Pr(\bar{X}',\bar{Z}=\bar{x},\bar{z}|\bar{Y}=\bar{y}),
\end{align*}

which implies that $X',Z|Y=y$ and $\bar{X}',\bar{Z}|\bar{Y}=\bar{y}$ are also independent for every $(y,\bar{y})\in\mathcal{Y}\times\bar{\mathcal{Y}}$. Therefore, we have

\begin{align*}
\text{(16)} &= I(X';Z|Y) + I(\bar{X}';\bar{Z}|\bar{Y}) \\
&= \operatorname{Un}(X\to Z|Y) + \operatorname{Un}(\bar{X}\to\bar{Z}|\bar{Y}). \qed
\end{align*}
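The additivity over independent systems can be illustrated numerically by pairing two independent triples coordinate-wise and comparing the unique information of the paired system with the sum of the individual values. The sketch assumes $\operatorname{Un}(X\to Z|Y)=I(X';Z|Y)$ as in Definition 1. The two example pmfs and the helper names (`product`, `unique`, and so on) are hypothetical.

```python
import math
from collections import defaultdict

def marginal(p, idx):
    """Marginalize a joint pmf {outcome tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        if pr > 0:
            out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def H(p, idx):
    """Entropy (bits) of the coordinates idx under the joint pmf p."""
    return -sum(pr * math.log2(pr) for pr in marginal(p, idx).values())

def cmi(p, a, b, c):
    """Conditional mutual information I(A;B|C)."""
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

def do_x(p):
    """Joint pmf of (X', Y, Z) with Pr(x, z | y) = Pr(X=x|Z=z) Pr(Z=z|Y=y)."""
    pxz, pyz, pz = marginal(p, (0, 2)), marginal(p, (1, 2)), marginal(p, (2,))
    return {(x, y, z): a * b / pz[(z,)]
            for (x, z), a in pxz.items()
            for (y, z2), b in pyz.items() if z == z2}

def unique(p):
    """Un(X -> Z | Y) computed as I(X'; Z | Y)."""
    return cmi(do_x(p), (0,), (2,), (1,))

def product(p1, p2):
    """Pmf of ((X, Xbar), (Y, Ybar), (Z, Zbar)) for two independent triples."""
    return {((x, xb), (y, yb), (z, zb)): a * b
            for (x, y, z), a in p1.items()
            for (xb, yb, zb), b in p2.items()}

# Two hypothetical, mutually independent systems (illustrative values only).
p1 = {(0, 0, 0): 0.3, (0, 1, 1): 0.1, (1, 0, 1): 0.2,
      (1, 1, 1): 0.3, (1, 1, 0): 0.1}
p2 = {(0, 0, 0): 0.5, (1, 0, 1): 0.3, (1, 1, 1): 0.2}

un_joint = unique(product(p1, p2))
un_sum = unique(p1) + unique(p2)
print("gap:", abs(un_joint - un_sum))
```

Tuples are used as symbols of the paired variables, so the same `do_x` routine applies to the combined system without modification.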

-H Proof of Lemma 6.

Proof.

Recall that Lemma 2 states that $\operatorname{Red}(X,Y\to Z) = I(X';Y)$. Therefore, since $I(X';Y)$ is a continuous function of $\mathcal{D}_{X',Y}$, it suffices to prove that the mapping $\mathcal{F}(\mathcal{D}_{X,Y,Z}) = \mathcal{D}_{X',Y}$ is continuous, which holds by (III-A). ∎

-I Proof of Lemma 7.

Proof.

By Def. 1, we have $\operatorname{Un}(X\to Z|Y) = I(X';Z|Y)$, where

\begin{align}
&\Pr(X'=x, Z=z|Y=y) \notag\\
&= \Pr(X=x|Z=z)\Pr(Z=z|Y=y). \tag{18}
\end{align}

Since $Z=(X,Y)$, Eq. (18) is zero whenever $z\neq(x,y)$, and otherwise

\begin{align}
\text{(18)} &= \Pr(X=x|(X,Y)=(x,y))\Pr((X,Y)=(x,y)|Y=y) \notag\\
&= \Pr(X=x|Y=y). \tag{19}
\end{align}

Also, since $X$ and $Y$ are independent, we have

\begin{align*}
\text{(19)} = \Pr(X=x).
\end{align*}

Therefore, we have

\begin{align*}
\operatorname{Un}(X\to Z|Y) = I(X';Z|Y) = H(X) = I(X;Z),
\end{align*}

which by Definition 2 implies that

\begin{align*}
\operatorname{Red}(X,Y\to Z) &= I(X;Z) - \operatorname{Un}(X\to Z|Y) = 0. \qed
\end{align*}
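The independent-identity case can be reproduced on a toy system: take independent $X$ and $Y$, set $Z=(X,Y)$, and evaluate the quantities that appear in the proof. The sketch below assumes $\operatorname{Un}(X\to Z|Y)=I(X';Z|Y)$ and $\operatorname{Red}(X,Y\to Z)=I(X;Z)-\operatorname{Un}(X\to Z|Y)$. The marginals `px` and `py` and all helper names are hypothetical.

```python
import math
from collections import defaultdict

def marginal(p, idx):
    """Marginalize a joint pmf {outcome tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, pr in p.items():
        if pr > 0:
            out[tuple(outcome[i] for i in idx)] += pr
    return dict(out)

def H(p, idx):
    """Entropy (bits) of the coordinates idx under the joint pmf p."""
    return -sum(pr * math.log2(pr) for pr in marginal(p, idx).values())

def mi(p, a, b):
    """Mutual information I(A;B)."""
    return H(p, a) + H(p, b) - H(p, a + b)

def cmi(p, a, b, c):
    """Conditional mutual information I(A;B|C)."""
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

def do_x(p):
    """Joint pmf of (X', Y, Z) with Pr(x, z | y) = Pr(X=x|Z=z) Pr(Z=z|Y=y)."""
    pxz, pyz, pz = marginal(p, (0, 2)), marginal(p, (1, 2)), marginal(p, (2,))
    return {(x, y, z): a * b / pz[(z,)]
            for (x, z), a in pxz.items()
            for (y, z2), b in pyz.items() if z == z2}

# Hypothetical independent marginals; Z is the pair (X, Y).
px = {0: 0.3, 1: 0.7}
py = {0: 0.6, 1: 0.4}
p = {(x, y, (x, y)): a * b for x, a in px.items() for y, b in py.items()}

un = cmi(do_x(p), (0,), (2,), (1,))   # Un(X -> Z | Y) = I(X'; Z | Y)
hx = H(p, (0,))                       # H(X)
ixz = mi(p, (0,), (2,))               # I(X; Z)
red = ixz - un                        # Red(X, Y -> Z)

print("Un - H(X):", un - hx, " I(X;Z) - H(X):", ixz - hx, " Red:", red)
```

All three printed differences should vanish up to floating-point error, in line with $\operatorname{Un}(X\to Z|Y)=H(X)=I(X;Z)$ and $\operatorname{Red}(X,Y\to Z)=0$.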