CONDITIONAL DISTANCE CORRELATION

J Am Stat Assoc. 2015;110(512):1726-1734. doi: 10.1080/01621459.2014.993081. Epub 2015 Jan 23.

Abstract

Statistical inference on conditional dependence is essential in many fields, including genetic association studies and graphical models. The classic measures focus on linear conditional correlations and cannot characterize non-linear conditional relationships, including non-monotonic ones. To overcome this limitation, we introduce a nonparametric measure of conditional dependence for multivariate random variables of arbitrary dimensions. Our measure possesses the necessary and intuitive properties of a correlation index. Briefly, it is zero almost surely if and only if two multivariate random variables are conditionally independent given a third random variable. More importantly, the sample version of this measure can be expressed elegantly as the root of a V- or U-process with random kernels and has desirable theoretical properties. Based on the sample version, we propose a test for conditional independence, which our numerical simulations show to be more powerful than some recently developed tests. The advantage of our test is even greater when the relationship between the multivariate random variables given the third random variable cannot be expressed as a linear or monotonic function of one random variable versus the other. We also show that the sample measure is consistent and weakly convergent, and that the test statistic is asymptotically normal. By applying our test in a real data analysis, we are able to identify two conditionally associated gene expressions that could not otherwise be revealed. Thus, our measure of conditional dependence is not only an ideal concept but also has important practical utility.

Keywords: Conditional distance correlation; Conditional independence test; Local bootstrap; U(V) process with random kernel.
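
The abstract's point that linear correlation measures miss non-monotonic relationships can be illustrated with a minimal sketch. Note the assumption: the code below computes the ordinary (unconditional) sample distance correlation in its V-statistic form, not the conditional measure or the test proposed in the paper, which additionally involve a conditioning variable and random kernels. The function names are hypothetical and chosen only for this illustration.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of the rows of x, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n observations of one variable
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Unconditional sample distance correlation (V-statistic form).

    Illustrative only: this is NOT the conditional measure from the paper.
    """
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    # max(..., 0.0) guards against tiny negative values from rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# A non-monotonic relationship: y is a deterministic function of x,
# yet the Pearson correlation is essentially zero on a symmetric grid.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
print("Pearson correlation: ", np.corrcoef(x, y)[0, 1])
print("Distance correlation:", distance_correlation(x, y))
```

In this toy example the Pearson correlation vanishes by symmetry while the distance correlation is clearly positive, which is the kind of dependence the conditional measure is designed to capture once a third variable is conditioned on.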