Handling hybrid and missing data in constraint-based causal discovery to study the etiology of ADHD

Int J Data Sci Anal. 2017;3(2):105-119. doi: 10.1007/s41060-016-0034-x. Epub 2016 Dec 2.

Abstract

Causal discovery is an increasingly important method for data analysis in medical research. In this paper, we consider two challenges that frequently arise when applying causal discovery to medical data: a mixture of discrete and continuous variables, and a substantial amount of missing values. To the best of our knowledge, no existing method handles both challenges at the same time. We develop a new method that handles both, based on the assumptions that data are missing at random and that continuous variables follow a nonparanormal distribution. We demonstrate the validity of our approach on simulated data as well as on two real-world data sets, one from a monetary incentive delay task and one from a reversal learning task. Our results contribute to the understanding of the etiology of attention-deficit/hyperactivity disorder (ADHD).
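
To make the distributional assumption concrete, here is a minimal sketch of one standard way to estimate a correlation under a nonparanormal (Gaussian copula) model: compute Spearman's rank correlation and map it to the latent Pearson scale with 2 sin(pi/6 * rho). The abstract does not describe the authors' estimator, so this is an illustration only. The function names are hypothetical, missing values are encoded as None, and pairwise deletion is a simple stand-in for missing-data handling that is strictly justified only when values are missing completely at random, a stronger condition than the missing-at-random assumption stated above.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_pairwise(x, y):
    """Spearman's rho over rows where both values are observed.

    Assumption for this sketch: None marks a missing value, and rows
    with any missing entry are dropped (pairwise deletion).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(ranks(list(xs)), ranks(list(ys)))


def nonparanormal_correlation(x, y):
    """Map Spearman's rho to the latent Gaussian correlation scale."""
    return 2.0 * math.sin(math.pi / 6.0 * spearman_pairwise(x, y))


if __name__ == "__main__":
    # Toy data: y is a monotone (cubic) function of x, with gaps.
    x = [1.0, 2.0, None, 4.0, 5.0]
    y = [1.0, 8.0, 27.0, None, 125.0]
    print(nonparanormal_correlation(x, y))
```

Because the estimate depends only on ranks, it is unchanged by any monotone transformation of a variable, which is what makes it suitable for continuous variables that are Gaussian only after an unknown monotone transform. A correlation matrix built this way could then feed the conditional independence tests of a constraint-based algorithm, though a matrix assembled from pairwise-complete entries is not guaranteed to be positive semidefinite.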

Keywords: ADHD; Causal discovery; Missing data; Mixture of discrete and continuous data.