PenPC: A two-step approach to estimate the skeletons of high-dimensional directed acyclic graphs

Min Jin Ha; Wei Sun; Jichun Xie

doi:10.1111/biom.12415

Biometrics. 2016 Mar;72(1):146-55. doi: 10.1111/biom.12415. Epub 2015 Sep 25.

Authors

Min Jin Ha¹, Wei Sun²,³, Jichun Xie⁴

Affiliations

¹ Department of Biostatistics, MD Anderson Cancer Center, Houston, Texas, 77030, U.S.A.
² Department of Biostatistics, Department of Genetics, UNC Chapel Hill, North Carolina, 27514, U.S.A.
³ Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA.
⁴ Department of Biostatistics & Bioinformatics, Duke University, Durham, North Carolina, 27708, U.S.A.

Abstract

Estimation of the skeleton of a directed acyclic graph (DAG) is of great importance for understanding the underlying DAG, and causal effects can be assessed from the skeleton when the DAG is not identifiable. We propose a novel method named PenPC to estimate the skeleton of a high-dimensional DAG by a two-step approach. We first estimate the nonzero entries of a concentration matrix using penalized regression, and then fix the difference between the concentration matrix and the skeleton by evaluating a set of conditional independence hypotheses. For high-dimensional problems where the number of vertices p grows polynomially or exponentially with the sample size n, we study the asymptotic properties of PenPC on two types of graphs: traditional random graphs where all the vertices have the same expected number of neighbors, and scale-free graphs where a few vertices may have a large number of neighbors. As illustrated by extensive simulations and applications to gene expression data of cancer patients, PenPC has higher sensitivity and specificity than the state-of-the-art method, the PC-stable algorithm.

Keywords: DAG; High dimensional; Log penalty; PC-algorithm; Penalized regression; Skeleton.
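
The two-step idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes Gaussian data, substitutes a plain lasso for the log penalty named in the keywords, keeps a candidate edge when either of the two nodewise regressions selects it, and uses Fisher z-tests on partial correlations for the conditional independence step. All function names and the tuning values (lam, alpha, max_cond) are hypothetical choices made for illustration. The toy data follow a v-structure X -> Z <- Y, where X and Y are adjacent in the concentration-matrix graph but not in the skeleton.

```python
import itertools
import math
import random


def correlation_matrix(data):
    """Sample correlation matrix for rows of observations (n x p)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cen = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in cen) / n for b in range(p)] for a in range(p)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(p)]
            for a in range(p)]


def lasso_neighbors(corr, j, lam, sweeps=200):
    """Coordinate-descent lasso of standardized variable j on all the others.

    Assumption: an L1 penalty stands in for the log penalty used by PenPC.
    Returns the indices whose coefficients are nonzero.
    """
    others = [k for k in range(len(corr)) if k != j]
    beta = {k: 0.0 for k in others}
    for _ in range(sweeps):
        for k in others:
            r = corr[k][j] - sum(corr[k][l] * beta[l] for l in others if l != k)
            beta[k] = math.copysign(max(abs(r) - lam, 0.0), r)  # soft threshold
    return {k for k in others if beta[k] != 0.0}


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, by the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(corr, n, i, j, cond, alpha):
    """Fisher z-test: True when independence given cond is NOT rejected."""
    r = max(min(partial_corr(corr, i, j, tuple(cond)), 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2)) > alpha  # two-sided p-value > alpha


def two_step_skeleton(data, lam=0.1, alpha=0.01, max_cond=2):
    n, p = len(data), len(data[0])
    corr = correlation_matrix(data)
    # Step 1: nonzero pattern of the concentration matrix via nodewise regression.
    nbrs = [lasso_neighbors(corr, j, lam) for j in range(p)]
    adj = {j: {k for k in range(p) if k != j and (k in nbrs[j] or j in nbrs[k])}
           for j in range(p)}
    candidate = {frozenset((i, j)) for i in adj for j in adj[i]}
    # Step 2: drop candidate edges that some conditioning set among neighbors separates.
    for size in range(max_cond + 1):
        snapshot = {v: set(a) for v, a in adj.items()}  # adjacency fixed per level
        for i in range(p):
            for j in sorted(snapshot[i]):
                if j not in adj[i]:
                    continue  # already removed at this level
                for cond in itertools.combinations(sorted(snapshot[i] - {j}), size):
                    if is_independent(corr, n, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
    skeleton = {frozenset((i, j)) for i in adj for j in adj[i]}
    return candidate, skeleton


def simulate_v_structure(n_pairs=500, seed=0):
    """Toy data for X -> Z <- Y with columns 0 = X, 1 = Y, 2 = Z.

    Antithetic pairs (x, y) and (x, -y) make the sample correlation of X and Y
    zero up to rounding, so the demo does not hinge on one random draw.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        for y_signed in (y, -y):
            rows.append([x, y_signed, x + y_signed + 0.5 * rng.gauss(0, 1)])
    return rows


data = simulate_v_structure()
candidate, skeleton = two_step_skeleton(data)
print("step 1 candidate edges:", sorted(sorted(e) for e in candidate))
print("step 2 skeleton edges: ", sorted(sorted(e) for e in skeleton))
```

In this toy setting the first step would be expected to keep the X-Y pair, because conditioning on the common child Z induces dependence between its parents, and the second step would be expected to remove it, because X and Y are marginally uncorrelated. That removed edge is the kind of difference between the concentration matrix and the skeleton that the abstract refers to.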

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Biomarkers, Tumor / genetics*
Breast Neoplasms / epidemiology*
Breast Neoplasms / genetics*
Computer Simulation
Data Interpretation, Statistical
Female
Gene Expression Profiling / methods*
Genetic Markers / genetics
Genetic Predisposition to Disease / epidemiology
Genetic Predisposition to Disease / genetics*
Humans
Models, Statistical*
Neoplasm Proteins / genetics
Prevalence
Reproducibility of Results
Risk Factors
Sensitivity and Specificity

Substances

Biomarkers, Tumor
Genetic Markers
Neoplasm Proteins
