Estimation of sparse directed acyclic graphs for multivariate counts data

Biometrics. 2016 Sep;72(3):791-803. doi: 10.1111/biom.12467. Epub 2016 Feb 5.

Abstract

Next-generation sequencing data, also called high-throughput sequencing data, are recorded as counts, which are generally far from normally distributed. Under the assumption that the count data follow a Poisson log-normal distribution, this article provides an L1-penalized likelihood framework and an efficient search algorithm to estimate the structure of sparse directed acyclic graphs (DAGs) for multivariate count data. In searching for the solution, we use iterative optimization procedures to estimate the adjacency matrix and the variance matrix of the latent variables. Simulation results show that the proposed method outperforms both the approach that assumes multivariate normal distributions and the log-transformation approach. They also show that the proposed method outperforms the rank-based PC method under sparse network or hub network structures. As a real data example, we demonstrate the efficiency of the proposed method in estimating gene regulatory networks from an ovarian cancer study.
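
To make the modeling assumption concrete, here is a minimal sketch of how counts could be simulated from a Poisson log-normal model whose latent variables follow a DAG. The parameterization is an assumption for illustration and is not taken from the article: latent Gaussian variables obey linear structural equations with a weighted adjacency matrix and a diagonal noise variance, and each count is Poisson with log-mean equal to an offset plus the latent value. The function name `simulate_pln_dag` and its parameters are hypothetical.

```python
import numpy as np


def simulate_pln_dag(adjacency, noise_var, offset, n_samples, seed=0):
    """Simulate counts from an assumed Poisson log-normal DAG model.

    adjacency[k, j] != 0 encodes an edge k -> j with that weight (assumption).
    noise_var[j] is the variance of the latent noise at node j (assumption).
    offset[j] is the baseline log-mean of node j (assumption).
    """
    rng = np.random.default_rng(seed)
    p = adjacency.shape[0]
    # Independent Gaussian noise, one column per node.
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(n_samples, p))
    # Row-wise structural equations Z = Z A + E, hence Z = E (I - A)^{-1}.
    # I - A is invertible whenever A is the adjacency matrix of a DAG.
    latent = noise @ np.linalg.inv(np.eye(p) - adjacency)
    # Counts are conditionally Poisson given the latent variables.
    counts = rng.poisson(np.exp(offset + latent))
    return latent, counts


# Example: a three-node chain 0 -> 1 -> 2 with illustrative weights.
chain = np.array([[0.0, 0.8, 0.0],
                  [0.0, 0.0, -0.5],
                  [0.0, 0.0, 0.0]])
latent, counts = simulate_pln_dag(chain, np.ones(3), np.full(3, 1.0), n_samples=200)
```

Under this sketch, estimating the graph amounts to recovering the nonzero pattern of `adjacency` from `counts` alone, since `latent` is never observed; an L1 penalty on the adjacency weights would encourage that pattern to be sparse.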

Keywords: Bayesian network; Count data; Directed acyclic graph; Lasso estimation; Penalized likelihood estimation; Unknown variable ordering.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Computer Simulation
  • Female
  • Gene Regulatory Networks
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Likelihood Functions
  • Models, Statistical*
  • Ovarian Neoplasms / genetics
  • Poisson Distribution*