Testing for dependence on tree structures

Proc Natl Acad Sci U S A. 2020 May 5;117(18):9787-9792. doi: 10.1073/pnas.1912957117. Epub 2020 Apr 22.

Abstract

Tree structures, which show hierarchical relationships and latent structure among samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether a response variable measured on each sample is associated with the latent group structure represented by a given tree. Currently, this question is addressed on an ad hoc basis, usually by requiring the user to choose a number of clusters to prune out of the tree and then testing those clusters against the response variable. Here, we present a method with statistical guarantees that tests for association between the response variable and a fixed tree structure across all levels of the tree hierarchy, with high power, while accounting for the overall false-positive rate. This approach enhances the robustness and reproducibility of such findings.
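To make the ad hoc practice mentioned in the abstract concrete, the following minimal Python sketch prunes a small tree at a user-chosen depth and runs a permutation test of the resulting clusters against a binary response. It illustrates only the baseline approach the abstract criticizes, not the method proposed in the paper. The nested-tuple tree encoding, the toy data, and every function name (leaves, cut_at_depth, between_cluster_stat, permutation_pvalue) are assumptions made for illustration.

```python
import random


def leaves(tree):
    """Return the leaf labels under a nested-tuple tree, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def cut_at_depth(tree, depth):
    """Prune at a fixed depth; each subtree at that depth becomes one cluster."""
    if depth == 0 or not isinstance(tree, tuple):
        return [leaves(tree)]
    clusters = []
    for child in tree:
        clusters.extend(cut_at_depth(child, depth - 1))
    return clusters


def between_cluster_stat(clusters, response):
    """Sum over clusters of size * (cluster mean - overall mean)^2."""
    values = [response[s] for c in clusters for s in c]
    overall = sum(values) / len(values)
    stat = 0.0
    for c in clusters:
        mean_c = sum(response[s] for s in c) / len(c)
        stat += len(c) * (mean_c - overall) ** 2
    return stat


def permutation_pvalue(clusters, response, n_perm=2000, seed=0):
    """Permutation p-value: shuffle responses across samples, keep clusters fixed."""
    rng = random.Random(seed)
    samples = [s for c in clusters for s in c]
    observed = between_cluster_stat(clusters, response)
    values = [response[s] for s in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = dict(zip(samples, values))
        # Small tolerance guards against floating-point ties.
        if between_cluster_stat(clusters, permuted) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 8 samples; the response follows the top-level split.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
response = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 0, "f": 0, "g": 0, "h": 0}

clusters_2 = cut_at_depth(tree, 1)  # user picks 2 clusters
clusters_4 = cut_at_depth(tree, 2)  # user picks 4 clusters
p_2 = permutation_pvalue(clusters_2, response)
p_4 = permutation_pvalue(clusters_4, response)
print("p-value with 2 clusters:", p_2)
print("p-value with 4 clusters:", p_4)
```

In this sketch the p-value depends on the pruning depth the user happens to choose, and trying several depths without correction would inflate the false-positive rate. That dependence on a user choice is the limitation the abstract says the proposed method avoids by testing across all levels of the hierarchy.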

Keywords: change-point detection; hypothesis testing; subgroup detection; tree structures.

Publication types

  • Research Support, Non-U.S. Gov't