A standard paradigm in computational biology is to leverage interaction networks as prior knowledge in analyzing high-throughput biological data, where the data give a score for each vertex in the network. One classical approach is the identification of altered subnetworks, or subnetworks of the interaction network that have both outlier vertex scores and a defined network topology. One class of algorithms for identifying altered subnetworks search for high-scoring subnetworks in subnetwork families with simple topological constraints, such as connected subnetworks, and have sound statistical guarantees. A second class of algorithms employ network propagation-the smoothing of vertex scores over the network using a random walk or diffusion process-and utilize the global structure of the network. However, network propagation algorithms often rely on ad hoc heuristics that lack a rigorous statistical foundation. In this work, we unify the subnetwork family and network propagation approaches by deriving the propagation family, a subnetwork family that approximates the sets of vertices ranked highly by network propagation approaches. We introduce NetMix2, a principled algorithm for identifying altered subnetworks from a wide range of subnetwork families. When using the propagation family, NetMix2 combines the advantages of the subnetwork family and network propagation approaches. NetMix2 outperforms other methods, including network propagation on simulated data, pan-cancer somatic mutation data, and genome-wide association data from multiple human diseases.
Keywords: GWAS; altered subnetworks; anomaly detection; cancer; interaction network; network analysis; network propagation.