HCGA: Highly comparative graph analysis for network phenotyping

Patterns (N Y). 2021 Apr 2;2(4):100227. doi: 10.1016/j.patter.2021.100227. eCollection 2021 Apr 9.

Abstract

Networks are widely used as mathematical models of complex systems across many scientific disciplines. Decades of work have produced a vast corpus of research characterizing the topological, combinatorial, statistical, and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph datasets that computes several thousand graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for the automated identification and selection of the important, interpretable features that underpin the characterization of graph datasets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark datasets while retaining the interpretability of network features. We illustrate HCGA by predicting charge transfer in organic semiconductors and by clustering a dataset of neuronal morphology images.
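The idea of treating each graph property as a feature can be illustrated with a minimal sketch. The code below is not HCGA's API: the function name graph_features, the adjacency-dictionary input format, and the six features chosen are all assumptions made for illustration, and HCGA itself computes several thousand features rather than a handful.

```python
from itertools import combinations
from statistics import mean, pvariance


def graph_features(adj):
    """Map one undirected simple graph to a small, named feature vector.

    `adj` is a hypothetical input format: a dict mapping each node to the
    set of its neighbours. This is an illustrative stand-in, not HCGA code.
    """
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2  # each undirected edge is counted twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def local_clustering(node):
        # Fraction of neighbour pairs that are themselves connected.
        nbrs = adj[node]
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        return 2 * links / (k * (k - 1))

    return {
        "n_nodes": n,
        "n_edges": m,
        "density": density,
        "mean_degree": mean(degrees) if degrees else 0.0,
        "degree_variance": pvariance(degrees) if degrees else 0.0,
        "mean_clustering": mean(local_clustering(v) for v in adj) if adj else 0.0,
    }


# Two toy graphs: a triangle and a three-node path.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}

# One row of named features per graph; a table like this is what a
# downstream classifier or regressor would take as input.
features = [graph_features(g) for g in (triangle, path)]
```

Because every column keeps the name of the property it measures, a model trained on such a table can be interpreted in terms of the features themselves, which is the sense of interpretability the abstract refers to.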

Keywords: feature extraction; graph classification; graph regression; graph theory; high-throughput phenotyping; machine learning; networks.