Fast Nonparametric Clustering of Structured Time-Series

IEEE Trans Pattern Anal Mach Intell. 2015 Feb;37(2):383-93. doi: 10.1109/TPAMI.2014.2318711.

Abstract

In this publication, we combine two Bayesian nonparametric models: the Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP model is a variation on the GP prior that lets us model structured time-series data, i.e., data containing groups for which we wish to model both inter- and intra-group variability. Our innovation in the DP model is a new, fast collapsed variational inference procedure that optimizes our variational approximation significantly faster than standard variational Bayes (VB) approaches. In a biological time-series application, we show that our model better captures salient features of the data, leading to better consistency with existing biological classifications, while the associated inference algorithm provides a significant speed-up over EM-based variational inference.
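
One common way to give a GP prior both inter- and intra-group variability is to add a covariance term shared by all observations to a second term that is active only between observations from the same group. The sketch below illustrates that general idea only; the abstract does not state the exact kernel used, so the additive form, the squared-exponential kernels, the function names (`rbf`, `structured_cov`), and all hyperparameter values are assumptions made for illustration.

```python
import numpy as np


def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def structured_cov(t, group, shared=(1.0, 2.0), within=(0.3, 1.0)):
    """Covariance over observations at times `t` with group labels `group`.

    Assumed (hypothetical) form:
        k(t, t') = k_shared(t, t') + [group == group'] * k_within(t, t')
    `shared` and `within` are (variance, lengthscale) pairs.
    """
    t = np.asarray(t, dtype=float)
    group = np.asarray(group)
    # Boolean mask that is True where two observations share a group label.
    same_group = group[:, None] == group[None, :]
    return rbf(t, t, *shared) + same_group * rbf(t, t, *within)


# Two groups, each observed at the same six time points.
t = np.tile(np.linspace(0.0, 5.0, 6), 2)
group = np.repeat([0, 1], 6)
K = structured_cov(t, group)

# Draw one joint sample of both groups from the zero-mean prior.
# A small jitter on the diagonal keeps the covariance numerically stable.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(len(t)), K + 1e-8 * np.eye(len(t)))
```

Under this assumed construction, observations in different groups are correlated only through the shared term, while observations in the same group receive additional covariance from the within-group term, so each group follows the common trend with its own deviation.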

MeSH terms

  • Cluster Analysis*
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Profiling
  • Normal Distribution*
  • Statistics, Nonparametric