Research on change-point detection, the classical problem of detecting abrupt changes in sequential data, has focused predominantly on datasets with a single observable. A growing number of time series datasets, however, involve many observables, often with the property that a given change typically affects only a few of the observables. We introduce a general statistical method that, given many noisy observables, detects points in time at which various subsets of the observables exhibit simultaneous changes in data distribution and explicitly identifies those subsets. Our work is motivated by the problem of identifying the nature and timing of biologically interesting conformational changes that occur during atomic-level simulations of biomolecules such as proteins. This problem has proved challenging both because each such conformational change might involve only a small region of the molecule and because these changes are often subtle relative to the ever-present background of faster structural fluctuations. We show that our method is effective in detecting biologically interesting conformational changes in molecular dynamics simulations of both folded and unfolded proteins, even in cases where these changes are difficult to detect using alternative techniques. This method may also facilitate the detection of change points in other types of sequential data involving large numbers of observables, a problem likely to become increasingly important as such data continue to proliferate in a variety of application domains.
Keywords: SIMPLE; conformational change; molecular dynamics; multivariate; penalized maximum likelihood.