Evaluating data mining algorithms using molecular dynamics trajectories

Int J Data Min Bioinform. 2013;8(2):169-87. doi: 10.1504/ijdmb.2013.055499.

Abstract

Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets that resulted from computer simulations of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using classification error to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
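
The evaluation criterion named in the abstract, classification error, can be illustrated with a minimal sketch. This is not the authors' Weka workflow: the toy data, the two stand-in classifiers (a majority-class baseline and 1-nearest-neighbour), the fold assignment, and all function names are assumptions chosen only to show how classifiers might be ranked by cross-validated error.

```python
from collections import Counter


def classification_error(y_true, y_pred):
    """Fraction of instances whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def majority_class(train_X, train_y, test_X):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label for _ in test_X]


def nearest_neighbour(train_X, train_y, test_X):
    """Hypothetical stand-in classifier: 1-NN with squared Euclidean distance."""
    preds = []
    for x in test_X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def cross_validated_error(classifier, X, y, k=3):
    """Mean classification error over k folds (fold f holds out indices i with i % k == f)."""
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        preds = classifier(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        errors.append(classification_error([y[i] for i in test_idx], preds))
    return sum(errors) / k


# Toy two-class data (assumed for illustration; not taken from the paper).
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (1.2, 0.9), (0.9, 1.0)]
y = ["A", "A", "A", "B", "B", "B"]

candidates = [("majority", majority_class), ("1-NN", nearest_neighbour)]

# Rank classifiers from lowest to highest cross-validated error.
ranking = sorted(
    ((name, cross_validated_error(fn, X, y)) for name, fn in candidates),
    key=lambda pair: pair[1],
)

for name, err in ranking:
    print(f"{name}: {err:.3f}")
```

The same ranking step would apply to any set of classifiers and data sets; only the error values and the order would change.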

MeSH terms

  • Algorithms*
  • Data Mining / methods*
  • Molecular Dynamics Simulation*
  • Molecular Mimicry
  • Peptides / chemistry
  • Serine Proteases / chemistry

Substances

  • Peptides
  • Serine Proteases