Distance metrics underpin many methods of statistical analysis. In statistical mechanical applications, it is useful to be able to compute the distance between two orientations of a molecule. Several distance metrics for rotation have been employed, however, and in this study we compare such metrics and their utility in entropy estimation using the k-nearest neighbors (KNN) algorithm. This approach shows several advantages over entropy estimation using a histogram method. The approaches are assessed using uniformly distributed random data, biased random data, and data from a molecular dynamics (MD) simulation of bulk water. The results identify quaternion metrics as superior to a metric based on the Euler angles. However, it is also demonstrated that samples from MD simulation must be independent for the KNN algorithm to be used effectively, a finding that affects any application to time series data.
Keywords: distance metric; entropy; k-nearest neighbors; molecular dynamics; solvation; statistical mechanics.
Copyright © 2013 The Authors. Journal of Computational Chemistry published by Wiley Periodicals, Inc.
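As an illustration of the kind of quaternion-based distance the abstract refers to, the following is a minimal sketch and not necessarily one of the metrics evaluated in the study. It assumes unit quaternions stored as (w, x, y, z) tuples and uses the rotation angle between two orientations as the distance, a common choice. The function names `quaternion_angle_distance` and `kth_neighbor_distances` are hypothetical, and the brute-force neighbor search only shows the per-sample quantity that KNN entropy estimators are built on; it does not compute an entropy.

```python
import math


def quaternion_angle_distance(q1, q2):
    """Rotation angle in radians between two unit quaternions (w, x, y, z).

    Assumption: both inputs are already normalized.
    """
    dot = sum(a * b for a, b in zip(q1, q2))
    # q and -q describe the same rotation, so use |dot|.
    # Clamp to 1.0 to guard against rounding slightly above 1.
    dot = min(1.0, abs(dot))
    return 2.0 * math.acos(dot)


def kth_neighbor_distances(samples, k):
    """Distance from each sample to its k-th nearest neighbor (brute force)."""
    if not 1 <= k < len(samples):
        raise ValueError("k must satisfy 1 <= k < number of samples")
    result = []
    for i, qi in enumerate(samples):
        dists = sorted(
            quaternion_angle_distance(qi, qj)
            for j, qj in enumerate(samples)
            if j != i
        )
        result.append(dists[k - 1])
    return result


# Three orientations: identity, a quarter turn and a half turn about z.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
half_turn_z = (0.0, 0.0, 0.0, 1.0)

samples = [identity, quarter_turn_z, half_turn_z]
nearest = kth_neighbor_distances(samples, k=1)
```

Taking the absolute value of the dot product makes the distance insensitive to the sign ambiguity of quaternions, which a naive Euclidean distance on the four components would not be.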