The Wasserstein distance is a powerful metric based on the theory of optimal mass transport. It provides a natural measure of the distance between two distributions and has a wide range of applications. In contrast to common divergences on distributions, such as the Kullback–Leibler or Jensen–Shannon divergence, it is (weakly) continuous and thus well suited to analyzing corrupted and noisy data. Until recently, however, no kernel methods for handling nonlinear data had been proposed based on the Wasserstein distance. In this work, we develop a novel method, the kernel L2-Wasserstein distance, for computing the L2-Wasserstein distance in reproducing kernel Hilbert spaces (RKHS) using the kernel trick, a general machine learning technique for handling data in a nonlinear manner. We evaluate the proposed approach on the task of identifying computed tomography (CT) slices with dental artifacts in head and neck cancer, performing unsupervised hierarchical clustering on the Wasserstein distance matrix computed from imaging texture features extracted from each CT slice. We further compare the performance of the kernel Wasserstein distance with alternatives, including the kernel Kullback–Leibler divergence we previously developed. Our experiments show that the kernel approach outperforms classical non-kernel approaches in identifying CT slices with artifacts.
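To make the evaluation pipeline concrete, the following is a minimal sketch, assuming Gaussian feature distributions per CT slice, of the classical (non-kernel) closed-form L2-Wasserstein distance between fitted Gaussians and of hierarchical clustering on the resulting pairwise distance matrix. The function names gaussian_w2 and cluster_slices, and the use of NumPy/SciPy, are illustrative assumptions rather than the paper's implementation; the kernel variant described above would replace the Euclidean means and covariances with their RKHS counterparts via the kernel trick.

    import numpy as np
    from scipy.linalg import sqrtm
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def gaussian_w2(X, Y):
        # Closed-form L2-Wasserstein distance between the Gaussians fitted to
        # two feature matrices X and Y (rows = observations, columns = features):
        # W2^2 = ||m1 - m2||^2 + Tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2}).
        m1, m2 = X.mean(axis=0), Y.mean(axis=0)
        C1 = np.cov(X, rowvar=False)
        C2 = np.cov(Y, rowvar=False)
        C2_half = sqrtm(C2)
        cross = sqrtm(C2_half @ C1 @ C2_half)
        # sqrtm can return small spurious imaginary parts; keep the real part.
        bures = np.trace(C1 + C2 - 2.0 * np.real(cross))
        return np.sqrt(max(np.sum((m1 - m2) ** 2) + bures, 0.0))

    def cluster_slices(slice_features, n_clusters=2):
        # slice_features: list of per-slice texture-feature matrices (hypothetical input).
        n = len(slice_features)
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                D[i, j] = D[j, i] = gaussian_w2(slice_features[i], slice_features[j])
        # Unsupervised hierarchical clustering on the pairwise distance matrix.
        Z = linkage(squareform(D), method="average")
        return fcluster(Z, t=n_clusters, criterion="maxclust")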
Keywords: Kernel Kullback–Leibler divergence; Kernel Wasserstein distance; Kernel trick; Reproducing kernel Hilbert space.
Copyright © 2020 Elsevier Ltd. All rights reserved.