Clustering Cu-S based compounds using periodic table representation and compositional Wasserstein distance

Sci Rep. 2024 Dec 30;14(1):31602. doi: 10.1038/s41598-024-79126-3.


Crystal structure similarity is useful for the chemical analysis of today's large materials databases and for data mining new materials. Here we propose using the two-dimensional Wasserstein distance (earth mover's distance) to measure the compositional similarity between different compounds, based on a periodic table representation of their compositions. To demonstrate the effectiveness of our approach, 1586 Cu-S based compounds are taken from the Inorganic Crystal Structure Database (ICSD) to form a validation dataset. Using local structure order parameters as a geometrical similarity metric, we calculate a similarity matrix that includes both compositional and geometrical similarities. All the Cu-S compounds are then clustered into 86 groups using this similarity matrix and the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Selected groups are analyzed by visualizing the crystal structures of hundreds of compounds, which provides chemical insight into the similarity metrics and demonstrates the effectiveness of the clustering. Based on the structure-property principle that similar structures tend to have similar properties, a group of rare-earth-containing layered Cu-S compounds is proposed for further experimental investigation as potential thermoelectric materials. The unsupervised clustering approach in this work can be easily applied to other datasets, where it can aid the chemical understanding of materials datasets and help discover new materials with similar properties based on the similarity metrics.
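A minimal sketch of the pipeline summarized above, under assumptions that are not taken from the paper: each element sits at its (period, group) cell of the standard 18-column periodic table, the ground distance between cells is Euclidean, and only a handful of elements are mapped (`PT_COORDS` is hypothetical and incomplete; rare earths would need an explicit placement). The earth mover's distance between two compositions is solved exactly as a small min-cost flow in pure Python, and clustering uses a textbook DBSCAN on a precomputed distance matrix. The geometric term (for example, distances between local structure order parameter fingerprints) is only an optional input here, and the weighted-sum mixing rule in `distance_matrix` is a hypothetical choice. The formulas, `eps` and `min_samples` are illustrative, not the paper's dataset or settings.

```python
import math
import re
from fractions import Fraction

# ASSUMPTION: (period, group) coordinates on the standard 18-column periodic
# table for a few elements only; the paper's exact grid may differ.
PT_COORDS = {"Na": (3, 1), "K": (4, 1), "Ba": (6, 2), "Fe": (4, 8), "Cu": (4, 11),
             "Zn": (4, 12), "Sn": (5, 14), "S": (3, 16), "Se": (4, 16)}


def composition(formula):
    """Parse a simple formula such as 'Cu2S' into exact atomic fractions."""
    counts = {}
    for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[el] = counts.get(el, 0) + int(n or 1)
    total = sum(counts.values())
    return {el: Fraction(c, total) for el, c in counts.items()}


def wasserstein_2d(comp_a, comp_b):
    """Earth mover's distance between two compositions on the periodic table
    grid (Euclidean ground distance), solved exactly as a min-cost flow by
    successive shortest paths with Bellman-Ford on the residual graph."""
    src, dst = list(comp_a), list(comp_b)
    n, m = len(src), len(dst)
    s, t = n + m, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, Fraction(0), -cost, len(graph[u]) - 1])

    for i, el in enumerate(src):
        add_edge(s, i, comp_a[el], 0.0)  # supply: atomic fraction in compound A
        for j, other in enumerate(dst):
            add_edge(i, n + j, Fraction(1), math.dist(PT_COORDS[el], PT_COORDS[other]))
    for j, el in enumerate(dst):
        add_edge(n + j, t, comp_b[el], 0.0)  # demand: atomic fraction in compound B
    total, remaining = 0.0, Fraction(1)
    while remaining > 0:
        dist, prev = [math.inf] * len(graph), [None] * len(graph)
        dist[s] = 0.0
        for _ in range(len(graph) - 1):  # Bellman-Ford relaxation rounds
            for u, edges in enumerate(graph):
                for k, (v, cap, cost, _r) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
        path, v = [], t
        while v != s:  # walk back from the sink to collect the cheapest path
            u, k = prev[v]
            path.append(graph[u][k])
            v = u
        push = min([remaining] + [edge[1] for edge in path])
        for edge in path:  # shift `push` units of mass along the path
            edge[1] -= push
            graph[edge[0]][edge[3]][1] += push
        total += float(push) * dist[t]
        remaining -= push
    return total


def dbscan_precomputed(dmat, eps, min_samples):
    """Textbook DBSCAN on a precomputed distance matrix; label -1 means noise."""
    labels, cluster = [None] * len(dmat), -1
    for p in range(len(dmat)):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(len(dmat)) if dmat[p][q] <= eps]
        if len(neighbors) < min_samples:
            labels[p] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = list(neighbors)
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # noise reached from a core point: border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(len(dmat)) if dmat[q][r] <= eps]
            if len(q_neighbors) >= min_samples:  # q is a core point: keep expanding
                queue.extend(q_neighbors)
    return labels


def distance_matrix(formulas, geom=None, w=0.5):
    """HYPOTHETICAL mixing rule: w * compositional + (1 - w) * geometric distance."""
    comps = [composition(f) for f in formulas]
    d = [[wasserstein_2d(a, b) for b in comps] for a in comps]
    if geom is not None:
        d = [[w * d[i][j] + (1 - w) * geom[i][j] for j in range(len(d))] for i in range(len(d))]
    return d


FORMULAS = ["CuS", "CuSe", "Cu2S", "Cu2Se", "ZnSe", "FeS2"]
print(dbscan_precomputed(distance_matrix(FORMULAS), eps=0.6, min_samples=2))
```

Run as is, the sketch prints `[0, 0, 1, 1, 0, -1]`: CuS, CuSe and ZnSe chain into one cluster (each step moves half of the atoms one cell on the table), Cu2S and Cu2Se form a second cluster, and FeS2 is labeled noise. For realistic dataset sizes, library solvers are more practical, for example POT's `ot.emd2(a, b, M)` for the transport cost and scikit-learn's `DBSCAN(eps=..., min_samples=..., metric="precomputed")` for clustering. DBSCAN expects distances, so a similarity matrix would first have to be converted into a distance matrix.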