Robust multi-source geographic entities matching by maximizing geometric and semantic similarity

Sci Rep. 2024 Dec 30;14(1):31616. doi: 10.1038/s41598-024-79812-2.

Abstract

Geographic entity matching is an important means of fusing multi-source spatial data and of associating and sharing information. Existing studies have designed matching methods tailored to the characteristics of particular entity types, such as line and area entities. However, these approaches often generalize poorly to heterogeneous data from multiple sources and lose accuracy when matching complex patterns. To address these problems, we propose a robust multi-source geographic entity matching method that maximizes geometric and semantic similarity. First, each entity is segmented according to its shape features, and the resulting feature segments are extracted as matching primitives. Second, the feature segments are grouped into patterns, which fall into three major categories and fourteen subcategories. Third, pattern matching is performed using spatial similarity metrics such as the maximum projection distance. Finally, the spatial matches are detected and refined through semantic similarity calculation. The proposed method is tested on two datasets from regions in southeast and northwest China. The experimental results demonstrate that the method can be applied effectively to both area and line entity matching, generalizes well, and significantly improves matching accuracy. Specifically, nine feature segment matching patterns are used for area entities and six for line entities, and both precision and recall are nearly 90%.

Keywords: Area entity; Feature segment; Line entity; Pattern recognition; Semantic similarity.
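
The two-stage idea in the abstract (spatial matching followed by semantic refinement) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the maximum projection distance, the pattern taxonomy, or the semantic measure, so everything here is an assumption made for illustration. In particular, max_projection_distance is a stand-in that takes the largest distance from any vertex of one polyline to its nearest projection on the other, semantic_similarity is a plain string ratio on entity names, and the thresholds (tolerance, geo_min, sem_min) and record fields (id, name, geom) are hypothetical. Shape-based segmentation and pattern grouping are omitted; an area entity could be passed as its closed boundary polyline.

```python
import math
from difflib import SequenceMatcher


def point_segment_distance(p, a, b):
    """Distance from point p to its projection on segment a-b, clamped to the segment."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def max_projection_distance(src, dst):
    """Assumed stand-in metric: the largest vertex-to-polyline distance from src to dst."""
    return max(
        min(point_segment_distance(p, dst[i], dst[i + 1]) for i in range(len(dst) - 1))
        for p in src
    )


def geometric_similarity(a, b, tolerance):
    """Map the symmetric distance to [0, 1]; 0 means at or beyond the tolerance."""
    d = max(max_projection_distance(a, b), max_projection_distance(b, a))
    return max(0.0, 1.0 - d / tolerance)


def semantic_similarity(name_a, name_b):
    """Assumed semantic measure: case-insensitive string ratio of entity names."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def match_entities(source, target, tolerance=10.0, geo_min=0.5, sem_min=0.6):
    """Return (source id, target id) pairs that pass both the spatial and semantic stage."""
    matches = []
    for s in source:
        best = None
        for t in target:
            geo = geometric_similarity(s["geom"], t["geom"], tolerance)
            if geo < geo_min:  # spatial stage: discard distant candidates
                continue
            sem = semantic_similarity(s["name"], t["name"])
            if best is None or geo + sem > best[0]:
                best = (geo + sem, sem, t["id"])
        # Semantic stage: keep the best candidate only if the names also agree.
        if best is not None and best[1] >= sem_min:
            matches.append((s["id"], best[2]))
    return matches


source = [{"id": "a1", "name": "Renmin Road", "geom": [(0, 0), (10, 0), (20, 0)]}]
target = [
    {"id": "b1", "name": "Renmin Rd", "geom": [(0, 1), (20, 1)]},
    {"id": "b2", "name": "Jiefang Avenue", "geom": [(0, 2), (20, 2)]},
    {"id": "b3", "name": "Renmin Road", "geom": [(0, 50), (20, 50)]},
]
print(match_entities(source, target))  # [('a1', 'b1')]
```

In this toy example b3 has an identical name but is rejected at the spatial stage, while b2 is spatially close but loses to b1 once name similarity is taken into account. Summing the two scores is only one possible way to combine them; the weighting used in the paper is not stated in the abstract.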