
Showing 1–2 of 2 results for author: Borthwick, A

Searching in archive cs.
  1. arXiv:2401.18064 [pdf, other]

    cs.IR cs.DB

    Neural Locality Sensitive Hashing for Entity Blocking

    Authors: Runhui Wang, Luyang Kong, Yefan Tao, Andrew Borthwick, Davor Golac, Henrik Johnson, Shadie Hijazi, Dong Deng, Yongfeng Zhang

    Abstract: Locality-sensitive hashing (LSH) is a fundamental algorithmic technique widely employed in large-scale data processing applications, such as nearest-neighbor search, entity resolution, and clustering. However, its applicability in some real-world scenarios is limited due to the need for careful design of hashing functions that align with specific metrics. Existing LSH-based Entity Blocking solutio…

    Submitted 31 January, 2024; originally announced January 2024.

  2. arXiv:2008.08285 [pdf, other]

    cs.DB cs.DC cs.DS

    Scalable Blocking for Very Large Databases

    Authors: Andrew Borthwick, Stephen Ash, Bin Pang, Shehzad Qureshi, Timothy Jones

    Abstract: In the field of database deduplication, the goal is to find approximately matching records within a database. Blocking is a typical stage in this process that involves cheaply finding candidate pairs of records that are potential matches for further processing. We present here Hashed Dynamic Blocking, a new approach to blocking designed to address datasets larger than those studied in most prior w…

    Submitted 19 August, 2020; originally announced August 2020.
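Both entries concern blocking, which cheaply generates candidate pairs of records that may match. A slower matcher then compares only those pairs instead of all n(n-1)/2. The sketch below is a hedged illustration of generic LSH-based blocking using classic MinHash with banding. It is NOT the Neural LSH method of arXiv:2401.18064 or the Hashed Dynamic Blocking method of arXiv:2008.08285. The function names, the character-trigram shingling, and the 8-band by 4-row setting are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Hypothetical helpers: classic MinHash LSH blocking, not either paper's method.

def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < k:
        return {s}  # short strings become a single token
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def minhash_signature(tokens: set[str], num_hashes: int) -> list[int]:
    """For each seeded 64-bit hash function, keep the minimum hash over all tokens."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(), "big"
            )
            for t in tokens
        ))
    return sig

def lsh_candidate_pairs(records: dict[str, str], bands: int = 8,
                        rows: int = 4) -> set[tuple[str, str]]:
    """Records that agree on every row of at least one band become candidate pairs."""
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            # The band index is part of the key, so buckets never mix bands.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(rid)
    pairs: set[tuple[str, str]] = set()
    for ids in buckets.values():
        for a, c in combinations(sorted(ids), 2):
            pairs.add((a, c))
    return pairs
```

For example, `lsh_candidate_pairs({"r1": "Andrew Borthwick", "r2": "andrew  BORTHWICK", "r3": "Davor Golac"})` should yield the single pair `("r1", "r2")`. The first two records normalize to the same string and so share every band. The third shares no trigrams with them. In this scheme, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^rows)^bands. Adding bands raises recall, and adding rows per band raises precision.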