PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

Patwary, Md. Mostofa Ali; Satish, Nadathur Rajagopalan; Sundaram, Narayanan; Liu, Jialin; Sadowski, Peter; Racah, Evan; Byna, Suren; Tull, Craig; Bhimji, Wahid; Prabhat; Dubey, Pradeep

doi:10.1109/IPDPS.2016.57

arXiv:1607.08220 (cs)

[Submitted on 27 Jul 2016]

Authors: Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey

Abstract: Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, their inherent sequentiality means that linear algorithms are used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning the search space and for improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications (astrophysics, plasma physics, and particle physics), we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds using $\sim$50,000 cores. We also demonstrate computation of KNN for 19 billion queries in 12 seconds. We demonstrate almost linear speedup on both shared and distributed memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems. In addition, we showcase performance and scalability on the recently released Intel Xeon Phi processor, showing that our algorithm scales well even on massively parallel architectures.
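For readers unfamiliar with the kernel the abstract refers to, the following is a minimal single-node, sequential sketch of kd-tree construction and KNN querying with search-space pruning. It is not the PANDA implementation and does not reflect the paper's distributed partitioning or load-balancing schemes; the function names (`build`, `knn`, `k_nearest`), the median split via sorting, and the tuple-based node layout are all assumptions chosen for illustration.

```python
import heapq
import random


def build(points, depth=0):
    """Build a kd-tree as nested tuples: (point, axis, left, right).

    Illustrative only: splits at the median of the current axis,
    cycling through axes by depth.
    """
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(node, query, k, heap=None):
    """Collect the k nearest points in a max-heap of (-sq_dist, point)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, axis, left, right = node
    d = sq_dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, heap)
    # Pruning: the far subtree can only help if the splitting plane is
    # closer to the query than the current k-th nearest neighbor.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, query, k, heap)
    return heap


def k_nearest(tree, query, k):
    """Return the k nearest points, closest first."""
    return [p for _, p in sorted(knn(tree, query, k), reverse=True)]


if __name__ == "__main__":
    random.seed(0)
    pts = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
    tree = build(pts)
    print(k_nearest(tree, (0.5, 0.5, 0.5), 3))
```

The pruning test in `knn` is what lets a query skip most of the tree on well-distributed data. A linear scan, by contrast, computes the distance to every point.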

Comments:	11 pages, in PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures, Md. Mostofa Ali Patwary et al., IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:1607.08220 [cs.DC]
	(or arXiv:1607.08220v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1607.08220
Related DOI:	https://doi.org/10.1109/IPDPS.2016.57

Submission history

From: Mostofa Patwary
[v1] Wed, 27 Jul 2016 19:13:07 UTC (2,790 KB)
