Showing 1–2 of 2 results for author: Fantl, J
-
Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
Authors: Prithwish Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, Joud Khoury
Abstract:
The all-to-all collective communications primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This paper takes a holistic approach to optimize the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology and lowering the schedules to various runtimes and interconnect technologies. We also propose a novel topology that delivers near-optimal all-to-all performance.
Submitted 25 April, 2024; v1 submitted 23 September, 2023; originally announced September 2023.
-
Efficient Direct-Connect Topologies for Collective Communications
Authors: Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, Arvind Krishnamurthy
Abstract:
We consider the problem of distilling efficient network topologies for collective communications. We provide an algorithmic framework for constructing direct-connect topologies optimized for the latency vs. bandwidth trade-off associated with the workload. Our approach synthesizes many different topologies and schedules for a given cluster size and degree and then identifies the appropriate topology and schedule for a given workload. Our algorithms start from small, optimal base topologies and associated communication schedules and use techniques that can be iteratively applied to derive much larger topologies and schedules. Additionally, we incorporate well-studied large-scale graph topologies into our algorithmic framework by producing efficient collective schedules for them using a novel polynomial-time algorithm. Our evaluation uses multiple testbeds and large-scale simulations to demonstrate significant performance benefits from our derived topologies and schedules.
Submitted 12 May, 2024; v1 submitted 7 February, 2022; originally announced February 2022.