CHEF: A Framework for Deploying Heterogeneous Models on Clusters with Heterogeneous FPGAs

Yue Tang; Yukai Song; Naveena Elango; Sheena Ratnam Priya; Alex K Jones; Jinjun Xiong; Peipei Zhou; Jingtong Hu

doi:10.1109/tcad.2024.3438994

IEEE Trans Comput Aided Des Integr Circuits Syst. 2024 Nov;43(11):3937-3948. doi: 10.1109/tcad.2024.3438994. Epub 2024 Nov 6.

Authors

Yue Tang¹, Yukai Song¹, Naveena Elango², Sheena Ratnam Priya², Alex K Jones³, Jinjun Xiong², Peipei Zhou⁴, Jingtong Hu¹

Affiliations

¹ Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA.
² Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA.
³ Department of Electrical Engineering and Computer Science, Syracuse University, 4-206 Center for Science and Technology, Syracuse, NY 13244, USA.
⁴ School of Engineering, Brown University, 345 Brook Street, Providence, RI 02912, USA.

PMID: 39703437
PMCID: PMC11654640 (available on 2025-11-01)

Abstract

DNNs are rapidly evolving from streamlined single-modality single-task (SMST) models to multi-modality multi-task (MMMT) models, with large variations across layers and complex data dependencies among layers. To support such models, hardware systems have also become heterogeneous, driven by the prevailing trend of integrating diverse accelerators into a single system to reduce latency. FPGAs offer high computation density and communication bandwidth and can be configured with different accelerator designs, so they are widely used for a variety of machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging, since MMMT models have much larger layer variations, a massive number of layers, and complex data dependencies among different backbones. Previous mapping algorithms are either inefficient or oversimplified, which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models on realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs onto the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach that co-optimizes hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which supports more general and practical cases than previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models on real heterogeneous FPGA clusters. Experimental results show that CHEF achieves near-optimal latency while requiring 10,000× less search time than an exhaustive search for the optimal solution.
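The abstract contrasts CHEF's search cost with exhaustively searching for the optimal mapping. The toy sketch below illustrates why exhaustive layer-to-accelerator (M2A-style) search scales poorly: with A accelerators and L layers there are A^L candidate mappings. It is not CHEF-M2A. The two-backbone DAG, the latency table, the greedy earliest-finish heuristic, and the scheduling model (layers run in one fixed topological order, each accelerator executes one layer at a time, communication cost is ignored) are all hypothetical assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy instance (NOT from the paper): two backbones (a0->a1, b0->b1)
# that merge into a shared head h, mimicking an MMMT-style dependency graph.
LAYERS = ["a0", "b0", "a1", "b1", "h"]  # one fixed topological order
DEPS = {"a0": [], "b0": [], "a1": ["a0"], "b1": ["b0"], "h": ["a1", "b1"]}
# Hypothetical per-layer latency (ms) on accelerator 0 and accelerator 1.
LAT = {"a0": (4, 2), "b0": (2, 6), "a1": (3, 5), "b1": (5, 3), "h": (2, 2)}
NUM_ACC = 2


def makespan(mapping):
    """End-to-end latency of a layer->accelerator mapping under the assumed model:
    layers start in LAYERS order, one layer at a time per accelerator, no comm cost."""
    acc_free = [0.0] * NUM_ACC
    finish = {}
    for layer, acc in zip(LAYERS, mapping):
        ready = max((finish[d] for d in DEPS[layer]), default=0.0)
        finish[layer] = max(ready, acc_free[acc]) + LAT[layer][acc]
        acc_free[acc] = finish[layer]
    return max(finish.values())


def exhaustive():
    """Evaluate all NUM_ACC ** len(LAYERS) mappings: optimal, but exponential in L."""
    best = min(product(range(NUM_ACC), repeat=len(LAYERS)), key=makespan)
    return best, makespan(best)


def greedy():
    """Hypothetical baseline heuristic: place each layer, in topological order,
    on the accelerator where it would finish earliest (NUM_ACC * L evaluations)."""
    acc_free = [0.0] * NUM_ACC
    finish, mapping = {}, []
    for layer in LAYERS:
        ready = max((finish[d] for d in DEPS[layer]), default=0.0)
        ends = [max(ready, acc_free[a]) + LAT[layer][a] for a in range(NUM_ACC)]
        acc = min(range(NUM_ACC), key=ends.__getitem__)
        finish[layer] = ends[acc]
        acc_free[acc] = ends[acc]
        mapping.append(acc)
    return tuple(mapping), makespan(tuple(mapping))


if __name__ == "__main__":
    print("candidate mappings:", NUM_ACC ** len(LAYERS))
    print("exhaustive:", exhaustive())
    print("greedy:", greedy())
```

On this tiny instance the greedy heuristic happens to match the exhaustive optimum while evaluating far fewer placements; on realistic MMMT models with hundreds of layers, A^L becomes intractable, which is the gap a dedicated mapping algorithm has to close.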

Keywords: heterogeneous FPGA clusters; multi-modality multi-task (MMMT).