Towards high-performance deep learning architecture and hardware accelerator design for robust analysis in diffuse correlation spectroscopy

Zhenya Zang; Quan Wang; Mingliang Pan; Yuanzhe Zhang; Xi Chen; Xingda Li; David Day Uei Li

doi:10.1016/j.cmpb.2024.108471

Comput Methods Programs Biomed. 2025 Jan:258:108471. doi: 10.1016/j.cmpb.2024.108471. Epub 2024 Oct 28.

Authors

Zhenya Zang¹, Quan Wang¹, Mingliang Pan¹, Yuanzhe Zhang¹, Xi Chen¹, Xingda Li¹, David Day Uei Li²

Affiliations

¹ Department of Biomedical Engineering, University of Strathclyde, Glasgow, United Kingdom.
² Department of Biomedical Engineering, University of Strathclyde, Glasgow, United Kingdom. Electronic address: [email protected].

PMID: 39531806
DOI: 10.1016/j.cmpb.2024.108471

Abstract

This study proposes a compact deep learning (DL) architecture and a highly parallelized hardware computing platform for reconstructing the blood flow index (BFi) in diffuse correlation spectroscopy (DCS). We used a rigorous analytical model to generate the autocorrelation functions (ACFs) for training the DL network, and assessed the network's accuracy on simulated and milk-phantom data. In evaluation on synthetic data, our lightweight DL architecture improves the mean squared error (MSE) by 66.7% for BFi and 18.5% for the coherence factor β relative to convolutional neural networks (CNNs). We also investigated the accuracy of the relative BFi (rBFi) across different algorithms. With hardware implementation in mind, we further simplified the DL computing primitives by using subtraction for feature extraction, and we extensively explored computing parallelism and fixed-point quantization within the DL architecture. Because the DL model is compact, we applied unrolling and pipelining optimizations to its computation-intensive for-loops and stored all learned parameters in on-chip block RAMs (BRAMs). We also achieved pixel-wise parallelism, enabling simultaneous, real-time processing of 10 and 15 ACFs on Zynq-7000 and Zynq UltraScale+ field-programmable gate arrays (FPGAs), respectively. Unlike existing FPGA accelerators, which produce BFi and β from ACFs on standalone hardware, our approach is an encapsulated, end-to-end on-chip process that converts photon intensity data to the temporal intensity ACF and then reconstructs BFi and β. This hardware platform provides an on-chip solution that replaces post-processing and helps miniaturize modern DCS systems that use single-photon cameras. Finally, we comprehensively compared the computational efficiency of our FPGA accelerator with CPU and GPU solutions.
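
The first stage described in the abstract, converting photon intensity data to a temporal intensity ACF, can be illustrated with a minimal software sketch. It assumes the standard normalized intensity autocorrelation g2(τ) = ⟨I(t)I(t+τ)⟩ / ⟨I⟩² evaluated at linearly spaced integer lags; the function name `intensity_acf`, its parameters, and the linear lag spacing are illustrative assumptions and are not taken from the paper or its FPGA design.

```python
import numpy as np


def intensity_acf(counts, max_lag):
    """Estimate the normalized intensity autocorrelation g2 at lags 1..max_lag.

    `counts` is a 1-D sequence of photon counts per time bin for one pixel.
    Lags are in units of time bins. Lag 0 is skipped because photon shot
    noise adds a bias to the zero-lag term. (Hypothetical helper, not the
    authors' implementation.)
    """
    counts = np.asarray(counts, dtype=np.float64)
    n = counts.size
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(counts) - 1")

    g2 = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        head = counts[: n - lag]   # I(t)
        tail = counts[lag:]        # I(t + lag)
        # Symmetric normalization: divide by the means of the two
        # overlapping segments rather than by the global mean squared.
        g2[lag - 1] = np.mean(head * tail) / (np.mean(head) * np.mean(tail))
    return g2
```

For uncorrelated counts the estimate stays close to 1 at every lag; in DCS the excess above 1 is commonly modeled with the Siegert relation g2(τ) = 1 + β|g1(τ)|², which is where the coherence factor β enters. The loop over lags is independent per lag and per pixel, which is the kind of structure that unrolling and pixel-wise parallelism exploit.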

Keywords: Blood flow index; Deep neural networks; Deep-learning hardware accelerator; Diffuse correlation spectroscope.

MeSH terms

Algorithms*
Deep Learning*
Equipment Design
Humans
Neural Networks, Computer*
Phantoms, Imaging
Spectrum Analysis*