Double-stage delay-multiply-and-sum (DS-DMAS) is an algorithm proposed for photoacoustic image reconstruction. The DS-DMAS algorithm offers higher contrast than conventional delay-and-sum and delay-multiply-and-sum, but at the expense of higher computational complexity. Here, we used a compute unified device architecture (CUDA) graphics processing unit (GPU) parallel computation approach to address the high complexity of DS-DMAS when reconstructing photoacoustic images acquired with a commercial light-emitting diode (LED)-based photoacoustic scanner. Compared with a single-threaded central processing unit (CPU) implementation, the GPU approach was nearly 140-fold faster for a 1024 × 1024 pixel image, with no decrease in accuracy. The proposed implementation makes it possible to reconstruct photoacoustic images at frame rates of 250, 125, and 83.3 when the images are 64 × 64, 128 × 128, and 256 × 256 pixels, respectively. Thus, DS-DMAS can be used efficiently in clinical devices when coupled with CUDA GPU parallel computation.
Keywords: beamforming; central processing unit (CPU); compute unified device architecture (CUDA); double-stage delay-multiply-and-sum (DS-DMAS); graphics processing unit (GPU); linear-array imaging; parallel computing; photoacoustic imaging.
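To make the complexity argument concrete, the following is a minimal single-pixel reference sketch of the three beamformers named above, written in plain Python rather than CUDA. It is not the implementation evaluated here. It assumes one common formulation of DMAS, in which each pairwise product is replaced by its sign-preserving square root, and a DS-DMAS in which the terms of the expanded DMAS sum are themselves combined by a second DMAS stage. The function names (`das`, `dmas`, `ds_dmas`, `signed_sqrt`, `first_stage_terms`) are hypothetical, and the input is assumed to be the channel samples already delayed for the pixel of interest.

```python
import math


def signed_sqrt(value):
    """Sign-preserving square root, assumed here to keep the signal's dimensionality."""
    return math.copysign(math.sqrt(abs(value)), value)


def das(samples):
    """Delay-and-sum: plain sum of the already-delayed channel samples."""
    return sum(samples)


def dmas(samples):
    """Delay-multiply-and-sum: combine every unique channel pair (i < j)."""
    total = 0.0
    n = len(samples)
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += signed_sqrt(samples[i] * samples[j])
    return total


def first_stage_terms(samples):
    """Terms of the expanded DMAS sum: term i pairs channel i with all later channels."""
    n = len(samples)
    return [
        sum(signed_sqrt(samples[i] * samples[j]) for j in range(i + 1, n))
        for i in range(n - 1)
    ]


def ds_dmas(samples):
    """Double-stage DMAS (assumed form): apply DMAS again to the first-stage terms."""
    return dmas(first_stage_terms(samples))


if __name__ == "__main__":
    coherent = [1.0, 1.0, 1.0, 1.0]      # in-phase channels
    incoherent = [1.0, -1.0, 1.0, -1.0]  # alternating-sign channels
    for name, data in (("coherent", coherent), ("incoherent", incoherent)):
        print(name, das(data), dmas(data), ds_dmas(data))
```

In this sketch DAS needs one pass over the channels, whereas DMAS and each stage of DS-DMAS loop over channel pairs, which is where the extra cost comes from. Because every pixel is computed independently of the others, the per-pixel routine maps naturally onto one GPU thread per pixel.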