Because microscopes have a limited field of view, acquiring macroscopic specimens requires many parallel image stacks to cover the whole volume of interest. Adjacent stacks are acquired with overlapping regions so that a 3D stitching tool can align them automatically. Since state-of-the-art microscopes coupled with chemical clearing procedures can generate 3D images larger than one terabyte, parallelization is required to keep stitching time within acceptable limits. In this paper we discuss how multi-level parallelization reduces the execution time of TeraStitcher, a tool designed to deal with very large images. We present two algorithms that partition the dataset transparently for efficient parallelization, together with experimental results demonstrating the effectiveness of the approach, which achieves a speedup close to 300× when both coarse- and fine-grained parallelism are exploited. Multi-level parallelization of TeraStitcher significantly reduced processing times with no changes to the user interface and no additional code maintenance effort.
Keywords: 3D microscopy; GPU; data partitioning; parallel processing; stitching; terabyte images.