1 Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California (USC), Los Angeles, CA 90033, USA

2 Alfred E. Mann Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California (USC), Los Angeles, CA 90089, USA

3 Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California (USC), Los Angeles, CA 90089, USA

Email: [email protected]

Diffusion Model-based FOD Restoration from High Distortion in dMRI

This work is supported by the National Institutes of Health (NIH) under grants R01EB022744, RF1AG077578, RF1AG056573, RF1AG064584, R21AG064776, U19AG078109, and P41EB015922.

Shuo Huang (1, 2), Lujia Zhong (1, 3), Yonggang Shi (1, 2, 3)

S. Huang and L. Zhong contributed equally to this work.

Abstract

Fiber orientation distributions (FODs) are a popular model for representing diffusion MRI (dMRI) data. However, imaging artifacts such as susceptibility-induced distortion in dMRI can cause signal loss and corrupt the reconstruction of FODs, which prevents successful fiber tracking and connectivity analysis in affected brain regions such as the brain stem. Generative models, such as diffusion models, have been successfully applied to various image restoration tasks. However, their application to FOD images poses unique challenges because FODs are 4-dimensional data represented by spherical harmonics (SPHARM), with the fourth dimension exhibiting order-related dependency. In this paper, we propose a novel diffusion model for FOD restoration that can recover the signal loss caused by distortion artifacts. We use volume-order encoding to enhance the ability of the diffusion model to generate individual FOD volumes at all SPHARM orders. Moreover, we add cross-attention features extracted across all SPHARM orders when generating each individual FOD volume to capture the order-related dependency across FOD volumes. We also condition the diffusion model on low-distortion FODs surrounding high-distortion areas to maintain the geometric coherence of the generated FODs. We trained and tested our model using data from the UK Biobank ($n=1315$). On a test set with ground truth ($n=43$), we demonstrate the high accuracy of the generated FODs in terms of root mean square errors of FOD volumes and angular errors of FOD peaks. We also apply our method to a test set with large distortion in the brain stem area ($n=1172$) and demonstrate the efficacy of our method in restoring the FOD integrity and, hence, greatly improving tractography performance in affected brain regions.

Keywords: Fiber orientation distribution · FOD restoration · Susceptibility distortion · Diffusion model

1 Introduction

Fiber orientation distribution (FOD) is a popular representation [3, 6] for modeling the configuration of fascicle trajectories based on high-resolution diffusion MRI (dMRI) and has been widely used in modern tractography techniques for the reconstruction of major fiber bundles [1]. Severe distortion artifacts, however, can prevent the computation of valid FODs due to dMRI signal loss. While various distortion correction methods [23, 18, 22] have been proposed to alleviate this problem, severe residual distortions are still widely present and pose a significant challenge to the successful reconstruction of fiber trajectories in regions with strong artifacts [2] (Fig. 1). Building upon recent successes of diffusion models [9] in various image restoration and synthesis tasks [16], we propose in this work a novel diffusion model-based method for the generative recovery of FODs in brain regions with severe signal loss.

Figure 1: The residual distortion affects FODs and tractography. (a) The Topup method from FSL [18] aligns the $b=0$ images from 2 opposite phase encoding directions to correct the susceptibility-induced distortion. (b) Corrupted FODs and failed tractography of the data in (a) illustrate the impact of the signal loss that cannot be recovered by distortion correction. (c) FODs from a low signal loss case and the successful fiber tracking results.

For the generative restoration [25, 26, 7, 15] and synthesis [30, 31] of natural images, denoising diffusion probabilistic models (DDPM) [9, 16] have achieved great success because they have been shown to be more stable in training and to have more controllable generation procedures than other generative models such as GANs [29] and VAEs [33]. The application of diffusion models to FOD restoration, however, poses unique challenges. First, FODs are typically represented by spherical harmonics (SPHARM) up to a maximum order $L$ [6]. The FODs are thus 4-dimensional images because they consist of multiple 3-D volumes at each SPHARM order. Generating all FOD volumes together would require large GPU memory and an enormous number of parameters. Second, while it is memory-efficient to treat all FOD volumes equally in training a generative DDPM model, the order-dependency of the FOD volumes can make it hard to ensure the validity of the generated FODs, as shown in Fig. 2(a) and (b). Third, the generated FODs need to maintain geometric coherence with surrounding voxels that have low distortion and a more reliable representation.

In this work, we develop a novel diffusion model-based method for the recovery of FODs in brain regions with high distortions, namely the FOD-Diffusion model. We develop a volume order-aware diffusion model to make the FOD-Diffusion model suitable for generating FOD volumes in both high- and low-frequency orders. We also provide a frequency order-balanced cross-attention for extracting related information from all FOD volumes to help the generation of individual FOD volumes. Experiments on the large-scale UK Biobank dataset ($n=1315$) [24, 32] demonstrate that our method successfully restores FODs from dMRI signal loss due to large residual distortion artifacts and hence greatly improves fiber tracking through affected brain stem regions.

2 Method

A detailed illustration of the proposed FOD-Diffusion model is shown in Fig. 3. We will explain each aspect of the model in detail in the following sections.

2.1 Order-aware Diffusion Model

We propose a volume order-aware diffusion model to enable the diffusion model to generate FODs across all different frequency orders. Firstly, inspired by the time encoding in the diffusion model [9, 8], we propose a volume and frequency order-aware encoding (referred to as volume encoding) to enhance the model’s ability to distinguish FODs of different frequencies and orientations. The volume encoding encodes an array $[L,V]$ , where $L$ and $V$ are the frequency order number and the volume number of the single FOD volume, respectively.
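To make the idea of the volume encoding concrete, the following is a minimal sketch that embeds the pair $[L,V]$ with transformer-style sinusoidal functions, by analogy with the time encoding. The function names, the embedding dimension, and the choice to concatenate the two embeddings are illustrative assumptions; the text does not specify these details.

```python
import math

def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Transformer-style sinusoidal embedding of a scalar (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]

def volume_encoding(order, volume, dim=8):
    """Hypothetical volume encoding: embeddings of the frequency order L and
    the volume number V, concatenated (the concatenation is an assumption)."""
    return sinusoidal_embedding(order, dim) + sinusoidal_embedding(volume, dim)

# Example: encoding for a volume at frequency order L=2 with volume number V=3.
enc = volume_encoding(order=2, volume=3)
```

In such a design, the resulting vector would typically be combined with the time embedding before it is passed to the U-Net blocks.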

Moreover, we use the low-signal-loss regions as a condition to provide the model with sufficient constraints during generation. In this conditioning data, voxels in the high-signal-loss regions are set to 1 to distinguish them from the background.

We use the $L_{1}$ loss during training, and enhance the loss of the high-distortion regions, as shown in Eq. 1:

$$\mathcal{L}=0.01\times|\hat{x}-x|_{1}+0.99\times\mathds{1}(|\hat{x}-x|_{1}), \tag{1}$$

where $\hat{x}$ and $x$ represent the generated FODs and the ground-truth FODs, respectively. The masking function $\mathds{1}(\cdot)$ returns its argument for voxels within the mask of the high-signal-loss regions and 0 for all other voxels, so the error in the masked regions is weighted more heavily.
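The loss in Eq. 1 can be sketched as follows. This is a minimal NumPy illustration, not the training code: the function name is hypothetical, and the mean reduction over voxels is an assumption because the text does not state how the voxel-wise errors are aggregated.

```python
import numpy as np

def weighted_l1_loss(pred, target, mask, w_all=0.01, w_mask=0.99):
    """Sketch of Eq. 1: L1 error over all voxels plus an up-weighted
    L1 error restricted to the high-signal-loss mask."""
    abs_err = np.abs(pred - target)
    # Mean reduction is an assumption; the text does not state the reduction.
    return w_all * abs_err.mean() + w_mask * (abs_err * mask).mean()

# Toy example: a 2x2x2 volume with one masked (high-signal-loss) voxel.
pred = np.zeros((2, 2, 2))
target = np.ones((2, 2, 2))
mask = np.zeros((2, 2, 2))
mask[0, 0, 0] = 1.0
loss = weighted_l1_loss(pred, target, mask)
```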

Since FODs at different orders have different gray-level ranges, they are normalized before generation. Components with $L=0$ are normalized to the range $[0,1]$ , while other FODs are normalized to $[-1,1]$ . This ensures that the regions outside the brain remain 0 for all normalized FODs.
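One normalization that satisfies these ranges while keeping the background at 0 is a pure rescaling, sketched below. Dividing by the maximum (for $L=0$, assumed non-negative) or by the maximum absolute value (for other orders) is an assumption made for illustration; the text does not give the exact formula. A min-max mapping, by contrast, would shift the background away from 0 for volumes with negative values.

```python
import numpy as np

def normalize_fod_volume(vol, order):
    """Rescale one FOD volume so that zero-valued background stays zero.

    Assumed scheme: L=0 volumes (assumed non-negative) are divided by their
    maximum, giving [0, 1]; other volumes are divided by their maximum
    absolute value, giving a range within [-1, 1].
    """
    vol = np.asarray(vol, dtype=float)
    scale = vol.max() if order == 0 else np.abs(vol).max()
    if scale == 0:
        return vol, 1.0
    # The scale is returned so that generated volumes can be mapped back.
    return vol / scale, float(scale)

normalized, scale = normalize_fod_volume([0.0, -2.0, 4.0], order=2)
```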

2.2 Frequency-balanced Cross-attention

The FODs represented by SPHARM have interdependence between different volumes, and we model the relationship between different FOD volumes to help the generation. We propose a frequency-balanced cross-attention method that extracts features from all FOD volumes and feeds them to the U-Net in the diffusion model through the cross-attention [5]. This method achieves balanced attention across each frequency order.

As shown in Fig. 3, for FOD volumes in each frequency order, we calculate their average to reduce the volume number to one in each frequency order. Specifically, for order $L=0$ , as it only has one volume, it can be directly used. Then, we use copied U-Nets from the diffusion model with time encoding $t=0$ and volume encoding $V$ to extract features from the averaged FOD volumes in each frequency order. These features correspond to the output of the self-attention of each decoding block. Afterward, a frequency-order aware convolution with kernel size 1 is performed to select and combine the information that is most relevant to the FOD volume that needs to be generated.
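The per-order averaging step can be sketched as below. The sketch assumes the FOD is stored as an array of shape (45, X, Y, Z) with the volumes grouped by ascending even order, so that order $l$ contributes $2l+1$ volumes (1, 5, 9, 13, and 17 volumes for $l=0,2,4,6,8$); the storage layout and the function name are assumptions.

```python
import numpy as np

ORDERS = (0, 2, 4, 6, 8)  # even SPHARM orders up to L=8

def average_per_order(fod, orders=ORDERS):
    """Average the 2l+1 volumes of each order l of a (C, X, Y, Z) FOD array."""
    averaged, start = [], 0
    for l in orders:
        count = 2 * l + 1
        averaged.append(fod[start:start + count].mean(axis=0))
        start += count
    if start != fod.shape[0]:
        raise ValueError(f"expected {start} volumes, got {fod.shape[0]}")
    return np.stack(averaged)  # shape: (len(orders), X, Y, Z)

# Toy example: 45 single-voxel volumes holding the values 0..44.
fod = np.arange(45, dtype=float).reshape(45, 1, 1, 1)
avg = average_per_order(fod)
```

For order $L=0$ the slice holds a single volume, so the average returns that volume unchanged, matching the description above.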

The U-Nets used to extract the cross-attention features are copies of the U-Net in the diffusion model. Their parameters are frozen during training, that is, they receive no gradient updates; instead, whenever the U-Net in the diffusion model is updated, its parameters are copied to the copied U-Nets. This approach reduces the number of parameters that need to be trained and hence reduces the training time and helps prevent overfitting.

More detail of the frequency order-aware convolution is shown in Fig. 4. Firstly, we use sine and cosine functions to embed the order number. Afterward, a multilayer perceptron (MLP) layer is added to calculate the encoding weights $[e_{1},e_{2},...,e_{n}]^{T}$ . Eq. 2 is used to encode the features $F$ from each frequency order by adding the encoding weights:

$$E_{L=m}=F_{L=m}+e_{m}J, \tag{2}$$

where $m=0,2,...,8$ is the frequency order, and $J$ denotes the all-ones matrix.
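Eq. 2 amounts to adding one scalar per frequency order to every element of that order's feature map. The following sketch illustrates this with NumPy broadcasting; the array layout (orders along the first axis) and the function name are assumptions, and the weights would come from the MLP in the actual model.

```python
import numpy as np

def add_order_encoding(features, weights):
    """Sketch of Eq. 2: E_m = F_m + e_m * J, with J the all-ones matrix."""
    features = np.asarray(features, dtype=float)  # (n_orders, ...) feature maps
    weights = np.asarray(weights, dtype=float)    # (n_orders,) encoding weights
    # Reshape the weights so each scalar e_m broadcasts over its feature map.
    shape = (-1,) + (1,) * (features.ndim - 1)
    return features + weights.reshape(shape)

# Toy example: five orders (m = 0, 2, 4, 6, 8), each with a 2x3 feature map.
F = np.zeros((5, 2, 3))
e = [1.0, 2.0, 3.0, 4.0, 5.0]
E = add_order_encoding(F, e)
```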

The combined features from the 1×1 Conv are fed into the cross-attention through the Key and the Value in the U-Net of the diffusion model, and the Query is the feature from the single volume. The most useful information from other FOD volumes is selected from the Value to enhance the restoration of the current FOD volume under consideration [5].
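The role of the Query, Key, and Value can be illustrated with a plain scaled dot-product attention. This is a simplified sketch: the learned projection matrices and multi-head structure of a real cross-attention layer are omitted, and the feature shapes are hypothetical.

```python
import numpy as np

def cross_attention(query_feats, context_feats):
    """Scaled dot-product cross-attention without learned projections.

    query_feats:   (Nq, d) features of the volume being generated (Query).
    context_feats: (Nk, d) combined features from all orders (Key and Value).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)   # (Nq, Nk)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ context_feats, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
ctx = rng.normal(size=(6, 8))
out, attn = cross_attention(q, ctx)
```

Each output row is a weighted average of the context features, with weights determined by their similarity to the query. This is the sense in which the most relevant information is selected from the Value.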

3 Results and Discussion

3.1 Dataset and Preprocessing

Our FOD-Diffusion method was trained, validated, and tested on the UK Biobank dataset. Overall, this study used data from $n=1315$ subjects, randomly selected from the dataset, with ages ranging from 30 to 80 years.

All extracted data were pre-processed using Topup [18] and Eddy [20, 21] to correct distortion, eddy currents, and head motion. We then computed the FODs with a maximum order of $L=8$ using the method in Ref. [6]. To extract cases with low residual distortion, we used the method in Ref. [4] to calculate the severity map of residual distortion and selected 143 subjects, about 10% of the data, that have both low mean signal loss ($<0.25$) and high mean FOD integrity ($>0.09$) for model training, validation, and testing. We used $n=90$ subjects for training, $n=10$ for validation, and $n=43$ for testing. We masked out regions in the brainstem of these data to act as the high-signal-loss regions. The remaining $n=1172$ subjects, which have high residual distortion, formed a large test set for evaluating the model on real data with high distortion. The masks of high-distortion regions were extracted based on the residual distortion severity map.
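The selection criteria and the split sizes can be summarized in a short sketch. The dictionary keys, the function names, and the seeded random shuffle are hypothetical; the text states only the thresholds and the sizes of the three subsets, not how subjects were assigned to them.

```python
import random

def select_low_distortion(subjects, max_loss=0.25, min_integrity=0.09):
    """Keep subjects with low mean signal loss and high mean FOD integrity."""
    return [s for s in subjects
            if s["signal_loss"] < max_loss and s["fod_integrity"] > min_integrity]

def split_subjects(selected, n_train=90, n_val=10, seed=0):
    """Split into train/validation/test; random assignment is an assumption."""
    shuffled = list(selected)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Synthetic example: 143 low-distortion and 20 high-distortion subjects.
subjects = [{"id": i, "signal_loss": 0.10, "fod_integrity": 0.20} for i in range(143)]
subjects += [{"id": 143 + i, "signal_loss": 0.50, "fod_integrity": 0.05} for i in range(20)]
selected = select_low_distortion(subjects)
train, val, test = split_subjects(selected)
```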

3.2 Model Training and Validation

The model was trained for 100,000 iterations with a batch size of 8. We used the AdamW optimizer with an initial learning rate of $10^{-5}$, which was reduced to $10^{-6}$ after 70,000 iterations. We validated the model after every epoch and selected the model with the lowest validation loss. Training took about 68.5 hours on an NVIDIA RTX A5000 GPU and used about 22 GB of GPU memory. For inference, we used DDPM sampling with 1000 time steps, and v-prediction [27] was used in all experiments. The inference time for a single FOD volume was about 90 seconds.
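For readers unfamiliar with v-prediction, the sketch below shows the target defined in Ref. [27], $v=\alpha_{t}\epsilon-\sigma_{t}x$, and how a clean sample is recovered from a predicted $v$. It assumes a variance-preserving noise schedule with $\alpha_{t}^{2}+\sigma_{t}^{2}=1$; the function names and the toy values are illustrative and not taken from the training code.

```python
import numpy as np

def v_target(x0, eps, alpha, sigma):
    """v-prediction target: v = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0

def x0_from_v(x_t, v, alpha, sigma):
    """Recover the clean sample from x_t and v (requires alpha^2 + sigma^2 = 1)."""
    return alpha * x_t - sigma * v

rng = np.random.default_rng(0)
x0 = rng.normal(size=3)                   # clean sample (toy values)
eps = rng.normal(size=3)                  # Gaussian noise
alpha, sigma = np.cos(0.3), np.sin(0.3)   # satisfies alpha^2 + sigma^2 = 1
x_t = alpha * x0 + sigma * eps            # noised sample at some time step
recovered = x0_from_v(x_t, v_target(x0, eps, alpha, sigma), alpha, sigma)
```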

3.3 Ablation Studies

Fig. 5 shows two examples from the test set, comparing the ground-truth FODs with the results of our FOD-Diffusion model. FODs generated by our FOD-Diffusion model have high angular similarity with the ground truth. In terms of intensity, although some FODs differ from the ground truth, the intensity is similar for most FODs.

We then quantitatively compared our model with the unconditional DDPM and two ablated variants. The first variant only takes the low-signal-loss FOD volume (named vol) as the condition, and the second one also has the volume encoding (named enc). Table 1 shows the root mean squared error (RMSE) of the brain stem results from different methods. The unconditional diffusion model failed to generate the FODs. The results of the ablation study show that each component we added improved the overall accuracy of the generated FODs.

We also compared the geometric differences between different methods, following the work in Ref. [22]. We first calculated the angles of the FOD peaks with the top three largest amplitudes that have peak values larger than 0.5 from both the ground-truth FODs and the generated FODs. Afterward, we calculated the corresponding smallest angular differences from the highest and second-highest peaks of the ground-truth FODs (called “ $1^{st}$ peak” and “ $2^{nd}$ peak”) to the peaks of the generated FODs. The results are shown in Table 2. All ablated variants substantially outperform the unconditional diffusion model, and our FOD-Diffusion model has smaller angular differences and standard deviations than the other models in the ablation studies.
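The angular measure can be sketched as follows, assuming the peaks have already been extracted as 3-D direction vectors. Treating a direction and its negative as the same fiber orientation (antipodal symmetry) is an assumption of this sketch, as is the function name; the peak extraction itself is not shown.

```python
import numpy as np

def smallest_angular_difference(gt_peak, result_peaks):
    """Smallest angle in degrees between one ground-truth peak and any result peak."""
    gt = np.asarray(gt_peak, dtype=float)
    gt = gt / np.linalg.norm(gt)
    res = np.asarray(result_peaks, dtype=float)
    res = res / np.linalg.norm(res, axis=1, keepdims=True)
    # abs() treats v and -v as the same fiber direction (assumed antipodal symmetry).
    cosines = np.clip(np.abs(res @ gt), 0.0, 1.0)
    return float(np.degrees(np.arccos(cosines.max())))

# Example: the closest result peak lies 45 degrees from the ground-truth peak.
angle = smallest_angular_difference([1, 0, 0], [[1, 1, 0], [0, 0, 1]])
```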

Table 1: Root Mean Squared Errors of FODs in Test Set.

Method	$L=0$	$L=2$	$L=4$	$L=6$	$L=8$	FODs
Unconditional DDPM	0.0976	0.0664	0.0453	0.0245	0.0114	0.0365
Diffusion + vol	0.0136	0.0176	0.0143	0.0089	0.0046	0.0105
Diffusion + vol + enc	0.0126	0.0152	0.0131	0.0088	0.0045	0.0097
FOD-Diffusion	0.0121	0.0141	0.0129	0.0089	0.0045	0.0094

Table 2: Angular Differences for FODs in Test Set.

Method	$1^{st}$ peak	$2^{nd}$ peak
Unconditional DDPM	$62.7\degree\pm 2.1\degree$	$64.1\degree\pm 2.1\degree$
Diffusion + vol	$2.2\degree\pm 1.0\degree$	$7.5\degree\pm 1.5\degree$
Diffusion + vol + enc	$2.1\degree\pm 0.9\degree$	$7.3\degree\pm 1.4\degree$
FOD-Diffusion	$\textbf{2.0}\degree\pm\textbf{0.8}\degree$	$\textbf{7.2}\degree\pm\textbf{1.4}\degree$

3.4 Performance on Data with High Distortion

Because we do not have ground truth for high distortion data, we will use the group-wise distribution on the FODs’ integrity at the pons regions to evaluate the FOD-Diffusion model in the high signal loss UK Biobank data ( $n=1172$ ). Here we compare our FOD-Diffusion method with the Topup method, as it is a representative registration-based method and the most widely used method in connectome research.

Fig. 6 shows the distribution of the FOD integrity at the pons region before and after the signal loss recovery. The 1172 subjects are divided into five groups of nearly equal size based on the severity of signal loss, named the “very slight”, “slight”, “medium”, “severe”, and “very severe” residual distortion groups, respectively. Each of the first four groups has 236 subjects, and the last group has 228 subjects. The mean intensity of the $L=0$ component at the pons ROIs (the atlas is defined in Ref. [28]) in Fig. 6(a) was calculated and used as a measure of FOD integrity in the evaluation, following the work in Ref. [4]. This measure is effective because the intensity of the $L=0$ component represents the mean FOD value [6], and the mean FOD decreases as the residual distortion increases. Fig. 6(b) shows that while the integrity of the original FODs decreases as the severity of signal loss increases, the integrity of the restored FODs has similar distributions across the groups. This shows that our method can recover signal loss for data with high residual distortions.
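The integrity measure described above reduces to a masked mean, sketched here under the assumption that the $L=0$ volume and the pons ROI are available as arrays of the same shape. The function name and the toy values are hypothetical.

```python
import numpy as np

def fod_integrity(l0_volume, roi_mask):
    """Mean intensity of the L=0 FOD component inside a binary ROI mask."""
    roi = np.asarray(roi_mask, dtype=bool)
    if not roi.any():
        raise ValueError("ROI mask is empty")
    return float(np.asarray(l0_volume, dtype=float)[roi].mean())

# Toy 2x2 example with two voxels inside the ROI.
l0 = np.array([[0.1, 0.3], [0.5, 0.7]])
roi = np.array([[1, 0], [0, 1]])
integrity = fod_integrity(l0, roi)
```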

The tractography results for two high-signal-loss cases are shown in Fig. 7. Here, we calculated the corticospinal tract (CST) [11] and the middle cerebellar peduncle (MCP) [12] for both the Topup results and the results of our FOD-Diffusion method. These two fiber bundles share the same region of interest at the pons (the red mask in Fig. 6(a)) for generating the seeds. Fig. 7 shows that the tractography results of the CST using FOD-Diffusion cover more of the pons region, and the MCP generated using FOD-Diffusion can cover the whole pons region. Therefore, our FOD-Diffusion method can successfully restore FOD-based fiber connectivity at the pons region.

4 Conclusion

In this work, we proposed a diffusion model-based method for FOD restoration to recover the signal loss caused by high residual distortions. For the generation of complex 4D FOD data, our model is memory efficient and can be trained on a single GPU with 24 GB of memory. We demonstrated the performance of our method in brainstem regions using data from the UK Biobank. In data with ground truth (low distortion), we quantitatively validated the accuracy of our generated FODs. For data with high distortion, we demonstrated the restoration of FOD integrity and the potential of restored FODs in helping fiber tracking of important brainstem pathways.

Future work will include testing our method in more datasets from clinical studies, such as Alzheimer’s Disease Neuroimaging Initiative (ADNI) [35] and Health & Aging Brain among Latino Elders (HABLE) [34]. We will also test our method in more complex brain regions with high residual distortions such as the temporal lobe.

Acknowledgements

The authors thank Miss Mahsa Torfeh and Miss Yi Liu from the University of Southern California for their help in editing the grammar of this work. We also thank Mr. Wenhao Chi, Mr. Jianwei Zhang, Mr. Zhiwei Deng, Miss Jiaxin Yue, and Dr. Xinyu Nie for the useful discussions and suggestions.

References

[1] Aydogan, D.B., Shi, Y.: Parallel transport tractography. IEEE Transactions on Medical Imaging 40(2), 635–647 (2021)
[2] Tang, Y., Sun, W., Toga, A.W., Ringman, J.M., Shi, Y.: A probabilistic atlas of human brainstem pathways based on connectome imaging data. NeuroImage 169, 227–239 (2018)
[3] Tournier, J.D., Calamante, F., Gadian, D.G., Connelly, A.: Direct estimation of the fiber orientation density function from diffusion-weighted mri data using spherical deconvolution. NeuroImage 23(3), 1176–1185 (2004)
[4] Huang, S., Zhong, L., Shi, Y.: Automated mapping of residual distortion severity in diffusion mri. In: International Workshop on Computational Diffusion MRI. pp. 58–69. Springer (2023)
[5] Xu, Z., Zhang, J., Liew, J.H., Yan, H., Liu, J.W., Zhang, C., Feng, J., Shou, M.Z.: Magicanimate: Temporally consistent human image animation using diffusion model. arXiv preprint arXiv:2311.16498 (2023)
[6] Tran, G., Shi, Y.: Fiber orientation and compartment parameter estimation from multi-shell diffusion imaging. IEEE transactions on medical imaging 34(11), 2320–2332 (2015)
[7] Wei, C., Mangalam, K., Huang, P.Y., Li, Y., Fan, H., Xu, H., Wang, H., Xie, C., Yuille, A., Feichtenhofer, C.: Diffusion models as masked autoencoders. arXiv preprint arXiv:2304.03283 (2023)
[8] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
[9] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851 (2020)
[10] Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)
[11] Weiss, C., Tursunova, I., Neuschmelting, V., Lockau, H., Nettekoven, C., Oros-Peusquens, A.M., Stoffels, G., Rehme, A.K., Faymonville, A.M., Shah, N.J., et al.: Improved ntms-and dti-derived cst tractography through anatomical roi seeding on anterior pontine level compared to internal capsule. NeuroImage: Clinical 7, 424–437 (2015)
[12] Beez, T., Munoz-Bendix, C., Steiger, H.J., Hänggi, D.: Functional tracts of the cerebellum—essentials for the neurosurgeon. Neurosurgical Review 44, 273–278 (2021)
[13] Sun, W., Amezcua, L., Shi, Y.: Fod restoration for enhanced mapping of white matter lesion connectivity. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 584–592. Springer (2017)
[14] Li, J., Shi, Y., Toga, A.W.: Mapping brain anatomical connectivity using diffusion magnetic resonance imaging: Structural connectivity of the human brain. IEEE signal processing magazine 33(3), 36–51 (2016)
[15] Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., Van Gool, L.: Repaint: Inpainting using denoising diffusion probabilistic models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11461–11471 (2022)
[16] Kazerouni, A., Aghdam, E.K., Heidari, M., Azad, R., Fayyaz, M., Hacihaliloglu, I., Merhof, D.: Diffusion models in medical imaging: A comprehensive survey. Medical Image Analysis p. 102846 (2023)
[17] Tax, C.M., Bastiani, M., Veraart, J., Garyfallidis, E., Irfanoglu, M.O.: What’s new and what’s next in diffusion mri preprocessing. NeuroImage 249, 118830 (2022)
[18] Andersson, J.L., Skare, S., Ashburner, J.: How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging. Neuroimage 20(2), 870–888 (2003)
[19] Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: Fsl. Neuroimage 62(2), 782–790 (2012)
[20] Andersson, J.L., Sotiropoulos, S.N.: Non-parametric representation and prediction of single-and multi-shell diffusion-weighted mri data using gaussian processes. Neuroimage 122, 166–176 (2015)
[21] Andersson, J.L., Graham, M.S., Drobnjak, I., Zhang, H., Campbell, J.: Susceptibility-induced distortion that varies due to motion: Correction in diffusion mr without acquiring additional data. Neuroimage 171, 277–295 (2018)
[22] Qiao, Y., Shi, Y.: Unsupervised deep learning for fod-based susceptibility distortion correction in diffusion mri. IEEE transactions on medical imaging 41(5), 1165–1175 (2021)
[23] Schilling, K.G., Blaber, J., Huo, Y., Newton, A., Hansen, C., Nath, V., Shafer, A.T., Williams, O., Resnick, S.M., Rogers, B., et al.: Synthesized b0 for diffusion distortion correction (synb0-disco). Magnetic resonance imaging 64, 62–70 (2019)
[24] Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., et al.: Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS medicine 12(3), e1001779 (2015)
[25] Xia, B., Zhang, Y., Wang, S., Wang, Y., Wu, X., Tian, Y., Yang, W., Van Gool, L.: Diffir: Efficient diffusion model for image restoration. arXiv preprint arXiv:2303.09472 (2023)
[26] Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Refusion: Enabling large-size realistic image restoration with latent-space diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1680–1691 (2023)
[27] Salimans, T., Ho, J.: Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512 (2022)
[28] Oishi, K., Faria, A., Jiang, H., Li, X., Akhter, K., Zhang, J., Hsu, J.T., Miller, M.I., van Zijl, P.C., Albert, M., et al.: Atlas-based whole brain white matter analysis using large deformation diffeomorphic metric mapping: application to normal elderly and alzheimer’s disease participants. Neuroimage 46(2), 486–499 (2009)
[29] Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis. Advances in neural information processing systems 34, 8780–8794 (2021)
[30] Jiang, L., Mao, Y., Wang, X., Chen, X., Li, C.: Cola-diff: Conditional latent diffusion model for multi-modal mri synthesis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 398–408. Springer (2023)
[31] Zhu, L., Xue, Z., Jin, Z., Liu, X., He, J., Liu, Z., Yu, L.: Make-a-volume: Leveraging latent diffusion models for cross-modality 3d brain mri synthesis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 592–601. Springer (2023)
[32] UK Biobank Homepage, https://www.ukbiobank.ac.uk. Last accessed 5 March 2024
[33] Pandey, K., Mukherjee, A., Rai, P., Kumar, A.: Vaes meet diffusion models: Efficient and high-fidelity generation. In: NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications (2021)
[34] O’Bryant, S.E., Johnson, L.A., Barber, R.C., Braskie, M.N., Christian, B., Hall, J.R., Hazra, N., King, K., Kothapalli, D., Large, S., et al.: The health & aging brain among latino elders (hable) study methods and participant characteristics. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring 13(1), e12202 (2021)
[35] Petersen, R.C., Aisen, P.S., Beckett, L.A., Donohue, M.C., Gamst, A.C., Harvey, D.J., Jack, C.R., Jagust, W.J., Shaw, L.M., Toga, A.W., et al.: Alzheimer’s disease neuroimaging initiative (adni): clinical characterization. Neurology 74(3), 201–209 (2010)
