MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Lu, Yizhuo; Du, Changde; Wang, Dianpeng; He, Huiguang

Abstract:Reconstructing visual stimuli from measured functional magnetic resonance imaging (fMRI) has been a meaningful and challenging task. Previous studies have successfully achieved reconstructions with structures similar to the original images, such as the outlines and size of some natural images. However, these reconstructions lack explicit semantic information and are difficult to discern. In recent years, many studies have utilized multi-modal pre-trained models with stronger generative capabilities to reconstruct images that are semantically similar to the original ones. However, these images have uncontrollable structural information such as position and orientation. To address both of the aforementioned issues simultaneously, we propose a two-stage image reconstruction model called MindDiffuser, utilizing Stable Diffusion. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion, which yields a preliminary image that contains semantic and structural information. In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information, and continually adjust the two features in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our proposed model has surpassed the current state-of-the-art models in terms of reconstruction results on Natural Scenes Dataset (NSD). Furthermore, the results of ablation experiments indicate that each component of our model is effective for image reconstruction.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.14139 [cs.CV]
	(or arXiv:2303.14139v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.14139

Computer Science > Computer Vision and Pattern Recognition

Title:MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators