
Applicability of scaling laws to vision encoding models

Authors: Takuya Matsuyama, Kota S Sasaki, Shinji Nishimoto

Abstract: In this paper, we investigated how to build a high-performance vision encoding model to predict brain activity as part of our participation in the Algonauts Project 2023 Challenge. The challenge provided brain activity recorded by functional MRI (fMRI) while participants viewed images. Several vision models with parameter sizes ranging from 86M to 4.3B were used to build predictive models. To build highly accurate models, we focused our analysis on two main aspects: (1) How does the sample size of the fMRI training set change the prediction accuracy? (2) How does the prediction accuracy across the visual cortex vary with the parameter size of the vision models? The results show that as the sample size used during training increases, the prediction accuracy improves according to the scaling law. Similarly, we found that as the parameter size of the vision models increases, the prediction accuracy improves according to the scaling law. These results suggest that increasing the sample size of the fMRI training set and the parameter size of visual models may contribute to more accurate visual models of the brain and lead to a better understanding of visual neuroscience.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 7 pages, 3 figures
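
The abstract above reports that prediction accuracy improves with training-set size "according to the scaling law" but does not give the functional form. As a rough illustration only, the sketch below assumes a power law, accuracy ≈ a · n^b, and fits it by ordinary least squares in log-log space. The function name `fit_power_law` and the sample values are hypothetical and synthetic; they are not the authors' method or data.

```python
import math


def fit_power_law(sizes, scores):
    """Fit scores ~ a * sizes**b by least squares on (log size, log score).

    Assumption: a pure power law, which the abstract does not specify.
    Returns the pair (a, b).
    """
    log_x = [math.log(x) for x in sizes]
    log_y = [math.log(y) for y in scores]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from a known power law (illustrative, not from the paper).
sizes = [500, 1000, 2000, 4000, 8000]
scores = [0.05 * s ** 0.2 for s in sizes]

a, b = fit_power_law(sizes, scores)
print(f"a = {a:.4f}, b = {b:.4f}")
```

A straight line on a log-log plot of accuracy against sample size would be consistent with this assumed form; curvature would suggest a different one, such as a power law with an offset.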

arXiv:2307.11078 [pdf, other]

Brain2Music: Reconstructing Music from Human Brain Activity

Authors: Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto

Abstract: The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music

Submitted 20 July, 2023; originally announced July 2023.

Comments: Preprint; 21 pages; supplementary material: https://google-research.github.io/seanet/brain2music

arXiv:2306.11536 [pdf, other]

Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs

Authors: Yu Takagi, Shinji Nishimoto

Abstract: The integration of deep learning and neuroscience has been advancing rapidly, which has led to improvements in the analysis of brain activity and the understanding of deep learning models from a neuroscientific perspective. The reconstruction of visual experience from human brain activity is an area that has particularly benefited: the use of deep learning models trained on large amounts of natural images has greatly improved its quality, and approaches that combine the diverse information contained in visual experiences have proliferated rapidly in recent years. In this technical paper, by taking advantage of the simple and generic framework that we proposed (Takagi and Nishimoto, CVPR 2023), we examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. Specifically, we combined our earlier work with the following three techniques: using decoded text from brain activity, nonlinear optimization for structural image reconstruction, and using decoded depth information from brain activity. We confirmed that these techniques contributed to improving accuracy over the baseline. We also discuss what researchers should consider when performing visual reconstruction using deep generative models trained on large datasets. Please check our webpage at https://sites.google.com/view/stablediffusion-with-brain/. Code is also available at https://github.com/yu-takagi/StableDiffusionReconstruction.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:1912.01227 [pdf]

Geodesic Folding of Tetrahedron

Authors: Seri Nishimoto, Takashi Horiyama, Tomohiro Tachi

Abstract: In this work, we show the geometric properties of a family of polyhedra obtained by folding a regular tetrahedron along regular triangular grids. Each polyhedron is identified by a pair of nonnegative integers. The polyhedron can be cut along a geodesic strip of triangles to be decomposed and unfolded into one or multiple bands (homeomorphic to a cylinder). The number of bands is the greatest common divisor of the two numbers. By a proper choice of pairs of numbers, we can create a common triangular band that folds into multiple different polyhedra that belong to the family.

Submitted 3 December, 2019; originally announced December 2019.

Journal ref: Journal of the International Society for the Interdisciplinary Study of Symmetry, Symmetry: Art and Science, 2019, Special Issue: 11th Congress and Exhibition of SIS, Kanazawa, Japan, November 25-30, 2019
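
The abstract above states that the number of bands equals the greatest common divisor of the two integers identifying a polyhedron. A minimal sketch of that arithmetic follows; the function name `band_count` and the example pairs are hypothetical, chosen only for illustration, and the sketch says nothing about the geometry itself.

```python
from math import gcd


def band_count(a: int, b: int) -> int:
    """Number of bands for the polyhedron indexed by (a, b).

    Follows the abstract's statement that the count is gcd(a, b).
    Assumption: the pair (0, 0) is treated as out of scope here,
    since gcd(0, 0) == 0 would give no bands.
    """
    if a < 0 or b < 0:
        raise ValueError("indices must be nonnegative integers")
    if a == 0 and b == 0:
        raise ValueError("(0, 0) is not handled by this sketch")
    return gcd(a, b)


# Illustrative pairs, not taken from the paper.
for pair in [(1, 0), (2, 2), (3, 5), (4, 6)]:
    print(pair, "->", band_count(*pair), "band(s)")
```

Coprime pairs such as (3, 5) give a single band, while pairs sharing a factor, such as (4, 6), split into that many bands.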

arXiv:1905.10037 [pdf, other]

doi 10.1609/aaai.v34i04.5974

Brain-mediated Transfer Learning of Convolutional Neural Networks

Authors: Satoshi Nishida, Yusuke Nakano, Antoine Blanc, Naoya Maeda, Masataka Kado, Shinji Nishimoto

Abstract: The human brain can effectively learn a new task from a small number of samples, which indicates that the brain can transfer its prior knowledge to solve tasks in different domains. This function is analogous to transfer learning (TL) in the field of machine learning. TL uses a well-trained feature space in a specific task domain to improve performance in new tasks with insufficient training data. TL with rich feature representations, such as features of convolutional neural networks (CNNs), shows high generalization ability across different task domains. However, such TL is still insufficient in making machine learning attain generalization ability comparable to that of the human brain. To examine if the internal representation of the brain could be used to achieve more efficient TL, we introduce a method for TL mediated by human brains. Our method transforms feature representations of audiovisual inputs in CNNs into those in activation patterns of individual brains via an association learned in advance from measured brain responses. Then, to estimate labels reflecting human cognition and behavior induced by the audiovisual inputs, the transformed representations are used for TL. We demonstrate that our brain-mediated TL (BTL) shows higher performance in the label estimation than the standard TL. In addition, we illustrate that the estimations mediated by different brains vary from brain to brain, and the variability reflects the individual variability in perception. Thus, our BTL provides a framework to improve the generalization ability of machine-learning feature representations and enable machine learning to estimate human-like cognition and behavior, including individual variability.

Submitted 8 October, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

Journal ref: Proc. Thirty-Fourth AAAI Conf. Artif. Intell. (2020) 5281-5288

arXiv:1802.02210 [pdf, other]

Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli

Authors: Eri Matsuo, Ichiro Kobayashi, Shinji Nishimoto, Satoshi Nishida, Hideki Asoh

Abstract: Quantitative modeling of human brain activity based on language representations has been actively studied in systems neuroscience. However, previous studies examined word-level representation, and little is known about whether we could recover structured sentences from brain activity. This study attempts to generate natural language descriptions of semantic contents from human brain activity evoked by visual stimuli. To effectively use a small amount of available brain activity data, our proposed method employs a pre-trained image-captioning network model using a deep learning framework. To apply brain activity to the image-captioning network, we train regression models that learn the relationship between brain activity and deep-layer image features. The results demonstrate that the proposed model can decode brain activity and generate descriptions using natural language sentences. We also conducted several experiments with data from different subsets of brain regions known to process visual stimuli. The results suggest that semantic information for sentence generation is widespread across the entire cortex.

Submitted 19 January, 2018; originally announced February 2018.

Comments: 11 pages, 8 figures

Showing 1–6 of 6 results for author: Nishimoto, S