
Showing 1–19 of 19 results for author: Karakida, R

Searching in archive cs.
  1. arXiv:2408.13426

    cs.LG cs.CV

    Optimal Layer Selection for Latent Data Augmentation

    Authors: Tomoumi Takase, Ryo Karakida

    Abstract: While data augmentation (DA) is generally applied to input data, several studies have reported that applying DA to hidden layers in neural networks, i.e., feature augmentation, can improve performance. However, in previous studies, the layers to which DA is applied have not been carefully considered, often being applied randomly and uniformly or only to a specific layer, leaving room for arbitrari…

    Submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:2406.12220

    cs.LG cond-mat.dis-nn cs.CV cs.NE stat.ML

    Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking

    Authors: Ryo Karakida, Toshihiro Ota, Masato Taki

    Abstract: Transformers have established themselves as the leading neural network model in natural language processing and are increasingly foundational in various domains. In vision, the MLP-Mixer model has demonstrated competitive performance, suggesting that attention mechanisms might not be indispensable. Inspired by this, recent research has explored replacing attention modules with other mechanisms, in…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages

  3. arXiv:2402.02098

    stat.ML cs.LG

    Self-attention Networks Localize When QK-eigenspectrum Concentrates

    Authors: Han Bao, Ryuichiro Hataya, Ryo Karakida

    Abstract: The self-attention mechanism prevails in modern machine learning. It has an interesting functionality of adaptively selecting tokens from an input sequence by modulating the degree of attention localization, which many researchers speculate is the basis of the powerful model performance but complicates the underlying mechanism of the learning dynamics. In recent years, mainly two arguments have co…

    Submitted 3 February, 2024; originally announced February 2024.

  4. arXiv:2312.12226

    cs.LG

    On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

    Authors: Satoki Ishikawa, Ryo Karakida

    Abstract: Second-order optimization has been developed to accelerate the training of deep neural networks and it is being applied to increasingly larger-scale models. In this study, towards training on further larger scales, we identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner even if the network width increases significantly. Inspired by a…

    Submitted 8 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: 34 pages, ICLR 2024

  5. arXiv:2306.01470

    cs.LG stat.ML

    Understanding MLP-Mixer as a Wide and Sparse MLP

    Authors: Tomohiro Hayase, Ryo Karakida

    Abstract: Multi-layer perceptron (MLP) is a fundamental component of deep learning, and recent MLP-based architectures, especially the MLP-Mixer, have achieved significant empirical success. Nevertheless, our understanding of why and how the MLP-Mixer outperforms conventional MLPs remains largely unexplored. In this work, we reveal that sparseness is a key mechanism underlying the MLP-Mixers. First, the Mix…

    Submitted 6 May, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted in ICML 2024

  6. arXiv:2212.04692

    cs.LG cs.NE stat.ML

    Attention in a family of Boltzmann machines emerging from modern Hopfield networks

    Authors: Toshihiro Ota, Ryo Karakida

    Abstract: Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models. Recent studies on modern Hopfield networks have broadened the class of energy functions and led to a unified perspective on general Hopfield networks including an attention module. In this letter, we consider the BM counterparts of modern Hopfield networks using the associated energy functions, and stu…

    Submitted 28 March, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: 15 pages, 3 figures. v2: added figures and various corrections/improvements especially in Introduction and Section 3. Published version

    Report number: RIKEN-iTHEMS-Report-22

  7. arXiv:2210.02720

    cs.LG stat.ML

    Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

    Authors: Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

    Abstract: Gradient regularization (GR) is a method that penalizes the gradient norm of the training loss during training. While some studies have reported that GR can improve generalization performance, little attention has been paid to it from the algorithmic perspective, that is, the algorithms of GR that efficiently improve the performance. In this study, we first reveal that a specific finite-difference…

    Submitted 2 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

  8. arXiv:2202.05254

    cs.LG

    Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

    Authors: Kaito Watanabe, Kotaro Sakamoto, Ryo Karakida, Sho Sonoda, Shun-ichi Amari

    Abstract: A biological neural network in the cortex forms a neural field. Neurons in the field have their own receptive fields, and connection weights between two neurons are random but highly correlated when they are in close proximity in receptive fields. In this paper, we investigate such neural fields in a multilayer architecture to investigate the supervised learning of the fields. We empirically compa…

    Submitted 6 January, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

  9. arXiv:2112.01653

    stat.ML cond-mat.dis-nn cs.LG

    Learning Curves for Continual Learning in Neural Networks: Self-Knowledge Transfer and Forgetting

    Authors: Ryo Karakida, Shotaro Akaho

    Abstract: Sequential training from task to task is becoming one of the major objects in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a so…

    Submitted 17 March, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: 27 pages, 6 figures

    Journal ref: Published in ICLR 2022

  10. arXiv:2010.15434

    cs.LG

    Self-paced Data Augmentation for Training Neural Networks

    Authors: Tomoumi Takase, Ryo Karakida, Hideki Asoh

    Abstract: Data augmentation is widely used for machine learning; however, an effective method to apply data augmentation has not been established even though it includes several factors that should be tuned carefully. One such factor is sample suitability, which involves selecting samples that are suitable for data augmentation. A typical method that applies data augmentation to all training samples disrega…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: 22 pages

  11. arXiv:2010.00879

    stat.ML cond-mat.dis-nn cs.LG

    Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

    Authors: Ryo Karakida, Kazuki Osawa

    Abstract: Natural Gradient Descent (NGD) helps to accelerate the convergence of gradient descent dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed that some NGD methods with approximate Fisher information converge sufficiently fast in practice. Nevertheless, it remains unclear from the theoretical perspective…

    Submitted 7 December, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  12. arXiv:2006.07814

    stat.ML cs.LG math.PR

    The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

    Authors: Tomohiro Hayase, Ryo Karakida

    Abstract: The Fisher information matrix (FIM) is fundamental to understanding the trainability of deep neural nets (DNN), since it describes the parameter space's local metric. We investigate the spectral distribution of the conditional FIM, which is the FIM given a single sample, by focusing on fully-connected networks achieving dynamical isometry. Then, while dynamical isometry is known to keep specific b…

    Submitted 29 March, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: Accepted into AISTATS2021

    MSC Class: 68T07; 46L54; 60B20; 62E20

    ACM Class: G.3

  13. arXiv:1910.05992

    stat.ML cond-mat.dis-nn cs.LG

    Pathological spectra of the Fisher information metric and its variants in deep neural networks

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth and sample size when the network has random weights and is sufficiently wi…

    Submitted 27 September, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: 23 pages, 7 figures; v2: minor improvements, Section 3.4 added

  14. arXiv:1906.02926

    stat.ML cond-mat.dis-nn cs.LG

    The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: Normalization methods play an important role in enhancing the performance of deep learning while their theoretical understandings have been limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions.…

    Submitted 28 October, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: To appear in NeurIPS 2019

  15. arXiv:1808.07172

    cs.LG cond-mat.dis-nn stat.ML

    Fisher Information and Natural Gradient Learning of Random Deep Networks

    Authors: Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

    Abstract: A deep neural network is a hierarchical nonlinear model transforming input signals to output signals. Its input-output relation is considered to be stochastic, being described for a given input by a parameterized conditional probability distribution of outputs. The space of parameters consisting of weights and biases is a Riemannian manifold, where the metric is defined by the Fisher information m…

    Submitted 21 August, 2018; originally announced August 2018.

    Comments: 22 pages, 2 figures

  16. arXiv:1808.07169

    cond-mat.dis-nn cs.LG stat.ML

    Statistical Neurodynamics of Deep Networks: Geometry of Signal Spaces

    Authors: Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

    Abstract: Statistical neurodynamics studies macroscopic behaviors of randomly connected neural networks. We consider a deep layered feedforward network where input signals are processed layer by layer. The manifold of input signals is embedded in a higher dimensional manifold of the next layer as a curved submanifold, provided the number of neurons is larger than that of inputs. We show geometrical features…

    Submitted 21 August, 2018; originally announced August 2018.

    Comments: 23 pages, 8 figures

  17. arXiv:1806.01316

    stat.ML cond-mat.dis-nn cs.LG

    Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of FIM that are universal among a wide class of DNNs. To this end, we use random weights and large width limits, which enables us to utilize mean field theories. We investigate the asymptotic statisti…

    Submitted 8 October, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted at AISTATS2019. Main text: 10 pages, 2 figures. Supplementary material: 9 pages, 2 figures, typos corrected

  18. arXiv:1712.04195

    stat.ML cs.LG cs.NE q-bio.NC

    Concept Formation and Dynamics of Repeated Inference in Deep Generative Models

    Authors: Yoshihiro Nagano, Ryo Karakida, Masato Okada

    Abstract: Deep generative models are reported to be useful in broad applications including image generation. Repeated inference between data space and latent space in these models can denoise cluttered images and improve the quality of inferred results. However, previous studies only qualitatively evaluated image outputs in data space, and the mechanism behind the inference has not been investigated. The pu…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: 20 pages, 9 figures

  19. arXiv:1709.10219

    math.OC cs.IT

    Information Geometry Connecting Wasserstein Distance and Kullback-Leibler Divergence via the Entropy-Relaxed Transportation Problem

    Authors: Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

    Abstract: Two geometrical structures have been extensively studied for a manifold of probability distributions. One is based on the Fisher information metric, which is invariant under reversible transformations of random variables, while the other is based on the Wasserstein distance of optimal transportation, which reflects the structure of the distance between random variables. Here, we propose a new info…

    Submitted 28 September, 2017; originally announced September 2017.
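Entry 7 (arXiv:2210.02720) describes gradient regularization (GR) as penalizing the gradient norm of the training loss, and its abstract breaks off while introducing a finite-difference computation. As an illustration only, the sketch below assumes a toy least-squares loss and the penalized objective L(θ) + (γ/2)‖∇L(θ)‖², whose gradient is ∇L + γ H ∇L with H the Hessian. The Hessian-vector product is approximated by a forward difference of two gradients; this is one generic finite-difference scheme and is not claimed to be the specific scheme the paper identifies. The helper names (`loss_grad`, `gr_gradient`) and parameters (`gamma`, `eps`) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def loss_grad(A: Matrix, b: Vector, theta: Vector) -> Vector:
    """Gradient of the toy loss L(theta) = 0.5 * ||A theta - b||^2, i.e. A^T (A theta - b)."""
    residual = [r - bi for r, bi in zip(matvec(A, theta), b)]
    return matvec(transpose(A), residual)


def gr_gradient(A: Matrix, b: Vector, theta: Vector,
                gamma: float = 0.1, eps: float = 1e-4) -> Vector:
    """Gradient of L(theta) + (gamma / 2) * ||grad L(theta)||^2 (hypothetical helper).

    The exact gradient is g + gamma * H g with g = grad L and H the Hessian.
    H g is approximated by the forward difference (grad L(theta + eps * g) - g) / eps,
    which costs one extra gradient evaluation and never forms H explicitly.
    """
    g = loss_grad(A, b, theta)
    shifted = [t + eps * gi for t, gi in zip(theta, g)]
    g_shifted = loss_grad(A, b, shifted)
    hvp = [(gs - gi) / eps for gs, gi in zip(g_shifted, g)]
    return [gi + gamma * h for gi, h in zip(g, hvp)]


if __name__ == "__main__":
    A = [[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]]
    b = [1.0, 0.0, 2.0]
    theta = [0.5, -0.5]
    print(gr_gradient(A, b, theta))  # approximately [-3.25, -4.25]
```

For a quadratic loss the gradient is linear in θ, so the forward difference reproduces AᵀA g up to rounding; for a nonlinear network loss, the step size `eps` trades truncation error against floating-point cancellation.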
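Entry 19 (arXiv:1709.10219) builds on the entropy-relaxed transportation problem. A standard way to solve that relaxation numerically is Sinkhorn's matrix-scaling iteration. The sketch below assumes the objective Σᵢⱼ Cᵢⱼ Pᵢⱼ − λ H(P), with H(P) = −Σᵢⱼ Pᵢⱼ log Pᵢⱼ and row and column sums fixed to p and q; the paper's own parameterization and its information-geometric construction (cut off in the abstract) are not reproduced here. The function name `sinkhorn` and the parameters `lam` and `n_iter` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, p: List[float], q: List[float],
             lam: float = 1.0, n_iter: int = 500) -> Matrix:
    """Entropy-relaxed transport plan computed by Sinkhorn scaling (hypothetical helper).

    Assumed objective: minimize sum_ij C_ij P_ij - lam * H(P),
    with H(P) = -sum_ij P_ij log P_ij, subject to row sums p and column sums q.
    The minimizer has the form P_ij = u_i * K_ij * v_j with K_ij = exp(-C_ij / lam),
    so only the scaling vectors u and v need to be found.
    """
    n, m = len(p), len(q)
    K = [[math.exp(-cost[i][j] / lam) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the row sums of diag(u) K diag(v) equal p.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums equal q.
        v = [q[j] / sum(u[i] * K[i][j] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    cost = [[0.0, 1.0], [1.0, 0.0]]
    plan = sinkhorn(cost, [0.5, 0.5], [0.25, 0.75], lam=0.5)
    for row in plan:
        print([round(x, 4) for x in row])
```

As `lam` grows, K approaches a matrix of ones and the plan tends to the independent coupling pᵢqⱼ; as `lam` shrinks, the plan approaches an unrelaxed optimal transport plan, but exp(−C/λ) can underflow and convergence slows, which is why practical implementations often work in the log domain.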