
Showing 1–17 of 17 results for author: Amari, S

Searching in archive cs.
  1. arXiv:2202.05254

    cs.LG

    Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

    Authors: Kaito Watanabe, Kotaro Sakamoto, Ryo Karakida, Sho Sonoda, Shun-ichi Amari

    Abstract: A biological neural network in the cortex forms a neural field. Neurons in the field have their own receptive fields, and connection weights between two neurons are random but highly correlated when they are in close proximity in receptive fields. In this paper, we investigate such neural fields in a multilayer architecture to study the supervised learning of the fields. We empirically compa…

    Submitted 6 January, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

  2. arXiv:2006.10732

    stat.ML cs.LG

    When Does Preconditioning Help or Hurt Generalization?

    Authors: Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

    Abstract: While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question. This work presents a more nuanced view on how the implicit bias of first- and second-order methods affects the comparison of generalization properties. We provide an exact asymptotic bias-variance decomposition of the generalizatio…

    Submitted 8 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 42 pages

  3. arXiv:2001.06931

    stat.ML cs.LG

    Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective

    Authors: Shun-ichi Amari

    Abstract: It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the p…

    Submitted 17 March, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

  4. arXiv:1910.05992

    stat.ML cond-mat.dis-nn cs.LG

    Pathological spectra of the Fisher information metric and its variants in deep neural networks

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth and sample size when the network has random weights and is sufficiently wi…

    Submitted 27 September, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: 23 pages, 7 figures; v2: minor improvements, Section 3.4 added

  5. arXiv:1906.02926

    stat.ML cond-mat.dis-nn cs.LG

    The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: Normalization methods play an important role in enhancing the performance of deep learning, while theoretical understanding of them has been limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions.…

    Submitted 28 October, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: To appear in NeurIPS 2019

  6. arXiv:1808.07172

    cs.LG cond-mat.dis-nn stat.ML

    Fisher Information and Natural Gradient Learning of Random Deep Networks

    Authors: Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

    Abstract: A deep neural network is a hierarchical nonlinear model transforming input signals to output signals. Its input-output relation is considered to be stochastic, being described for a given input by a parameterized conditional probability distribution of outputs. The space of parameters consisting of weights and biases is a Riemannian manifold, where the metric is defined by the Fisher information m…

    Submitted 21 August, 2018; originally announced August 2018.

    Comments: 22 pages, 2 figures

  7. arXiv:1808.07169

    cond-mat.dis-nn cs.LG stat.ML

    Statistical Neurodynamics of Deep Networks: Geometry of Signal Spaces

    Authors: Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

    Abstract: Statistical neurodynamics studies macroscopic behaviors of randomly connected neural networks. We consider a deep layered feedforward network where input signals are processed layer by layer. The manifold of input signals is embedded in a higher-dimensional manifold of the next layer as a curved submanifold, provided the number of neurons is larger than that of inputs. We show geometrical features…

    Submitted 21 August, 2018; originally announced August 2018.

    Comments: 23 pages, 8 figures

  8. arXiv:1806.01316

    stat.ML cond-mat.dis-nn cs.LG

    Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: The Fisher information matrix (FIM) is a fundamental quantity representing the characteristics of a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of the FIM that are universal among a wide class of DNNs. To this end, we use random weights and large-width limits, which enable us to utilize mean field theories. We investigate the asymptotic statisti…

    Submitted 8 October, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted at AISTATS 2019. Main text: 10 pages, 2 figures. Supplementary material: 9 pages, 2 figures, typos corrected

  9. arXiv:1709.10219

    math.OC cs.IT

    Information Geometry Connecting Wasserstein Distance and Kullback-Leibler Divergence via the Entropy-Relaxed Transportation Problem

    Authors: Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

    Abstract: Two geometrical structures have been extensively studied for a manifold of probability distributions. One is based on the Fisher information metric, which is invariant under reversible transformations of random variables, while the other is based on the Wasserstein distance of optimal transportation, which reflects the structure of the distance between random variables. Here, we propose a new info…

    Submitted 28 September, 2017; originally announced September 2017.

  10. arXiv:1709.02050

    cs.IT

    Geometry of Information Integration

    Authors: Shun-ichi Amari, Naotsugu Tsuchiya, Masafumi Oizumi

    Abstract: Information geometry is used to quantify the amount of information integration within multiple terminals of a causal dynamical system. Integrated information quantifies how much information is lost when a system is split into parts and information transmission between the parts is removed. Multiple measures of integrated information have been proposed. Here, we analyze four of the pre…

    Submitted 6 September, 2017; originally announced September 2017.

  11. A unified framework for information integration based on information geometry

    Authors: Masafumi Oizumi, Naotsugu Tsuchiya, Shun-ichi Amari

    Abstract: We propose a unified theoretical framework for quantifying spatio-temporal interactions in a stochastic dynamical system based on information geometry. In the proposed framework, the degree of interactions is quantified by the divergence between the actual probability distribution of the system and a constrained probability distribution where the interactions of interest are disconnected. This fra…

    Submitted 15 October, 2015; originally announced October 2015.

  12. Measuring integrated information from the decoding perspective

    Authors: Masafumi Oizumi, Shun-ichi Amari, Toru Yanagawa, Naotaka Fujii, Naotsugu Tsuchiya

    Abstract: Accumulating evidence indicates that the capacity to integrate information in the brain is a prerequisite for consciousness. Integrated Information Theory (IIT) of consciousness provides a mathematical approach to quantifying the information integrated in a system, called integrated information, Φ. Integrated information is defined theoretically as the amount of information a system generates as…

    Submitted 17 May, 2015; originally announced May 2015.

    Journal ref: PLoS Comput Biol 12(1), e1004654, 2016

  13. arXiv:1412.7146

    stat.CO cs.IT

    Log-Determinant Divergences Revisited: Alpha-Beta and Gamma Log-Det Divergences

    Authors: Andrzej Cichocki, Sergio Cruces, Shun-ichi Amari

    Abstract: In this paper, we review and extend a family of log-det divergences for symmetric positive definite (SPD) matrices and discuss their fundamental properties. We show how to generate from parameterized Alpha-Beta (AB) and Gamma log-det divergences many well-known divergences, for example, Stein's loss, the S-divergence, also called the Jensen-Bregman LogDet (JBLD) divergence, the Logdet Zero (Bhattachar…

    Submitted 23 December, 2014; v1 submitted 18 December, 2014; originally announced December 2014.

    Comments: 35 pages, 4 figures

  14. Bayesian Robust Tensor Factorization for Incomplete Multiway Data

    Authors: Qibin Zhao, Guoxu Zhou, Liqing Zhang, Andrzej Cichocki, Shun-ichi Amari

    Abstract: We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing the robust predictive distribution over missing entries. The low-CP-rank tensor is mo…

    Submitted 16 April, 2015; v1 submitted 9 October, 2014; originally announced October 2014.

    Comments: in IEEE Transactions on Neural Networks and Learning Systems, 2015

  15. arXiv:1311.5125

    cs.IT

    On conformal divergences and their population minimizers

    Authors: Richard Nock, Frank Nielsen, Shun-ichi Amari

    Abstract: Total Bregman divergences are a recent tweak of ordinary Bregman divergences, originally motivated by applications that required invariance under rotations. They have displayed superior results compared to ordinary Bregman divergences on several clustering, computer vision, medical imaging and machine learning tasks. These preliminary results raise two important problems: first, report a complete cha…

    Submitted 8 June, 2015; v1 submitted 20 November, 2013; originally announced November 2013.

  16. arXiv:1304.6591

    cs.IT

    Lp-Regularized Least Squares (0<p<1) and Critical Path

    Authors: Masahiro Yukawa, Shun-ichi Amari

    Abstract: The least squares problem is formulated in terms of Lp quasi-norm regularization (0<p<1). Two formulations are considered: (i) an Lp-constrained optimization and (ii) an Lp-penalized (unconstrained) optimization. Due to the nonconvexity of the Lp quasi-norm, the solution paths of the regularized least squares problem are not guaranteed to be continuous. A critical path, which is a maximal continuous…

    Submitted 24 April, 2013; originally announced April 2013.

  17. arXiv:1010.4965

    cond-mat.stat-mech cs.IT math.DG

    Dually flat structure with escort probability and its application to alpha-Voronoi diagrams

    Authors: Atsumi Ohara, Hiroshi Matsuzoe, Shun-ichi Amari

    Abstract: This paper studies the geometrical structure of the manifold of escort probability distributions and shows its new applicability to information science. In order to realize escort probabilities, we use a conformal transformation that flattens the so-called alpha-geometry of the space of discrete probability distributions, which well characterizes nonadditive statistics on the space. As a result, escort prob…

    Submitted 24 October, 2010; originally announced October 2010.

    Comments: Several results in this paper can be found in the conference paper [36] without complete proofs
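Entry 13 names Stein's loss and the S-divergence (JBLD) as special cases of the Alpha-Beta and Gamma log-det families. The sketch below computes those two special cases for small SPD matrices, assuming their standard textbook definitions rather than the paper's AB/Gamma parameterization, which is not reproduced here. All function names are hypothetical, and the code uses plain Python with a Cholesky factorization so it needs no external libraries.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a is assumed symmetric)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def logdet_spd(a):
    # For A = L L^T, log det A = 2 * sum(log L_ii).
    L = cholesky(a)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


def cholesky_solve(L, rhs):
    """Solve (L L^T) x = rhs by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def stein_loss(p, q):
    """Stein's loss: tr(P Q^-1) - log det(P Q^-1) - n."""
    n = len(p)
    Lq = cholesky(q)
    # tr(P Q^-1) = tr(Q^-1 P): solve Q x = (column j of P) and keep entry j.
    trace = sum(cholesky_solve(Lq, [p[i][j] for i in range(n)])[j] for j in range(n))
    return trace - logdet_spd(p) + logdet_spd(q) - n


def s_divergence(p, q):
    """S-divergence (JBLD): log det((P + Q) / 2) - (log det P + log det Q) / 2."""
    n = len(p)
    mid = [[(p[i][j] + q[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    return logdet_spd(mid) - 0.5 * (logdet_spd(p) + logdet_spd(q))
```

Both quantities vanish when P = Q and are positive otherwise; Stein's loss is not symmetric in its two arguments, whereas the S-divergence is.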
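Entry 9 relaxes optimal transport with an entropy term. A common way to compute such an entropy-regularized plan is Sinkhorn scaling, sketched below under an assumed objective: minimize the transport cost minus (1/lam) times the entropy of the plan, subject to fixed marginals. This is a generic sketch of that standard technique, not the paper's own construction, and its parameterization of the relaxation may differ; the function name and the `lam`, `max_iter` and `tol` parameters are hypothetical.

```python
import math


def sinkhorn(a, b, cost, lam, max_iter=1000, tol=1e-12):
    """Approximate an entropy-regularized transport plan between histograms a and b.

    Assumed objective: minimize sum_ij P_ij * cost_ij - (1 / lam) * H(P)
    subject to row sums a and column sums b, with H(P) = -sum_ij P_ij log P_ij.
    The minimizer has the form P = diag(u) K diag(v) with K_ij = exp(-lam * cost_ij).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-lam * cost[i][j]) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(max_iter):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Column sums are now exact; stop once the row sums also match a.
        row_err = max(
            abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i]) for i in range(n)
        )
        if row_err < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

As `lam` grows, the plan approaches an unregularized optimal transport plan; very large values make `exp` underflow, which log-domain variants of the iteration avoid.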
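Entry 15 refers to total Bregman divergences without defining them. The sketch below assumes the definition used in the total Bregman literature: the ordinary Bregman divergence of x from y divided by sqrt(1 + ||grad f(y)||^2), i.e. an ordinary Bregman divergence rescaled by a factor that depends only on y. The helper names, including the example generator `squared_norm`, are hypothetical illustrations.

```python
import math


def total_bregman(f, grad_f, x, y):
    """Total Bregman divergence of x from y for a convex generator f.

    tB_f(x, y) = (f(x) - f(y) - <x - y, grad f(y)>) / sqrt(1 + ||grad f(y)||^2)
    """
    g = grad_f(y)
    bregman = f(x) - f(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))
    return bregman / math.sqrt(1.0 + sum(gi * gi for gi in g))


def squared_norm(v):
    # Generator f(v) = ||v||^2; its ordinary Bregman divergence is ||x - y||^2.
    return sum(t * t for t in v)


def squared_norm_grad(v):
    return [2.0 * t for t in v]
```

With `squared_norm`, x = [1, 1] and y = [1, 0], the ordinary Bregman divergence is 1 and the total Bregman divergence is 1/sqrt(5), because the gradient at y has squared norm 4.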
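Entry 17 works with escort probability distributions. A minimal sketch, assuming the conventional definition of the escort of a discrete distribution p with a single positive parameter q, P_q(i) = p_i^q / sum_j p_j^q; how q corresponds to the alpha of the paper's alpha-geometry is not reproduced here, and the function name is hypothetical.

```python
def escort(p, q):
    """Escort distribution of a discrete distribution p: P_q(i) = p_i**q / sum_j p_j**q."""
    if q <= 0:
        raise ValueError("q must be positive")
    if any(pi < 0 for pi in p) or sum(p) <= 0:
        raise ValueError("p must be nonnegative with positive total mass")
    # Raise each probability to the power q, then renormalize.
    powered = [pi ** q for pi in p]
    total = sum(powered)
    return [w / total for w in powered]
```

For a normalized p, q = 1 returns p itself, q > 1 concentrates mass on the most probable outcomes, and 0 < q < 1 flattens the distribution.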