Showing 1–13 of 13 results for author: Akaho, S

  1. arXiv:2209.01301

    stat.ML cs.LG

    Geometry of EM and related iterative algorithms

    Authors: Hideitsu Hino, Shotaro Akaho, Noboru Murata

    Abstract: The Expectation–Maximization (EM) algorithm is a simple meta-algorithm that has been used for many years as a methodology for statistical inference when there are missing measurements in the observed data or when the data is composed of observables and unobservables. Its general properties are well studied, and there are countless ways to apply it to individual problems. In this paper, we i…

    Submitted 12 November, 2022; v1 submitted 2 September, 2022; originally announced September 2022.

    Comments: to appear in Information Geometry Journal

  2. arXiv:2202.08472

    cs.LG

    Full-Span Log-Linear Model and Fast Learning Algorithm

    Authors: Kazuya Takabatake, Shotaro Akaho

    Abstract: The full-span log-linear (FSLL) model introduced in this paper is considered an $n$-th order Boltzmann machine, where $n$ is the number of all variables in the target system. Let $X=(X_0,\dots,X_{n-1})$ be finite discrete random variables that can take $|X|=|X_0|\cdots|X_{n-1}|$ different values. The FSLL model has $|X|-1$ parameters and can represent arbitrary positive distributions of $X$. The FSLL mo…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 25 pages, 6 figures

  3. arXiv:2112.01653

    stat.ML cond-mat.dis-nn cs.LG

    Learning Curves for Continual Learning in Neural Networks: Self-Knowledge Transfer and Forgetting

    Authors: Ryo Karakida, Shotaro Akaho

    Abstract: Sequential training from task to task is becoming one of the major objects in deep learning applications such as continual learning and transfer learning. Nevertheless, it remains unclear under what conditions the trained model's performance improves or deteriorates. To deepen our understanding of sequential training, this study provides a theoretical analysis of generalization performance in a so…

    Submitted 17 March, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: 27 pages, 6 figures

    Journal ref: Published in ICLR 2022

  4. Principal component analysis for Gaussian process posteriors

    Authors: Hideaki Ishibashi, Shotaro Akaho

    Abstract: This paper proposes an extension of principal component analysis for Gaussian process (GP) posteriors, denoted by GP-PCA. Since GP-PCA estimates a low-dimensional space of GP posteriors, it can be used for meta-learning, which is a framework for improving the performance of target tasks by estimating a structure of a set of tasks. The issue is how to define a structure of a set of GPs with an infi…

    Submitted 6 April, 2023; v1 submitted 15 July, 2021; originally announced July 2021.

    Journal ref: Neural Computation, 34, 1189-1219, 2022

  5. arXiv:2107.00871

    cs.LG stat.ML

    Reconsidering Dependency Networks from an Information Geometry Perspective

    Authors: Kazuya Takabatake, Shotaro Akaho

    Abstract: Dependency networks (Heckerman et al., 2000) are potential probabilistic graphical models for systems comprising a large number of variables. Like Bayesian networks, the structure of a dependency network is represented by a directed graph, and each node has a conditional probability table. Learning and inference are realized locally on individual nodes; therefore, computation remains tractable eve…

    Submitted 2 July, 2021; originally announced July 2021.

    Comments: 28 pages, 7 figures

  6. arXiv:1910.05992

    stat.ML cond-mat.dis-nn cs.LG

    Pathological spectra of the Fisher information metric and its variants in deep neural networks

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth and sample size when the network has random weights and is sufficiently wi…

    Submitted 27 September, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: 23 pages, 7 figures; v2: minor improvements, Section 3.4 added

  7. arXiv:1909.12644

    cs.LG stat.ML

    On a convergence property of a geometrical algorithm for statistical manifolds

    Authors: Shotaro Akaho, Hideitsu Hino, Noboru Murata

    Abstract: In this paper, we examine a geometrical projection algorithm for statistical inference. The algorithm is based on the Pythagorean relation and is derivative-free as well as representation-free, which is useful in nonparametric cases. We derive a bound on the learning rate to guarantee local convergence. In special cases of m-mixture and e-mixture estimation problems, we calculate specific forms of the bo…

    Submitted 27 September, 2019; originally announced September 2019.

  8. arXiv:1906.02926

    stat.ML cond-mat.dis-nn cs.LG

    The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: Normalization methods play an important role in enhancing the performance of deep learning, while their theoretical understanding has been limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions.…

    Submitted 28 October, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: To appear in NeurIPS 2019

  9. arXiv:1806.01316

    stat.ML cond-mat.dis-nn cs.LG

    Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

    Authors: Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

    Abstract: The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of FIM that are universal among a wide class of DNNs. To this end, we use random weights and large width limits, which enables us to utilize mean field theories. We investigate the asymptotic statisti…

    Submitted 8 October, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted at AISTATS2019. Main text: 10 pages, 2 figures. Supplementary material: 9 pages, 2 figures, typos corrected

  10. arXiv:1206.3721

    cs.LG stat.ML

    Constraint-free Graphical Model with Fast Learning Algorithm

    Authors: Kazuya Takabatake, Shotaro Akaho

    Abstract: In this paper, we propose a simple, versatile model for learning the structure and parameters of multivariate distributions from a data set. Learning a Markov network from a given data set is not a simple problem, because Markov networks rigorously represent Markov properties, and this rigor imposes complex constraints on the design of the networks. Our proposed model removes these constraints, ac…

    Submitted 17 June, 2012; originally announced June 2012.

    Comments: 9 pages, 11 figures, submitted to UAI2012

  11. arXiv:cs/0609071

    cs.LG cs.CV

    A kernel method for canonical correlation analysis

    Authors: Shotaro Akaho

    Abstract: Canonical correlation analysis is a technique to extract common features from a pair of multivariate data. In complex situations, however, it does not extract useful features because of its linearity. On the other hand, the kernel method used in support vector machines is an efficient approach to improve such a linear method. In this paper, we investigate the effectiveness of applying kernel method t…

    Submitted 14 February, 2007; v1 submitted 12 September, 2006; originally announced September 2006.

    Comments: Full version of the paper presented at IMPS2001 (International Meeting of Psychometric Society). 2007-Feb-14: typos in equations (23) and (24) on page 3 of the first version have been corrected

  12. arXiv:cs/0211007

    cs.LG

    Approximating Incomplete Kernel Matrices by the em Algorithm

    Authors: Koji Tsuda, Shotaro Akaho, Kiyoshi Asai

    Abstract: In biological data, it is often the case that observed data are available only for a subset of samples. When a kernel matrix is derived from such data, we have to leave the entries for unavailable samples as missing. In this paper, we make use of a parametric model of kernel matrices, and estimate missing entries by fitting the model to existing entries. The parametric model is created as a set…

    Submitted 7 November, 2002; originally announced November 2002.

    Comments: 17 pages, 4 figures

    ACM Class: I.2.6; I.5.2

  13. arXiv:cs/0211006

    cs.AI cs.LG

    Maximing the Margin in the Input Space

    Authors: Shotaro Akaho

    Abstract: We propose a novel criterion for support vector machine learning: maximizing the margin in the input space, not in the feature (Hilbert) space. This criterion is a discriminative version of the principal curve proposed by Hastie et al. The criterion is appropriate in particular when the input space is already a well-designed feature space with rather small dimensionality. The definition of the m…

    Submitted 7 November, 2002; originally announced November 2002.

    Comments: 19 pages, 5 figures, NIPS workshop

    ACM Class: I.2.6; I.5.1