Search | arXiv e-print repository

arXiv:2405.20879 [pdf, other]

Flow matching achieves minimax optimal convergence

Authors: Kenji Fukumizu, Taiji Suzuki, Noboru Isobe, Kazusato Oko, Masanori Koyama

Abstract: Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of F… ▽ More Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM in terms of the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve the minmax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain these optimal rates. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.09860 [pdf, other]

Optimal Switching Networks for Paired-Egress Bell State Analyzer Pools

Authors: Marii Koyama, Claire Yun, Amin Taherkhani, Naphan Benchasattabuse, Bernard Ousmane Sane, Michal Hajdušek, Shota Nagayama, Rodney Van Meter

Abstract: To scale quantum computers to useful levels, we must build networks of quantum computational nodes that can share entanglement for use in distributed forms of quantum algorithms. In one proposed architecture, node-to-node entanglement is created when nodes emit photons entangled with stationary memories, with the photons routed through a switched interconnect to a shared pool of Bell state analyze… ▽ More To scale quantum computers to useful levels, we must build networks of quantum computational nodes that can share entanglement for use in distributed forms of quantum algorithms. In one proposed architecture, node-to-node entanglement is created when nodes emit photons entangled with stationary memories, with the photons routed through a switched interconnect to a shared pool of Bell state analyzers (BSAs). Designs that optimize switching circuits will reduce loss and crosstalk, raising entanglement rates and fidelity. We present optimal designs for switched interconnects constrained to planar layouts, appropriate for silicon waveguides and Mach-Zehnder interferometer (MZI) $2 \times 2$ switch points. The architectures for the optimal designs are scalable and algorithmically structured to pair any arbitrary inputs in a rearrangeable, non-blocking way. For pairing $N$ inputs, $N(N - 2)/4$ switches are required, which is less than half of number of switches required for full permutation switching networks. An efficient routing algorithm is also presented for each architecture. These designs can also be employed in reverse for entanglement generation using a shared pool of entangled paired photon sources. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures, 1 table

arXiv:2402.18839 [pdf, other]

Extended Flow Matching: a Method of Conditional Generation with Generalized Continuity Equation

Authors: Noboru Isobe, Masanori Koyama, Jinzhe Zhang, Kohei Hayashi, Kenji Fukumizu

Abstract: The task of conditional generation is one of the most important applications of generative models, and numerous methods have been developed to date based on the celebrated flow-based models. However, many flow-based models in use today are not built to allow one to introduce an explicit inductive bias to how the conditional distribution to be generated changes with respect to conditions. This can… ▽ More The task of conditional generation is one of the most important applications of generative models, and numerous methods have been developed to date based on the celebrated flow-based models. However, many flow-based models in use today are not built to allow one to introduce an explicit inductive bias to how the conditional distribution to be generated changes with respect to conditions. This can result in unexpected behavior in the task of style transfer, for example. In this research, we introduce extended flow matching (EFM), a direct extension of flow matching that learns a "matrix field" corresponding to the continuous map from the space of conditions to the space of distributions. We show that we can introduce inductive bias to the conditional generation through the matrix field and demonstrate this fact with MMOT-EFM, a version of EFM that aims to minimize the Dirichlet energy or the sensitivity of the distribution with respect to conditions. We will present our theory along with experimental results that support the competitiveness of EFM in conditional generation. △ Less

Submitted 5 July, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 27 pages, 10 figures, We have corrected an error in our experiment on COT-FM

MSC Class: 68T07 (Primary); 49Q22 (Secondary)

arXiv:2305.18484 [pdf, other]

Neural Fourier Transform: A General Approach to Equivariant Representation Learning

Authors: Masanori Koyama, Kenji Fukumizu, Kohei Hayashi, Takeru Miyato

Abstract: Symmetry learning has proven to be an effective approach for extracting the hidden structure of data, with the concept of equivariance relation playing the central role. However, most of the current studies are built on architectural theory and corresponding assumptions on the form of data. We propose Neural Fourier Transform (NFT), a general framework of learning the latent linear action of the g… ▽ More Symmetry learning has proven to be an effective approach for extracting the hidden structure of data, with the concept of equivariance relation playing the central role. However, most of the current studies are built on architectural theory and corresponding assumptions on the form of data. We propose Neural Fourier Transform (NFT), a general framework of learning the latent linear action of the group without assuming explicit knowledge of how the group acts on data. We present the theoretical foundations of NFT and show that the existence of a linear equivariant feature, which has been assumed ubiquitously in equivariance learning, is equivalent to the existence of a group invariant kernel on the dataspace. We also provide experimental results to demonstrate the application of NFT in typical scenarios with varying levels of knowledge about the acting group. △ Less

Submitted 14 February, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2210.07413 [pdf, other]

Invariance-adapted decomposition and Lasso-type contrastive learning

Authors: Masanori Koyama, Takeru Miyato, Kenji Fukumizu

Abstract: Recent years have witnessed the effectiveness of contrastive learning in obtaining the representation of dataset that is useful in interpretation and downstream tasks. However, the mechanism that describes this effectiveness have not been thoroughly analyzed, and many studies have been conducted to investigate the data structures captured by contrastive learning. In particular, the recent study of… ▽ More Recent years have witnessed the effectiveness of contrastive learning in obtaining the representation of dataset that is useful in interpretation and downstream tasks. However, the mechanism that describes this effectiveness have not been thoroughly analyzed, and many studies have been conducted to investigate the data structures captured by contrastive learning. In particular, the recent study of \citet{content_isolate} has shown that contrastive learning is capable of decomposing the data space into the space that is invariant to all augmentations and its complement. In this paper, we introduce the notion of invariance-adapted latent space that decomposes the data space into the intersections of the invariant spaces of each augmentation and their complements. This decomposition generalizes the one introduced in \citet{content_isolate}, and describes a structure that is analogous to the frequencies in the harmonic analysis of a group. We experimentally show that contrastive learning with lasso-type metric can be used to find an invariance-adapted latent space, thereby suggesting a new potential for the contrastive learning. We also investigate when such a latent space can be identified up to mixings within each component. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Journal ref: 2022 ICML workshop of Topology, Algebra and Geometry in Machine Learning (spotlight)

arXiv:2210.05972 [pdf, other]

Unsupervised Learning of Equivariant Structure from Sequences

Authors: Takeru Miyato, Masanori Koyama, Kenji Fukumizu

Abstract: In this study, we present meta-sequential prediction (MSP), an unsupervised framework to learn the symmetry from the time sequence of length at least three. Our method leverages the stationary property (e.g. constant velocity, constant acceleration) of the time sequence to learn the underlying equivariant structure of the dataset by simply training the encoder-decoder model to be able to predict t… ▽ More In this study, we present meta-sequential prediction (MSP), an unsupervised framework to learn the symmetry from the time sequence of length at least three. Our method leverages the stationary property (e.g. constant velocity, constant acceleration) of the time sequence to learn the underlying equivariant structure of the dataset by simply training the encoder-decoder model to be able to predict the future observations. We will demonstrate that, with our framework, the hidden disentangled structure of the dataset naturally emerges as a by-product by applying simultaneous block-diagonalization to the transition operators in the latent space, the procedure which is commonly used in representation theory to decompose the feature-space based on the type of response to group actions. We will showcase our method from both empirical and theoretical perspectives. Our result suggests that finding a simple structured relation and learning a model with extrapolation capability are two sides of the same coin. The code is available at https://github.com/takerum/meta_sequential_prediction. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2111.07679 [pdf, other]

Contrastive Representation Learning with Trainable Augmentation Channel

Authors: Masanori Koyama, Kentaro Minami, Takeru Miyato, Yarin Gal

Abstract: In contrastive representation learning, data representation is trained so that it can classify the image instances even when the images are altered by augmentations. However, depending on the datasets, some augmentations can damage the information of the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by forma… ▽ More In contrastive representation learning, data representation is trained so that it can classify the image instances even when the images are altered by augmentations. However, depending on the datasets, some augmentations can damage the information of the images beyond recognition, and such augmentations can result in collapsed representations. We present a partial solution to this problem by formalizing a stochastic encoding process in which there exist a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder. We show that, with the infoMax objective based on this framework, we can learn a data-dependent distribution of augmentations to avoid the collapse of the representation. △ Less

Submitted 15 November, 2021; originally announced November 2021.

arXiv:2008.01883 [pdf, other]

When is invariance useful in an Out-of-Distribution Generalization problem ?

Authors: Masanori Koyama, Shoichiro Yamaguchi

Abstract: The goal of Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes on all environments. Popular approaches in this field use the hypothesis that such a predictor shall be an \textit{invariant predictor} that captures the mechanism that remains constant across environments. While these approaches have been experimentally successful in various case studies, there i… ▽ More The goal of Out-of-Distribution (OOD) generalization problem is to train a predictor that generalizes on all environments. Popular approaches in this field use the hypothesis that such a predictor shall be an \textit{invariant predictor} that captures the mechanism that remains constant across environments. While these approaches have been experimentally successful in various case studies, there is still much room for the theoretical validation of this hypothesis. This paper presents a new set of theoretical conditions necessary for an invariant predictor to achieve the OOD optimality. Our theory not only applies to non-linear cases, but also generalizes the necessary condition used in \citet{rojas2018invariant}. We also derive Inter Gradient Alignment algorithm from our theory and demonstrate its competitiveness on MNIST-derived benchmark datasets as well as on two of the three \textit{Invariance Unit Tests} proposed by \citet{aubinlinear}. △ Less

Submitted 25 November, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2007.10623 [pdf, other]

Learning Structured Latent Factors from Dependent Data:A Generative Model Framework from Information-Theoretic Perspective

Authors: Ruixiang Zhang, Masanori Koyama, Katsuhiko Ishiguro

Abstract: Learning controllable and generalizable representation of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models with various underlying structures in the latent space. We represent the inductive bias in the form of mask variables to model the dependency structure in the grap… ▽ More Learning controllable and generalizable representation of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models with various underlying structures in the latent space. We represent the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extend the theory of multivariate information bottleneck to enforce it. Our model provides a principled approach to learn a set of semantically meaningful latent factors that reflect various types of desired structures like capturing correlation or encoding invariance, while also offering the flexibility to automatically estimate the dependency structure from data. We show that our framework unifies many existing generative models and can be applied to a variety of tasks including multi-modal data modeling, algorithmic fairness, and invariant risk minimization. △ Less

Submitted 2 October, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: ICML2020 accepted paper. Author name fixed

arXiv:2006.01488 [pdf, other]

Meta Learning as Bayes Risk Minimization

Authors: Shin-ichi Maeda, Toshiki Nakanishi, Masanori Koyama

Abstract: Meta-Learning is a family of methods that use a set of interrelated tasks to learn a model that can quickly learn a new query task from a possibly small contextual dataset. In this study, we use a probabilistic framework to formalize what it means for two tasks to be related and reframe the meta-learning problem into the problem of Bayesian risk minimization (BRM). In our formulation, the BRM opti… ▽ More Meta-Learning is a family of methods that use a set of interrelated tasks to learn a model that can quickly learn a new query task from a possibly small contextual dataset. In this study, we use a probabilistic framework to formalize what it means for two tasks to be related and reframe the meta-learning problem into the problem of Bayesian risk minimization (BRM). In our formulation, the BRM optimal solution is given by the predictive distribution computed from the posterior distribution of the task-specific latent variable conditioned on the contextual dataset, and this justifies the philosophy of Neural Process. However, the posterior distribution in Neural Process violates the way the posterior distribution changes with the contextual dataset. To address this problem, we present a novel Gaussian approximation for the posterior distribution that generalizes the posterior of the linear Gaussian model. Unlike that of the Neural Process, our approximation of the posterior distributions converges to the maximum likelihood estimate with the same rate as the true posterior distribution. We also demonstrate the competitiveness of our approach on benchmark datasets. △ Less

Submitted 2 June, 2020; originally announced June 2020.

arXiv:1909.09540 [pdf, other]

Reconnaissance and Planning algorithm for constrained MDP

Authors: Shin-ichi Maeda, Hayato Watahiki, Shintarou Okada, Masanori Koyama

Abstract: Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this b… ▽ More Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this by decomposing the CMDP into a pair of MDPs; reconnaissance MDP and planning MDP. The purpose of reconnaissance MDP is to evaluate the set of actions that are safe, and the purpose of planning MDP is to maximize the return while using the actions authorized by reconnaissance MDP. RMDP can define a set of safe policies for any given set of safety constraint, and this set of safe policies can be used to solve another CMDP problem with different reward. Our method is not only computationally less demanding than the previous simulator-based approaches to CMDP, but also capable of finding a competitive reward-seeking policy in a high dimensional environment, including those involving multiple moving obstacles. △ Less

Submitted 20 September, 2019; originally announced September 2019.

arXiv:1907.10902 [pdf, other]

Optuna: A Next-generation Hyperparameter Optimization Framework

Authors: Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama

Abstract: The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purpo… ▽ More The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiment conducted via interactive interface. In order to prove our point, we will introduce Optuna, an optimization software which is a culmination of our effort in the development of a next generation optimization software. As an optimization software designed with define-by-run principle, Optuna is particularly the first of its kind. We will present the design-techniques that became necessary in the development of the software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/). △ Less

Submitted 25 July, 2019; originally announced July 2019.

Comments: 10 pages, Accepted at KDD 2019 Applied Data Science track

arXiv:1905.13021 [pdf, other]

Robustness to Adversarial Perturbations in Learning from Incomplete Data

Authors: Amir Najafi, Shin-ichi Maeda, Masanori Koyama, Takeru Miyato

Abstract: What is the role of unlabeled data in an inference problem, when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measure… ▽ More What is the role of unlabeled data in an inference problem, when the presumed underlying distribution is adversarially perturbed? To provide a concrete answer to this question, this paper unifies two major learning frameworks: Semi-Supervised Learning (SSL) and Distributionally Robust Learning (DRL). We develop a generalization theory for our framework based on a number of novel complexity measures, such as an adversarial extension of Rademacher complexity and its semi-supervised analogue. Moreover, our analysis is able to quantify the role of unlabeled data in the generalization under a more general condition compared to the existing theoretical works in SSL. Based on our framework, we also present a hybrid of DRL and EM algorithms that has a guaranteed convergence rate. When implemented with deep neural networks, our method shows a comparable performance to those of the state-of-the-art on a number of real-world benchmark datasets. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: 41 pages, 9 figures

arXiv:1905.11722 [pdf, ps, other]

A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

Authors: Mitsuru Kusumoto, Takuya Inoue, Gentaro Watanabe, Takuya Akiba, Masanori Koyama

Abstract: Recomputation algorithms collectively refer to a family of methods that aims to reduce the memory consumption of the backpropagation by selectively discarding the intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we will propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous… ▽ More Recomputation algorithms collectively refer to a family of methods that aims to reduce the memory consumption of the backpropagation by selectively discarding the intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we will propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous methods. We use the language of graph theory to formalize the general recomputation problem of minimizing the computational overhead under a fixed memory budget constraint, and provide a dynamic programming solution to the problem. Our method can reduce the peak memory consumption on various benchmark networks by 36%~81%, which outperforms the reduction achieved by other methods. △ Less

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1902.02992 [pdf, other]

A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning

Authors: Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama

Abstract: Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called \textit{pseudo-hyperbolic Gaussian}, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribu… ▽ More Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called \textit{pseudo-hyperbolic Gaussian}, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables the gradient-based learning of the probabilistic models on hyperbolic space that could never have been considered before. Also, we can sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic-analog of variational autoencoder and a method of probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet. △ Less

Submitted 9 May, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: 20 pages, 12 figures

arXiv:1902.01020 [pdf, other]

Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis

Authors: Katsuhiko Ishiguro, Shin-ichi Maeda, Masanori Koyama

Abstract: Graph Neural Network (GNN) is a popular architecture for the analysis of chemical molecules, and it has numerous applications in material and medicinal science. Current lines of GNNs developed for molecular analysis, however, do not fit well on the training set, and their performance does not scale well with the complexity of the network. In this paper, we propose an auxiliary module to be attache… ▽ More Graph Neural Network (GNN) is a popular architecture for the analysis of chemical molecules, and it has numerous applications in material and medicinal science. Current lines of GNNs developed for molecular analysis, however, do not fit well on the training set, and their performance does not scale well with the complexity of the network. In this paper, we propose an auxiliary module to be attached to a GNN that can boost the representation power of the model without hindering with the original GNN architecture. Our auxiliary module can be attached to a wide variety of GNNs, including those that are used commonly in biochemical applications. With our auxiliary architecture, the performances of many GNNs used in practice improve more consistently, achieving the state-of-the-art performance on popular molecular graph datasets. △ Less

Submitted 24 May, 2019; v1 submitted 3 February, 2019; originally announced February 2019.

Comments: Augmented experiments, title slightly modified

arXiv:1811.10153 [pdf, other]

Spatially Controllable Image Synthesis with Internal Representation Collaging

Authors: Ryohei Suzuki, Masanori Koyama, Takeru Miyato, Taizan Yonetsuji, Huachun Zhu

Abstract: We present a novel CNN-based image editing strategy that allows the user to change the semantic information of an image over an arbitrary region by manipulating the feature-space representation of the image in a trained GAN model. We will present two variants of our strategy: (1) spatial conditional batch normalization (sCBN), a type of conditional batch normalization with user-specifiable spatial… ▽ More We present a novel CNN-based image editing strategy that allows the user to change the semantic information of an image over an arbitrary region by manipulating the feature-space representation of the image in a trained GAN model. We will present two variants of our strategy: (1) spatial conditional batch normalization (sCBN), a type of conditional batch normalization with user-specifiable spatial weight maps, and (2) feature-blending, a method of directly modifying the intermediate features. Our methods can be used to edit both artificial image and real image, and they both can be used together with any GAN with conditional normalization layers. We will demonstrate the power of our method through experiments on various types of GANs trained on different datasets. Code will be available at https://github.com/pfnet-research/neural-collage. △ Less

Submitted 9 April, 2019; v1 submitted 25 November, 2018; originally announced November 2018.

arXiv:1811.09245 [pdf, other]

doi 10.1007/s11263-020-01333-y

Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN

Authors: Masaki Saito, Shunta Saito, Masanori Koyama, Sosuke Kobayashi

Abstract: Training of Generative Adversarial Network (GAN) on a video dataset is a challenge because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training GAN scales exponentially with the resolution. In this study, we present a novel memory efficient method of unsupervised learning of high-resolution video dataset whose computational cost sc… ▽ More Training of Generative Adversarial Network (GAN) on a video dataset is a challenge because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training GAN scales exponentially with the resolution. In this study, we present a novel memory efficient method of unsupervised learning of high-resolution video dataset whose computational cost scales only linearly with the resolution. We achieve this by designing the generator model as a stack of small sub-generators and training the model in a specific way. We train each sub-generator with its own specific discriminator. At the time of the training, we introduce between each pair of consecutive sub-generators an auxiliary subsampling layer that reduces the frame-rate by a certain ratio. This procedure can allow each sub-generator to learn the distribution of the video at different levels of resolution. We also need only a few GPUs to train a highly complex generator that far outperforms the predecessor in terms of inception scores. △ Less

Submitted 1 June, 2020; v1 submitted 22 November, 2018; originally announced November 2018.

Comments: Accepted at International Journal of Computer Vision. The source code is available at https://github.com/pfnet-research/tgan2

arXiv:1802.05957 [pdf, other]

Spectral Normalization for Generative Adversarial Networks

Authors: Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida

Abstract: One of the challenges in the study of generative adversarial networks is the instability of its training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral norm… ▽ More One of the challenges in the study of generative adversarial networks is the instability of its training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on CIFAR10, STL-10, and ILSVRC2012 dataset, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) is capable of generating images of better or equal quality relative to the previous training stabilization techniques. △ Less

Submitted 16 February, 2018; originally announced February 2018.

Comments: Published as a conference paper at ICLR 2018

arXiv:1802.05637 [pdf, other]

cGANs with Projection Discriminator

Authors: Takeru Miyato, Masanori Koyama

Abstract: We propose a novel, projection based way to incorporate the conditional information into the discriminator of GANs that respects the role of the conditional information in the underlining probabilistic model. This approach is in contrast with most frameworks of conditional GANs used in application today, which use the conditional information by concatenating the (embedded) conditional vector to th… ▽ More We propose a novel, projection based way to incorporate the conditional information into the discriminator of GANs that respects the role of the conditional information in the underlining probabilistic model. This approach is in contrast with most frameworks of conditional GANs used in application today, which use the conditional information by concatenating the (embedded) conditional vector to the feature vectors. With this modification, we were able to significantly improve the quality of the class conditional image generation on ILSVRC2012 (ImageNet) 1000-class image dataset from the current state-of-the-art result, and we achieved this with a single pair of a discriminator and a generator. We were also able to extend the application to super-resolution and succeeded in producing highly discriminative super-resolution images. This new structure also enabled high quality category transformation based on parametric functional transformation of conditional batch normalization layers in the generator. △ Less

Submitted 14 August, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

Comments: Published as a conference paper at ICLR 2018

arXiv:1710.07375 [pdf]

SPICE Simulation of tunnel FET aiming at 32 kHz crystal-oscillator operation

Authors: Tetsufumi Tanamoto, Chika Tanaka, Satoshi Takaya, Masato Koyama

Abstract: We numerically investigate the possibility of using Tunnel field-effect transistor (TFET) in a 32 kHz crystal oscillator circuit to reduce power consumption. A simulation using SPICE (Simulation Program with Integrated Circuit Emphasis) is carried out based on a conventional CMOS transistor model. It is shown that the power consumption of TFET is one-tenth that of conventional low-power CMOS. We numerically investigate the possibility of using Tunnel field-effect transistor (TFET) in a 32 kHz crystal oscillator circuit to reduce power consumption. A simulation using SPICE (Simulation Program with Integrated Circuit Emphasis) is carried out based on a conventional CMOS transistor model. It is shown that the power consumption of TFET is one-tenth that of conventional low-power CMOS. △ Less

Submitted 19 October, 2017; originally announced October 2017.

Comments: 7 pages, 2 tables

arXiv:1704.03976 [pdf, other]

Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Shin Ishii

Abstract: We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label info… ▽ More We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10. △ Less

Submitted 27 June, 2018; v1 submitted 12 April, 2017; originally announced April 2017.

Comments: To be appeared in IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:1507.00677 [pdf, other]

Distributional Smoothing with Virtual Adversarial Training

Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

Abstract: We propose local distributional smoothness (LDS), a new notion of smoothness for statistical model that can be used as a regularization term to promote the smoothness of the model distribution. We named the LDS based regularization as virtual adversarial training (VAT). The LDS of a model at an input datapoint is defined as the KL-divergence based robustness of the model distribution against local… ▽ More We propose local distributional smoothness (LDS), a new notion of smoothness for statistical model that can be used as a regularization term to promote the smoothness of the model distribution. We named the LDS based regularization as virtual adversarial training (VAT). The LDS of a model at an input datapoint is defined as the KL-divergence based robustness of the model distribution against local perturbation around the datapoint. VAT resembles adversarial training, but distinguishes itself in that it determines the adversarial direction from the model distribution alone without using the label information, making it applicable to semi-supervised learning. The computational cost for VAT is relatively low. For neural network, the approximated gradient of the LDS can be computed with no more than three pairs of forward and back propagations. When we applied our technique to supervised and semi-supervised learning for the MNIST dataset, it outperformed all the training methods other than the current state of the art method, which is based on a highly advanced generative model. We also applied our method to SVHN and NORB, and confirmed our method's superior performance over the current state of the art semi-supervised method applied to these datasets. △ Less

Submitted 11 June, 2016; v1 submitted 2 July, 2015; originally announced July 2015.

Comments: Under review as a conference paper at ICLR 2016

arXiv:1502.00093 [pdf, other]

Deep learning of fMRI big data: a novel approach to subject-transfer decoding

Authors: Sotetsu Koyamada, Yumi Shikauchi, Ken Nakae, Masanori Koyama, Shin Ishii

Abstract: As a technology to read brain states from measurable brain activities, brain decoding are widely applied in industries and medical sciences. In spite of high demands in these applications for a universal decoder that can be applied to all individuals simultaneously, large variation in brain activities across individuals has limited the scope of many studies to the development of individual-specifi… ▽ More As a technology to read brain states from measurable brain activities, brain decoding are widely applied in industries and medical sciences. In spite of high demands in these applications for a universal decoder that can be applied to all individuals simultaneously, large variation in brain activities across individuals has limited the scope of many studies to the development of individual-specific decoders. In this study, we used deep neural network (DNN), a nonlinear hierarchical model, to construct a subject-transfer decoder. Our decoder is the first successful DNN-based subject-transfer decoder. When applied to a large-scale functional magnetic resonance imaging (fMRI) database, our DNN-based decoder achieved higher decoding accuracy than other baseline methods, including support vector machine (SVM). In order to analyze the knowledge acquired by this decoder, we applied principal sensitivity analysis (PSA) to the decoder and visualized the discriminative features that are common to all subjects in the dataset. Our PSA successfully visualized the subject-independent features contributing to the subject-transferability of the trained decoder. △ Less

Submitted 31 January, 2015; originally announced February 2015.

arXiv:1412.6785 [pdf, other]

doi 10.1007/978-3-319-18038-0_48

Principal Sensitivity Analysis

Authors: Sotetsu Koyamada, Masanori Koyama, Ken Nakae, Shin Ishii

Abstract: We present a novel algorithm (Principal Sensitivity Analysis; PSA) to analyze the knowledge of the classifier obtained from supervised machine learning techniques. In particular, we define principal sensitivity map (PSM) as the direction on the input space to which the trained classifier is most sensitive, and use analogously defined k-th PSM to define a basis for the input space. We train neural… ▽ More We present a novel algorithm (Principal Sensitivity Analysis; PSA) to analyze the knowledge of the classifier obtained from supervised machine learning techniques. In particular, we define principal sensitivity map (PSM) as the direction on the input space to which the trained classifier is most sensitive, and use analogously defined k-th PSM to define a basis for the input space. We train neural networks with artificial data and real data, and apply the algorithm to the obtained supervised classifiers. We then visualize the PSMs to demonstrate the PSA's ability to decompose the knowledge acquired by the trained classifiers. △ Less

Submitted 11 March, 2015; v1 submitted 21 December, 2014; originally announced December 2014.

Showing 1–25 of 25 results for author: Koyama, M