-
Deep Transformed Gaussian Processes
Authors:
Francisco Javier Sáez-Maldonado,
Juan Maroñas,
Daniel Hernández-Lobato
Abstract:
Transformed Gaussian Processes (TGPs) are stochastic processes specified by transforming samples from the joint distribution of a prior process (typically a GP) using an invertible transformation, which increases the flexibility of the base process.
Furthermore, they achieve competitive results compared with Deep Gaussian Processes (DGPs), which are another generalization constructed by a hierarchical concatenation of GPs. In this work, we propose a generalization of TGPs named Deep Transformed Gaussian Processes (DTGPs), which follows the trend of concatenating layers of stochastic processes. More precisely, we obtain a multi-layer model in which each layer is a TGP. This generalization increases flexibility with respect to both TGPs and DGPs. Exact inference in such a model is intractable. However, we show that variational inference can be used to approximate the required computations, yielding a straightforward extension of the popular DSVI inference algorithm (Salimbeni et al., 2017). Experiments on multiple regression datasets show that the proposed DTGPs achieve good scalability and performance.
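To make the layer structure concrete, here is a minimal sketch of drawing one function sample from a two-layer DTGP prior. It assumes an RBF kernel in every layer and uses a sinh-arcsinh map as each layer's invertible transformation; both are illustrative choices, not necessarily the paper's, and the DSVI-based inference is not reproduced. Each layer is a TGP: a GP sample evaluated on the previous layer's output, then warped pointwise.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between column vectors a (n, 1) and b (m, 1).
    d2 = (a - b.T) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sample_gp(x, rng, jitter=1e-6):
    # One joint sample f ~ N(0, K(x, x)) at the inputs x (n, 1).
    K = rbf_kernel(x, x) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((len(x), 1))

def sinh_arcsinh(f, skew, tail):
    # Monotone, invertible pointwise map (for tail > 0): the flow of a TGP layer.
    return np.sinh(tail * np.arcsinh(f) - skew)

def sample_dtgp(x, layer_params, rng):
    # Stack TGP layers: GP sample on the current inputs, then the flow.
    h = x
    for skew, tail in layer_params:
        f = sample_gp(h, rng)
        h = sinh_arcsinh(f, skew, tail)
    return h

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = sample_dtgp(x, layer_params=[(0.0, 1.0), (0.5, 1.5)], rng=rng)
print(y.shape)  # (50, 1)
```

With skew = 0 and tail = 1 the flow is the identity, so a single layer with those values reduces to a plain GP, and stacking such layers recovers a DGP-style composition.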
Submitted 2 November, 2023; v1 submitted 27 October, 2023;
originally announced October 2023.
-
Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models
Authors:
Zhidi Lin,
Juan Maroñas,
Ying Li,
Feng Yin,
Sergios Theodoridis
Abstract:
The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, which poses challenges for modeling dynamical systems with high-dimensional latent states. To overcome this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating that it achieves inference performance comparable to that of existing methods. Code is available at https://github.com/zhidilin/gpssmProj.
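The core idea, one GP shared by all latent dimensions and pushed through a separate flow per dimension, can be sketched as a simulated state-space rollout. This is a minimal illustration under several assumptions: the shared GP sample is approximated with random Fourier features so it can be evaluated along a trajectory, the flows are input-independent maps G_d(f) = a_d f + c_d tanh(f) + d_d with hypothetical parameters, and Gaussian process noise is added. The class `SharedGPTransition` and the function `simulate` are hypothetical names; the paper's flows and variational inference are not reproduced.

```python
import numpy as np

class SharedGPTransition:
    """Hypothetical ETGP-style transition: one GP sample shared by all latent
    dimensions, pushed through a separate monotone flow per dimension."""

    def __init__(self, state_dim, n_features=200, lengthscale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features approximating one RBF-kernel GP sample.
        self.W = rng.standard_normal((n_features, state_dim)) / lengthscale
        self.b = rng.uniform(0.0, 2.0 * np.pi, n_features)
        self.theta = rng.standard_normal(n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # Per-dimension flow parameters (illustrative values).
        self.a = rng.uniform(0.5, 1.0, state_dim)  # positive slope
        self.c = rng.uniform(0.0, 0.5, state_dim)  # nonnegative tanh weight
        self.d = rng.normal(0.0, 0.1, state_dim)   # shift

    def shared_gp(self, x):
        # Scalar GP function value f(x) for a state x of shape (state_dim,).
        return self.scale * np.cos(self.W @ x + self.b) @ self.theta

    def __call__(self, x):
        f = self.shared_gp(x)  # a single GP evaluation, whatever state_dim is
        # G_d(f) = a_d f + c_d tanh(f) + d_d is strictly increasing in f.
        return self.a * f + self.c * np.tanh(f) + self.d

def simulate(transition, x0, steps, noise_std=0.05, seed=1):
    # Roll out x_{t+1} = transition(x_t) + Gaussian process noise.
    rng = np.random.default_rng(seed)
    xs = [x0]
    for _ in range(steps):
        mean = transition(xs[-1])
        xs.append(mean + noise_std * rng.standard_normal(mean.shape))
    return np.stack(xs)

trans = SharedGPTransition(state_dim=10)
traj = simulate(trans, x0=np.zeros(10), steps=30)
print(traj.shape)  # (31, 10)
```

Raising `state_dim` adds only three flow parameters per dimension; the GP part stays a single function, which is the source of the parameter and compute savings the abstract describes.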
Submitted 3 September, 2023;
originally announced September 2023.
-
Towards Flexibility and Interpretability of Gaussian Process State-Space Model
Authors:
Zhidi Lin,
Feng Yin,
Juan Maroñas
Abstract:
The Gaussian process state-space model (GPSSM) has garnered considerable attention over the past decade. However, the standard GP with a basic kernel, such as the squared exponential or Matérn kernel, that is commonly used in GPSSM studies limits the model's representation power and substantially restricts its applicability to complex scenarios. To address this issue, we propose a new class of probabilistic state-space models called TGPSSMs, which leverage a parametric normalizing flow to enrich the GP priors in the standard GPSSM, enabling greater flexibility and expressivity. Additionally, we present a scalable variational inference algorithm that offers a flexible and optimal structure for the variational distribution of latent states. The proposed algorithm is interpretable and computationally efficient due to the sparse GP representation and the bijective nature of the normalizing flow. Moreover, we incorporate a constrained optimization framework into the algorithm to enhance the state-space representation capabilities and optimize the hyperparameters, leading to superior learning and inference performance. Experimental results on synthetic and real datasets corroborate that the proposed TGPSSM outperforms several state-of-the-art methods. The accompanying source code is available at https://github.com/zhidilin/TGPSSM.
Submitted 6 April, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks
Authors:
Sergio A. Balanya,
Juan Maroñas,
Daniel Ramos
Abstract:
In this paper, we study the post-hoc calibration of modern neural networks, a problem that has drawn a lot of attention in recent years. Many calibration methods of varying complexity have been proposed for the task, but there is no consensus about how expressive these should be. We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling; we call these the Adaptive Temperature Scaling family. We analyse expressive functions that improve calibration and propose interpretable methods. We show that when there is plenty of data, complex models like neural networks yield better performance, but they are prone to fail when the amount of data is limited, a common situation in certain post-hoc calibration applications like medical diagnosis. We study the functions that expressive methods learn under ideal conditions and design simpler methods with a strong inductive bias towards these well-performing functions. Concretely, we propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy. Results show that our method obtains state-of-the-art performance when compared to others and, unlike complex models, it is robust against data scarcity. Moreover, our proposed model enables a deeper interpretation of the calibration process.
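Scaling a prediction's confidence according to its entropy can be sketched by giving each prediction its own temperature computed from the entropy of its uncalibrated softmax output. The functional form used below, T(H) = softplus(w H + b) with hypothetical parameters w and b, is an assumption for illustration; the abstract does not specify the paper's parameterization. Setting w = 0 gives a single shared temperature, i.e. standard Temperature Scaling.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy (in nats) of each row of probabilities.
    return -(p * np.log(p + eps)).sum(axis=-1)

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps temperatures positive.
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def entropy_temperature_scaling(logits, w, b):
    """Hypothetical entropy-based temperature scaling.

    Each row gets T = softplus(w * H + b), where H is the entropy of the
    uncalibrated softmax output; w = 0 is standard Temperature Scaling."""
    H = entropy(softmax(logits))
    T = softplus(w * H + b)[:, None]  # one positive temperature per row
    return softmax(logits / T)

logits = np.array([[8.0, 1.0, 0.5],    # confident prediction
                   [1.2, 1.0, 0.9]])   # uncertain prediction
calibrated = entropy_temperature_scaling(logits, w=1.0, b=0.5)
```

As with ordinary Temperature Scaling, w and b would be fitted post hoc (e.g. by minimizing negative log-likelihood on held-out data), and because every temperature is positive the predicted class never changes, only the confidence does.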
Submitted 31 July, 2022;
originally announced August 2022.
-
Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification
Authors:
Juan Maroñas,
Daniel Hernández-Lobato
Abstract:
This work introduces the Efficient Transformed Gaussian Process (ETGP), a new way of creating C stochastic processes characterized by: 1) the C processes are non-stationary; 2) the C processes are dependent by construction, without needing a mixing matrix; 3) training and making predictions is very efficient, since the number of Gaussian process (GP) operations (e.g., inverting the inducing points' covariance matrix) does not depend on the number of processes. This makes the ETGP particularly suited for multi-class problems with a very large number of classes, which are the problems studied in this work. ETGPs exploit the recently proposed Transformed Gaussian Process (TGP), a stochastic process specified by transforming a Gaussian process using an invertible transformation. However, unlike TGPs, ETGPs are constructed by transforming a single sample from a GP using C invertible transformations. We derive an efficient sparse variational inference algorithm for the proposed model and demonstrate its utility on 5 classification tasks, which include small, medium and large datasets with a number of classes ranging from just a few to hundreds. Our results show that ETGPs, in general, outperform state-of-the-art methods for multi-class classification based on GPs, and have a lower computational cost (around one order of magnitude smaller).
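The construction described above, one GP sample transformed by C class-specific invertible maps and then mapped to class probabilities, can be sketched as a prior draw. Assumptions for illustration: a single RBF-kernel GP, a scaled sinh-arcsinh flow per class with hypothetical parameters (input-independent here for brevity), and a softmax link; `etgp_class_probs` is a hypothetical name, and the paper's sparse variational inference is not reproduced. Only one Cholesky factorization is needed regardless of C.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    # Squared-exponential kernel for inputs x (n, D) and y (m, D).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def etgp_class_probs(X, flow_params, rng, jitter=1e-6):
    """Hypothetical ETGP prior draw of class probabilities.

    flow_params: array (C, 3) with (scale > 0, skew, tail > 0) per class."""
    n = X.shape[0]
    K = rbf_kernel(X, X) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)          # the only O(n^3) GP operation
    f = L @ rng.standard_normal(n)     # one shared GP sample, shape (n,)
    scale, skew, tail = flow_params.T  # each of shape (C,)
    # Class-specific monotone flows applied to the same sample -> (n, C).
    F = scale * np.sinh(tail * np.arcsinh(f[:, None]) - skew)
    F = F - F.max(axis=1, keepdims=True)  # stable softmax over classes
    P = np.exp(F)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(40, 2))
C = 100
params = np.column_stack([rng.uniform(0.5, 2.0, C),
                          rng.normal(0.0, 1.0, C),
                          rng.uniform(0.5, 1.5, C)])
P = etgp_class_probs(X, params, rng)
print(P.shape)  # (40, 100)
```

Because all C latent functions are deterministic transformations of the same sample f, they are dependent by construction, and adding a class adds only three flow parameters rather than another GP.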
Submitted 30 May, 2022;
originally announced May 2022.
-
Transforming Gaussian Processes With Normalizing Flows
Authors:
Juan Maroñas,
Oliver Hamelijnck,
Jeremias Knoblauch,
Theodoros Damoulas
Abstract:
Gaussian Processes (GPs) can be used as flexible, non-parametric function priors. Inspired by the growing body of work on Normalizing Flows, we enlarge this class of priors through a parametric invertible transformation that can be made input-dependent. Doing so also allows us to encode interpretable prior knowledge (e.g., boundedness constraints). We derive a variational approximation to the resulting Bayesian inference problem, which is as fast as stochastic variational GP regression (Hensman et al., 2013; Dezfouli and Bonilla, 2015). This makes the model a computationally efficient alternative to other hierarchical extensions of GP priors (Lazaro-Gredilla, 2012; Damianou and Lawrence, 2013). The resulting algorithm's computational and inferential performance is excellent, and we demonstrate this on a range of data sets. For example, even with only 5 inducing points and an input-dependent flow, our method is consistently competitive with a standard sparse GP fitted using 100 inducing points.
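A TGP prior sample is simply a GP sample pushed through an invertible transformation. In the minimal sketch below, the flow is a scaled sigmoid, chosen (as an assumption) to illustrate the boundedness constraint mentioned in the abstract: every transformed sample lies in (lower, upper). The `shift` function is a hypothetical input-dependent flow parameter; the paper's flows and its variational inference are not reproduced.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=0.5, variance=1.0):
    # Squared-exponential kernel for 1-D inputs x (n,) and y (m,).
    return variance * np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / lengthscale**2)

def sample_gp(x, rng, jitter=1e-6):
    # One joint draw f ~ N(0, K(x, x)) via a Cholesky factor.
    K = rbf_kernel(x, x) + jitter * np.eye(len(x))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

def shift(x):
    # Hypothetical input-dependent flow parameter.
    return 0.5 * np.sin(x)

def bounded_flow(f, x, lower, upper):
    # Invertible map R -> (lower, upper): scaled sigmoid of a shifted sample.
    return lower + (upper - lower) / (1.0 + np.exp(-(f + shift(x))))

def inverse_bounded_flow(g, x, lower, upper):
    # Exact inverse: logit of the rescaled value, minus the same shift.
    u = (g - lower) / (upper - lower)
    return np.log(u) - np.log1p(-u) - shift(x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 100)
f = sample_gp(x, rng)                          # base GP sample
g = bounded_flow(f, x, lower=0.0, upper=10.0)  # TGP sample, always in (0, 10)
```

The explicit inverse matters in practice: invertibility is what allows the density of the transformed process to be related back to the base GP.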
Submitted 25 February, 2021; v1 submitted 3 November, 2020;
originally announced November 2020.
-
On Calibration of Mixup Training for Deep Neural Networks
Authors:
Juan Maroñas,
Daniel Ramos,
Roberto Paredes
Abstract:
Deep Neural Networks (DNNs) represent the state of the art in many tasks. However, due to their overparameterization, their generalization capabilities are in doubt and remain under study. Consequently, DNNs can overfit and assign overconfident predictions, effects that have been shown to affect the calibration of the confidences assigned to unseen data. Data Augmentation (DA) strategies have been proposed to regularize these models, with Mixup being one of the most popular due to its ability to improve the accuracy, the uncertainty quantification and the calibration of DNNs. In this work, however, we argue and provide empirical evidence that, due to its fundamentals, Mixup does not necessarily improve calibration. Based on our observations, we propose a new loss function that improves the calibration, and sometimes also the accuracy, of DNNs trained with this DA technique. Our loss is inspired by Bayes decision theory and introduces a new training framework for designing losses for probabilistic modelling. We provide state-of-the-art accuracy with consistent improvements in calibration performance. Appendix and code are provided here: https://github.com/jmaronas/calibration_MixupDNN_ARCLoss.pytorch.git
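For reference, the standard Mixup augmentation discussed above forms convex combinations of pairs of inputs and of their one-hot labels, with a mixing weight drawn from a Beta(alpha, alpha) distribution. A minimal NumPy sketch of that standard step follows; `mixup_batch` is a hypothetical helper name, and the paper's proposed calibration-oriented loss is not reproduced.

```python
import numpy as np

def mixup_batch(x, y, num_classes, alpha, rng):
    """Standard Mixup: one lambda ~ Beta(alpha, alpha) per batch and a random
    permutation to pick the partner of each example."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    y_onehot = np.eye(num_classes)[y]
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]  # soft targets
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 32, 32))  # e.g. a batch of 32x32 RGB images
y = rng.integers(0, 10, size=8)
x_mix, y_mix, lam = mixup_batch(x, y, num_classes=10, alpha=1.0, rng=rng)
```

Each row of `y_mix` still sums to one, so the usual cross-entropy can be applied with soft targets; it is these mixed targets, rather than the inputs, that directly shape the confidences the network learns to output.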
Submitted 28 January, 2021; v1 submitted 22 March, 2020;
originally announced March 2020.
-
Solving Partial Differential Equations with Neural Networks
Authors:
Juan B. Pedro,
Juan Maroñas,
Roberto Paredes
Abstract:
Many scientific and industrial applications require solving Partial Differential Equations (PDEs) to describe the physical phenomena of interest. Some examples can be found in the fields of aerodynamics, astrodynamics, combustion and many others. In some exceptional cases an analytical solution to the PDEs exists, but in the vast majority of applications some kind of numerical approximation has to be computed. In this work, an alternative approach is proposed using neural networks (NNs) as the approximation function for the PDEs. Unlike traditional numerical methods, NNs can approximate any function given enough parameters. Moreover, these solutions are continuous and differentiable over the entire domain, removing the need for discretization. Another advantage of using NNs as function approximators is the ability to include the free parameters in the process of finding the solution. As a result, the solution can generalize to a range of situations instead of a particular case, avoiding the need to perform new calculations every time a parameter is changed and dramatically decreasing the optimization time. We believe that the presented method has the potential to disrupt the physics simulation field, enabling real-time physics simulation and geometry optimization without the need for large supercomputers to perform expensive and time-consuming simulations.
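A toy illustration of two of the points above, a network as a continuous, differentiable approximation of the solution and the free parameter treated as an extra network input, is sketched below for the ODE du/dx = -k u with u(0) = 1, whose exact solution exp(-k x) lets us check the result. To stay dependency-free, the hidden layer's weights are fixed at random and only the output weights are fitted by linear least squares on the equation residual; this extreme-learning-machine style simplification is an assumption, not the training procedure of the paper, and the hidden size, sampling ranges and collocation grid are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 200                                # hidden tanh units
A = rng.normal(0.0, 2.0, size=(H, 2))  # fixed input weights for (x, k)
c = rng.uniform(-3.0, 3.0, size=H)     # fixed biases

def features(x, k):
    # Hidden activations and their analytic derivative with respect to x.
    z = np.outer(x, A[:, 0]) + np.outer(k, A[:, 1]) + c  # (n, H)
    t = np.tanh(z)
    dt_dx = (1.0 - t**2) * A[:, 0]
    return t, dt_dx

# Collocation points over the domain x in [0, 1] and parameter k in [0.5, 2].
xs, ks = np.meshgrid(np.linspace(0.0, 1.0, 30), np.linspace(0.5, 2.0, 20))
xs, ks = xs.ravel(), ks.ravel()
t, dt = features(xs, ks)
R = dt + ks[:, None] * t  # residual du/dx + k u, linear in the output weights

# Boundary condition u(0, k) = 1 for every k.
kb = np.linspace(0.5, 2.0, 20)
tb, _ = features(np.zeros_like(kb), kb)

M = np.vstack([R, tb])
rhs = np.concatenate([np.zeros(len(xs)), np.ones(len(kb))])
w, *_ = np.linalg.lstsq(M, rhs, rcond=None)

def u(x, k):
    # Fitted network: continuous and differentiable in both x and k.
    return features(np.atleast_1d(x), np.atleast_1d(k))[0] @ w

# One fit covers every k in the range; no re-solve when k changes.
x_test = np.linspace(0.0, 1.0, 11)
err = np.max(np.abs(u(x_test, np.full(11, 1.3)) - np.exp(-1.3 * x_test)))
```

Evaluating `u` at a value of k that was not on the collocation grid (1.3 above) needs no new computation, which is the kind of parameter generalization the abstract describes.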
Submitted 10 December, 2019;
originally announced December 2019.
-
Bayesian Strategies for Likelihood Ratio Computation in Forensic Voice Comparison with Automatic Systems
Authors:
Daniel Ramos,
Juan Maroñas,
Alicia Lozano-Diez
Abstract:
This paper explores several strategies for Forensic Voice Comparison (FVC) aimed at improving the performance of likelihood ratios (LRs) when using generative Gaussian score-to-LR models. First, different anchoring strategies are proposed, with the objective of adapting the LR computation process to the case at hand, always respecting the propositions defined for the particular case. Second, a fully-Bayesian Gaussian model is used to tackle the sparsity in the training scores that is often present when the proposed anchoring strategies are used. Experiments are performed using the 2014 i-Vector challenge set-up, which presents high variability in a telephone speech context. The results show that the proposed fully-Bayesian model clearly outperforms a more common Maximum-Likelihood approach, leading to high robustness when the scores used to train the model become sparse.
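A generative Gaussian score-to-LR model computes log LR(s) = log p(s | same source) - log p(s | different source) from Gaussians fitted to training scores of each kind. The sketch below contrasts a maximum-likelihood fit with one standard fully-Bayesian alternative: a conjugate Normal-Inverse-Gamma prior, whose posterior predictive is a Student-t distribution. The prior hyperparameters and example scores are hypothetical, and the paper's exact prior and anchoring strategies may differ. With few training scores, the heavier-tailed predictive can temper extreme LR values.

```python
import math
import numpy as np

def ml_logpdf(s, scores):
    # Gaussian log-density with maximum-likelihood mean and variance.
    mu, var = np.mean(scores), np.var(scores)
    return -0.5 * (math.log(2 * math.pi * var) + (s - mu) ** 2 / var)

def bayes_logpdf(s, scores, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Posterior predictive log-density (Student-t) under a
    Normal-Inverse-Gamma prior with hypothetical hyperparameters."""
    x = np.asarray(scores, dtype=float)
    n, xbar = len(x), x.mean()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = (beta0 + 0.5 * np.sum((x - xbar) ** 2)
              + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n))
    nu = 2 * alpha_n                                      # degrees of freedom
    scale2 = beta_n * (kappa_n + 1) / (alpha_n * kappa_n)  # squared scale
    z2 = (s - mu_n) ** 2 / scale2
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1) / 2 * math.log1p(z2 / nu))

def log_lr(s, same_scores, diff_scores, logpdf):
    return logpdf(s, same_scores) - logpdf(s, diff_scores)

same = [2.1, 2.8, 3.0]                 # sparse same-source training scores
diff = [-1.5, -0.7, -1.1, -0.2, -0.9]  # different-source training scores
ml = log_lr(2.5, same, diff, ml_logpdf)
bayes = log_lr(2.5, same, diff, bayes_logpdf)
```

Both approaches favour the same-source proposition for a score near the same-source training scores, but the Bayesian predictive accounts for uncertainty in the fitted mean and variance, which is the property that matters when the training scores are sparse.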
Submitted 18 September, 2019;
originally announced September 2019.
-
Calibration of Deep Probabilistic Models with Decoupled Bayesian Neural Networks
Authors:
Juan Maroñas,
Roberto Paredes,
Daniel Ramos
Abstract:
Deep Neural Networks (DNNs) have achieved state-of-the-art accuracy in many tasks. However, recent works have pointed out that the outputs provided by these models are not well calibrated, seriously limiting their use in critical decision scenarios. In this work, we propose to use a decoupled Bayesian stage, implemented with a Bayesian Neural Network (BNN), to map the uncalibrated probabilities provided by a DNN to calibrated ones, consistently improving calibration. Our results show that incorporating uncertainty provides more reliable probabilistic models, a critical condition for achieving good calibration. We report an extensive collection of experimental results using high-accuracy DNNs on standardized image classification benchmarks, showing the good performance, flexibility and robust behavior of our approach with respect to several state-of-the-art calibration methods. Code for reproducibility is provided.
Submitted 28 February, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Generative Models For Deep Learning with Very Scarce Data
Authors:
Juan Maroñas,
Roberto Paredes,
Daniel Ramos
Abstract:
The goal of this paper is to deal with a data scarcity scenario in which deep learning techniques tend to fail. We compare the use of two well-established techniques, Restricted Boltzmann Machines (RBMs) and Variational Auto-encoders (VAEs), as generative models in order to increase the training set in a classification framework. Essentially, we rely on Markov Chain Monte Carlo (MCMC) algorithms for generating new samples. We show that this methodology improves generalization compared to other state-of-the-art techniques, e.g. semi-supervised learning with ladder networks. Furthermore, we show that RBMs are better than VAEs at generating new samples for training a classifier with good generalization capabilities.
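For an RBM, the usual MCMC sampler is block Gibbs sampling, which alternates between sampling the hidden units given the visible ones and vice versa. The sketch below trains a tiny Bernoulli RBM with one-step contrastive divergence (CD-1) on toy binary patterns and then runs Gibbs chains started from the training data to produce new samples. The class name, layer sizes, learning rate and toy data are hypothetical, and the paper's models, training details and VAE comparison are not reproduced.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliRBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.bv = np.zeros(n_visible)
        self.bh = np.zeros(n_hidden)
        self.rng = rng

    def sample_h(self, v):
        # p(h = 1 | v) and a binary sample from it.
        p = sigmoid(v @ self.W + self.bh)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        # p(v = 1 | h) and a binary sample from it.
        p = sigmoid(h @ self.W.T + self.bv)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0, lr=0.1):
        # One contrastive divergence (CD-1) update on a batch v0.
        ph0, h0 = self.sample_h(v0)
        _, v1 = self.sample_v(h0)
        ph1, _ = self.sample_h(v1)
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.bv += lr * (v0 - v1).mean(axis=0)
        self.bh += lr * (ph0 - ph1).mean(axis=0)

    def gibbs_chain(self, v, steps):
        # Block Gibbs sampling: alternate h | v and v | h.
        for _ in range(steps):
            _, h = self.sample_h(v)
            _, v = self.sample_v(h)
        return v

rng = np.random.default_rng(0)
# Toy "scarce" dataset: 20 noisy copies of two 16-pixel binary prototypes.
protos = np.array([[1] * 8 + [0] * 8, [0] * 8 + [1] * 8], dtype=float)
data = protos[rng.integers(0, 2, 20)]
flip = rng.random(data.shape) < 0.05
data = np.abs(data - flip)  # flip about 5% of the pixels

rbm = BernoulliRBM(n_visible=16, n_hidden=8, rng=rng)
for epoch in range(500):
    rbm.cd1_step(data)
new_samples = rbm.gibbs_chain(data.copy(), steps=50)  # extra training examples
```

In the augmentation setting described above, samples like `new_samples` would be added to the scarce training set of the downstream classifier.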
Submitted 21 March, 2019;
originally announced March 2019.