Search | arXiv e-print repository

Amortized Active Learning for Nonparametric Functions

Authors: Cen-You Li, Marc Toussaint, Barbara Rakitsch, Christoph Zimmer

Abstract: Active learning (AL) is a sequential learning scheme aiming to select the most informative data. AL reduces data consumption and avoids the cost of labeling large amounts of data. However, AL trains the model and solves an acquisition optimization for each selection. It becomes expensive when the model training or acquisition optimization is challenging. In this paper, we focus on active nonparame… ▽ More Active learning (AL) is a sequential learning scheme aiming to select the most informative data. AL reduces data consumption and avoids the cost of labeling large amounts of data. However, AL trains the model and solves an acquisition optimization for each selection. It becomes expensive when the model training or acquisition optimization is challenging. In this paper, we focus on active nonparametric function learning, where the gold standard Gaussian process (GP) approaches suffer from cubic time complexity. We propose an amortized AL method, where new data are suggested by a neural network which is trained up-front without any real data (Figure 1). Our method avoids repeated model training and requires no acquisition optimization during the AL deployment. We (i) utilize GPs as function priors to construct an AL simulator, (ii) train an AL policy that can zero-shot generalize from simulation to real learning problems of nonparametric functions and (iii) achieve real-time data selection and comparable learning performances to time-consuming baseline methods. △ Less

Submitted 10 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted for publication at ECML-PKDD 2024 Workshop & Tutorial on Interactive Adaptive Learning

arXiv:2402.14402 [pdf, other]

Global Safe Sequential Learning via Efficient Knowledge Transfer

Authors: Cen-You Li, Olaf Duennbier, Marc Toussaint, Barbara Rakitsch, Christoph Zimmer

Abstract: Sequential learning methods such as active learning and Bayesian optimization select the most informative data to learn about a task. In many medical or engineering applications, the data selection is constrained by a priori unknown safety conditions. A promissing line of safe learning methods utilize Gaussian processes (GPs) to model the safety probability and perform data selection in areas with… ▽ More Sequential learning methods such as active learning and Bayesian optimization select the most informative data to learn about a task. In many medical or engineering applications, the data selection is constrained by a priori unknown safety conditions. A promissing line of safe learning methods utilize Gaussian processes (GPs) to model the safety probability and perform data selection in areas with high safety confidence. However, accurate safety modeling requires prior knowledge or consumes data. In addition, the safety confidence centers around the given observations which leads to local exploration. As transferable source knowledge is often available in safety critical experiments, we propose to consider transfer safe sequential learning to accelerate the learning of safety. We further consider a pre-computation of source components to reduce the additional computational load that is introduced by incorporating source data. In this paper, we theoretically analyze the maximum explorable safe regions of conventional safe learning methods. Furthermore, we empirically demonstrate that our approach 1) learns a task with lower data consumption, 2) globally explores multiple disjoint safe regions under guidance of the source knowledge, and 3) operates with computation comparable to conventional safe learning methods. △ Less

Submitted 15 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.00033 [pdf, other]

Hybrid Modeling Design Patterns

Authors: Maja Rudolph, Stefan Kurz, Barbara Rakitsch

Abstract: Design patterns provide a systematic way to convey solutions to recurring modeling challenges. This paper introduces design patterns for hybrid modeling, an approach that combines modeling based on first principles with data-driven modeling techniques. While both approaches have complementary advantages there are often multiple ways to combine them into a hybrid model, and the appropriate solution… ▽ More Design patterns provide a systematic way to convey solutions to recurring modeling challenges. This paper introduces design patterns for hybrid modeling, an approach that combines modeling based on first principles with data-driven modeling techniques. While both approaches have complementary advantages there are often multiple ways to combine them into a hybrid model, and the appropriate solution will depend on the problem at hand. In this paper, we provide four base patterns that can serve as blueprints for combining data-driven components with domain knowledge into a hybrid approach. In addition, we also present two composition patterns that govern the combination of the base patterns into more complex hybrid models. Each design pattern is illustrated by typical use cases from application areas such as climate modeling, engineering, and physics. △ Less

Submitted 29 December, 2023; originally announced January 2024.

arXiv:2309.08256 [pdf, other]

Sampling-Free Probabilistic Deep State-Space Models

Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters

Abstract: Many real-world dynamical systems can be described as State-Space Models (SSMs). In this formulation, each observation is emitted by a latent state, which follows first-order Markovian dynamics. A Probabilistic Deep SSM (ProDSSM) generalizes this framework to dynamical systems of unknown parametric form, where the transition and emission models are described by neural networks with uncertain weigh… ▽ More Many real-world dynamical systems can be described as State-Space Models (SSMs). In this formulation, each observation is emitted by a latent state, which follows first-order Markovian dynamics. A Probabilistic Deep SSM (ProDSSM) generalizes this framework to dynamical systems of unknown parametric form, where the transition and emission models are described by neural networks with uncertain weights. In this work, we propose the first deterministic inference algorithm for models of this type. Our framework allows efficient approximations for training and testing. We demonstrate in our experiments that our new method can be employed for a variety of tasks and enjoys a superior balance between predictive performance and computational budget. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.05282 [pdf, other]

Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving

Authors: Ali Keysan, Andreas Look, Eitan Kosman, Gonca Gürsun, Jörg Wagner, Yu Yao, Barbara Rakitsch

Abstract: In autonomous driving tasks, scene understanding is the first step towards predicting the future behavior of the surrounding traffic participants. Yet, how to represent a given scene and extract its features are still open research questions. In this study, we propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-… ▽ More In autonomous driving tasks, scene understanding is the first step towards predicting the future behavior of the surrounding traffic participants. Yet, how to represent a given scene and extract its features are still open research questions. In this study, we propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-based representations, combined with classical rasterized image representations, lead to descriptive scene embeddings. Second, we benchmark our predictions on the nuScenes dataset and show significant improvements compared to baselines. Third, we show in an ablation study that a joint encoder of text and rasterized images outperforms the individual encoders confirming that both representations have their complementary strengths. △ Less

Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2305.01773 [pdf, other]

Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems

Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters

Abstract: Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing method… ▽ More Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow since they rely on Monte Carlo sampling or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios. △ Less

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2302.13754 [pdf, other]

Combining Slow and Fast: Complementary Filtering for Dynamics Learning

Authors: Katharina Ensinger, Sebastian Ziesche, Barbara Rakitsch, Michael Tiemann, Sebastian Trimpe

Abstract: Modeling an unknown dynamical system is crucial in order to predict the future behavior of the system. A standard approach is training recurrent models on measurement data. While these models typically provide exact short-term predictions, accumulating errors yield deteriorated long-term behavior. In contrast, models with reliable long-term predictions can often be obtained, either by training a r… ▽ More Modeling an unknown dynamical system is crucial in order to predict the future behavior of the system. A standard approach is training recurrent models on measurement data. While these models typically provide exact short-term predictions, accumulating errors yield deteriorated long-term behavior. In contrast, models with reliable long-term predictions can often be obtained, either by training a robust but less detailed model, or by leveraging physics-based simulations. In both cases, inaccuracies in the models yield a lack of short-time details. Thus, different models with contrastive properties on different time horizons are available. This observation immediately raises the question: Can we obtain predictions that combine the best of both worlds? Inspired by sensor fusion tasks, we interpret the problem in the frequency domain and leverage classical methods from signal processing, in particular complementary filters. This filtering technique combines two signals by applying a high-pass filter to one signal, and low-pass filtering the other. Essentially, the high-pass filter extracts high-frequencies, whereas the low-pass filter extracts low frequencies. Applying this concept to dynamics model learning enables the construction of models that yield accurate long- and short-term predictions. Here, we propose two methods, one being purely learning-based and the other one being a hybrid model that requires an additional physics-based simulator. △ Less

Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2205.11894 [pdf, other]

Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs

Authors: Çağatay Yıldız, Melih Kandemir, Barbara Rakitsch

Abstract: We study time uncertainty-aware modeling of continuous-time dynamics of interacting objects. We introduce a new model that decomposes independent dynamics of single objects accurately from their interactions. By employing latent Gaussian process ordinary differential equations, our model infers both independent dynamics and their interactions with reliable uncertainty estimates. In our formulation… ▽ More We study time uncertainty-aware modeling of continuous-time dynamics of interacting objects. We introduce a new model that decomposes independent dynamics of single objects accurately from their interactions. By employing latent Gaussian process ordinary differential equations, our model infers both independent dynamics and their interactions with reliable uncertainty estimates. In our formulation, each object is represented as a graph node and interactions are modeled by accumulating the messages coming from neighboring objects. We show that efficient inference of such a complex network of variables is possible with modern variational sparse Gaussian process inference techniques. We empirically demonstrate that our model improves the reliability of long-term predictions over neural network based alternatives and it successfully handles missing dynamic or static information. Furthermore, we observe that only our model can successfully encapsulate independent dynamics and interaction information in distinct functions and show the benefit from this disentanglement in extrapolation scenarios. △ Less

Submitted 12 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

arXiv:2203.14849 [pdf, other]

Safe Active Learning for Multi-Output Gaussian Processes

Authors: Cen-You Li, Barbara Rakitsch, Christoph Zimmer

Abstract: Multi-output regression problems are commonly encountered in science and engineering. In particular, multi-output Gaussian processes have been emerged as a promising tool for modeling these complex systems since they can exploit the inherent correlations and provide reliable uncertainty estimates. In many applications, however, acquiring the data is expensive and safety concerns might arise (e.g.… ▽ More Multi-output regression problems are commonly encountered in science and engineering. In particular, multi-output Gaussian processes have been emerged as a promising tool for modeling these complex systems since they can exploit the inherent correlations and provide reliable uncertainty estimates. In many applications, however, acquiring the data is expensive and safety concerns might arise (e.g. robotics, engineering). We propose a safe active learning approach for multi-output Gaussian process regression. This approach queries the most informative data or output taking the relatedness between the regressors and safety constraints into account. We prove the effectiveness of our approach by providing theoretical analysis and by demonstrating empirical results on simulated datasets and on a real-world engineering dataset. On all datasets, our approach shows improved convergence compared to its competitors. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: Accepted for publication at AISTATS 2022

arXiv:2112.03230 [pdf, other]

Traversing Time with Multi-Resolution Gaussian Process State-Space Models

Authors: Krista Longi, Jakob Lindinger, Olaf Duennbier, Melih Kandemir, Arto Klami, Barbara Rakitsch

Abstract: Gaussian Process state-space models capture complex temporal dependencies in a principled manner by placing a Gaussian Process prior on the transition function. These models have a natural interpretation as discretized stochastic differential equations, but inference for long sequences with fast and slow transitions is difficult. Fast transitions need tight discretizations whereas slow transitions… ▽ More Gaussian Process state-space models capture complex temporal dependencies in a principled manner by placing a Gaussian Process prior on the transition function. These models have a natural interpretation as discretized stochastic differential equations, but inference for long sequences with fast and slow transitions is difficult. Fast transitions need tight discretizations whereas slow transitions require backpropagating the gradients over long subtrajectories. We propose a novel Gaussian process state-space architecture composed of multiple components, each trained on a different resolution, to model effects on different timescales. The combined model allows traversing time on adaptive scales, providing efficient inference for arbitrarily long sequences with complex dynamics. We benchmark our novel method on semi-synthetic data and on an engine modeling task. In both experiments, our approach compares favorably against its state-of-the-art alternatives that operate on a single time-scale only. △ Less

Submitted 23 February, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: Added links to code and dataset. Added author contributions

arXiv:2006.09914 [pdf, other]

Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes

Authors: Manuel Haussmann, Sebastian Gerwinn, Andreas Look, Barbara Rakitsch, Melih Kandemir

Abstract: Neural Stochastic Differential Equations model a dynamical environment with neural nets assigned to their drift and diffusion terms. The high expressive power of their nonlinearity comes at the expense of instability in the identification of the large set of free parameters. This paper presents a recipe to improve the prediction accuracy of such models in three steps: i) accounting for epistemic u… ▽ More Neural Stochastic Differential Equations model a dynamical environment with neural nets assigned to their drift and diffusion terms. The high expressive power of their nonlinearity comes at the expense of instability in the identification of the large set of free parameters. This paper presents a recipe to improve the prediction accuracy of such models in three steps: i) accounting for epistemic uncertainty by assuming probabilistic weights, ii) incorporation of partial knowledge on the state dynamics, and iii) training the resultant hybrid model by an objective derived from a PAC-Bayesian generalization bound. We observe in our experiments that this recipe effectively translates partial and noisy prior knowledge into an improved model fit. △ Less

Submitted 26 February, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Accepted at AISTATS 2021

arXiv:2006.08973 [pdf, other]

A Deterministic Approximation to Neural SDEs

Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters

Abstract: Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have been remained unexplored so far. We report the empirical finding that obtaining well-calibrated uncertainty estimations from NSDEs is computationally prohibitive. As… ▽ More Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have been remained unexplored so far. We report the empirical finding that obtaining well-calibrated uncertainty estimations from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel, when dynamics is governed by a NSDE. Our method introduces a bidimensional moment matching algorithm: vertical along the neural net layers and horizontal along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only after introducing high computational cost. Thanks to the numerical stability of deterministic training, our method also improves prediction accuracy. △ Less

Submitted 12 September, 2022; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2005.11110 [pdf, other]

Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties

Authors: Jakob Lindinger, David Reeb, Christoph Lippert, Barbara Rakitsch

Abstract: Deep Gaussian Processes learn probabilistic data representations for supervised learning by cascading multiple Gaussian Processes. While this model family promises flexible predictive distributions, exact inference is not tractable. Approximate inference techniques trade off the ability to closely resemble the posterior distribution against speed of convergence and computational efficiency. We pro… ▽ More Deep Gaussian Processes learn probabilistic data representations for supervised learning by cascading multiple Gaussian Processes. While this model family promises flexible predictive distributions, exact inference is not tractable. Approximate inference techniques trade off the ability to closely resemble the posterior distribution against speed of convergence and computational efficiency. We propose a novel Gaussian variational family that allows for retaining covariances between latent processes while achieving fast convergence by marginalising out all global latent variables. After providing a proof of how this marginalisation can be done for general covariances, we restrict them to the ones we empirically found to be most important in order to also achieve computational efficiency. We provide an efficient implementation of our new approach and apply it to several benchmark datasets. It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives. △ Less

Submitted 22 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: 12 pages main text, 20 pages appendix. v2: changes due to NeurIPS review process. Camera-ready version to be published at NeurIPS 2020

arXiv:1810.12263 [pdf, other]

Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds

Authors: David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch

Abstract: Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, ins… ▽ More Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied on large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets. △ Less

Submitted 28 December, 2018; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: 11 pages main text, 12 pages appendix. v2: minor changes, new NeurIPS style file. Final camera-ready version submitted to NeurIPS 2018

Journal ref: Advances in Neural Information Processing Systems 31 (Proceedings of the NeurIPS Conference 2018), https://papers.nips.cc/paper/7594-learning-gaussian-processes-by-minimizing-pac-bayesian-generalization-bounds

Showing 1–14 of 14 results for author: Rakitsch, B