License: CC BY 4.0
arXiv:2403.11966v1 [cs.LG] 18 Mar 2024

Informed Spectral Normalized Gaussian Processes for Trajectory Prediction

Christian Schlauch
Humboldt-Universität zu Berlin and Continental AG
Berlin, Germany

Christian Wirth
Continental AG
Frankfurt am Main, Germany

Nadja Klein
Technische Universität Dortmund
Chair of Uncertainty Quantification and Statistical Learning
Berlin, Germany
Abstract

Prior parameter distributions provide an elegant way to represent prior expert and world knowledge for informed learning. Previous work has shown that using such informative priors to regularize probabilistic deep learning (DL) models increases their performance and data-efficiency. However, commonly used sampling-based approximations for probabilistic DL models can be computationally expensive, requiring multiple inference passes and longer training times. Promising alternatives are compute-efficient last layer kernel approximations like spectral normalized Gaussian processes (SNGPs). We propose a novel regularization-based continual learning method for SNGPs, which enables the use of informative priors that represent prior knowledge learned from previous tasks. Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion. We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge. On two public datasets, we investigate its performance under diminishing training data and across locations, and thereby demonstrate an increase in data-efficiency and robustness to location-transfers over non-informed and informed baselines.

1 Introduction

Deep learning (DL) has become a powerful artificial intelligence (AI) tool for handling complex tasks. However, DL typically requires extensive training data to provide robust results [?]. High acquisition costs can render the collection of sufficient data infeasible. This is especially problematic in safety-critical domains like autonomous driving, where we encounter a wide range of edge cases associated with high risks [?]. Informed learning (IL) aims to improve the data efficiency and robustness of DL models by integrating prior knowledge [?]. Most IL approaches consider prior scientific knowledge by constraining or verifying the problem space or learning process directly. However, hard constraints are not suitable for qualitative prior expert and world knowledge where ubiquitous exceptions exist. In autonomous driving, for example, we expect traffic participants to comply with speed regulations but must not rule out violations. Still, prior knowledge about norms and regulations, as in this example, is highly informative for most cases and readily available at low cost.

A recent idea is the integration of such prior expert and world knowledge into probabilistic DL models [??]. These models maintain a distribution over possible model parameters instead of single maximum likelihood estimates. The prior knowledge can be represented as a prior parameter distribution, learned from arbitrarily defined knowledge tasks, to regularize training on real-world observations. The probabilistic informed learning (PIL) approach of Schlauch [?] applies this idea to trajectory prediction in autonomous driving using regularization-based continual learning methods, achieving a substantially improved data efficiency. However, typical sampling-based probabilistic DL model approximations, such as the variational inference (VI) used by Schlauch [?], are computationally expensive, since they require multiple inference passes and substantially more training epochs. Promising alternatives are compute-efficient last layer approximations [?]. The spectral normalized Gaussian process (SNGP) [?] is a particularly efficient approximation that applies a Gaussian process (GP) as last layer to a deterministic deep neural network (DNN). The DNN acts as a scalable feature extractor, while the last layer GP allows the deterministic estimation of the uncertainty in a single inference pass. The last layer GP kernel itself is approximated via Fourier features, which is asymptotically exact and can be easily scaled.

We propose a novel regularization-based continual learning method to enable the use of SNGPs in a PIL approach. Our proposal is conceptually simple, builds upon well-established methods [??], imposes little computational overhead and requires no additional architecture changes, making implementation straightforward. We apply our method in a PIL approach for the trajectory prediction in autonomous driving, which is an especially challenging application since well-calibrated, multi-modal predictions are required to enable safe planning.

Figure 1: The informed CoverNet-SNGP model consists of a spectral normalized feature extractor and a last layer Gaussian Process with a Fourier feature approximated radial basis function kernel. Given a Birds-Eye-View RGB rendering and the target’s current state, the model classifies a set of candidate trajectories according to their drivability in task $i$ and their likely realization in task $i+1$. Our method regularizes the training on task $i+1$, given the MAP estimates and Laplace approximated covariance from task $i$ as informative priors, thereby integrating the drivability knowledge following the PIL approach.

Following Schlauch [?], we employ CoverNet as base model and integrate the prior drivability knowledge that trajectories are likely to stay on-road. We benchmark our proposed informed CoverNet-SNGP on two public datasets, NuScenes and Argoverse2, against the non-informed baselines Base-CoverNet and CoverNet-SNGP and the informed baselines Transfer-CoverNet and GVCL-Det-CoverNet. To this end, we evaluate data-efficiency under diminishing training data availability and robustness to location-transfers, both being key aspects for safe autonomous driving [??]. We observe benefits in favor of our informed CoverNet-SNGP across various performance metrics, especially in low data regimes, which demonstrates our method’s viability to increase data-efficiency and robustness in a PIL approach. Our code is available on GitHub (https://github.com/continental/kiwissen-bayesian-trajectory-prediction).

2 Related Work

Von Rueden [?] provides an overview of IL as an emerging field of research, which is also known as knowledge-guided or knowledge-augmented learning [?]. In trajectory prediction, like in other domains, most work concentrates on integrating prior scientific knowledge. Dynamical models are used, for instance, to encode physical limitations of motion in the architecture [?], in the output representation [?] or in a post-hoc verification [?]. Approaches similar to the PIL approach [?], that focus on integrating expert and world prior knowledge, typically leverage transfer- or multi-task learning settings [?]. However, transfer learning does not prevent catastrophic forgetting, while multi-task learning requires a single dataset with simultaneously available labels. PIL can be applied without these limitations.

SNGPs and related models, known as deterministic uncertainty models (DUMs), have been analyzed by Postels [?] and Charpentier [?]. Most closely related to SNGPs is the deterministic uncertainty estimator (DUE) proposed by van Amersfoort [?], which approximates the last layer kernel with sparse variational inducing points instead of Fourier features. DUE preserves the non-parametric nature of the kernel, but is sensitive to its initialization and generally not asymptotically exact.

Parisi [?] and De Lange [?] give a detailed survey of continual learning methods and their classification. Our proposed continual learning method for SNGPs is purely regularization-based, in contrast to the functional regularization introduced by Titsias [?], which could be directly applied to the DUE model, and the work of Derakhshani [?], which also considers a kernel approximation based on Fourier features. Both these methods require rehearsal, the latter also a parameter expansion. Rehearsal is likely to be sensitive to the data imbalances [?] in our application, while parameter expansions require architecture changes, which introduce additional complexity. Our proposed method is conceptually simple and builds upon the well-established online elastic weight consolidation (online EWC) introduced by Schwarz [?]. Online EWC can also be understood as a special case of generalized variational continual learning (GVCL) described by Loo [?].

3 Informed SNGPs

3.1 Probabilistic Informed Learning

The PIL approach of Schlauch [?] integrates prior expert and world knowledge in a supervised learning setup. The basic idea is to define a sequence of knowledge tasks $i = 1, \ldots, M-1$ on datasets $D_i = \{(x^{(i)}_j, y^{(i)}_j)\}_{j=1}^{n_i}$ with $n_i$ samples each. These datasets can be synthetically generated, for example, by leveraging semantic annotations to map the prior knowledge to the prediction target. Semantic annotations are readily available in domains like autonomous driving, but are often underutilized in state-of-the-art models that learn from observations in the conventional task $i = M$ alone [?].

Given a probabilistic DL model parameterized by $\theta$ and an initial uninformative prior $\pi_0(\theta)$, the goal is to recursively learn from the sequence of tasks by applying Bayes’ rule

\[ p(\theta \mid D_{1:i}) \propto \pi_0(\theta) \prod_{j=1}^{i} p_\theta(y_j \mid x_j), \tag{1} \]

where $p_\theta(y_j \mid x_j)$ are the likelihood functions at task $j$, which are assumed to be conditionally independent given $\theta$. This computationally intractable recursion is approximated by repurposing regularization-based continual learning methods.
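The recursive structure of Eq. (1) can be spelled out in one step. The following short derivation only regroups the factors of Eq. (1) and uses the stated conditional independence; it shows that the posterior after task $i-1$ plays the role of the prior for task $i$:

```latex
\begin{align*}
p(\theta \mid D_{1:i})
  &\propto \pi_0(\theta) \prod_{j=1}^{i} p_\theta(y_j \mid x_j) \\
  &= \Big[ \pi_0(\theta) \prod_{j=1}^{i-1} p_\theta(y_j \mid x_j) \Big] \, p_\theta(y_i \mid x_i) \\
  &\propto p(\theta \mid D_{1:i-1}) \, p_\theta(y_i \mid x_i).
\end{align*}
```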

The PIL approach can generally be applied as long as, first, the prior knowledge is strongly related to the observational task, second, the prior knowledge can be mapped to the prediction target and, third, the posterior parameter distribution can be estimated. The informative priors make information explicit and shape the loss surface in the downstream task, improving the training outcome even without using probabilistic inference in the end [?].

3.2 SNGP Composition

SNGPs [?] employ a composition $f_\theta = g_{\theta_{\text{GP}}} \circ h_{\theta_{\text{NN}}} : \mathcal{X} \to \mathcal{Y}$ with $\theta = \{\theta_{\text{NN}}, \theta_{\text{GP}}\}$. Its first component is a deterministic, spectral normalized feature extractor $h_{\theta_{\text{NN}}} : \mathcal{X} \to \mathcal{H}$ with trainable parameters $\theta_{\text{NN}}$, mapping the high-dimensional input space $\mathcal{X}$ into a low-dimensional hidden space $\mathcal{H}$. The second component is a GP output layer $g_{\theta_{\text{GP}}} : \mathcal{H} \to \mathcal{Y}$ with a radial basis function (RBF) kernel, mapping into the output space $\mathcal{Y}$. The RBF kernel can be approximated by (random) Fourier features using Bochner’s theorem [?]. This effectively reduces the GP to a Bayesian linear model that can be written as a neural network layer with fixed hidden weights and trainable output weight parameters $\theta_{\text{GP}}$, which enables end-to-end training with the feature extractor. The distance-sensitivity of the composition prevents a “feature collapse” [?], improving the calibration against adversarial and outlier samples. In total, SNGP introduces five additional hyperparameters: an upper bound $s$ and a number of power iterations $N_p$ for the spectral normalization of the feature extractor, and the number of Fourier features $N_f$, the kernel’s length scale $l_s$ and the choice of Gaussian prior for the last layer.
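To illustrate the random Fourier feature approximation, the following minimal NumPy sketch uses the standard construction $\phi(h) = \sqrt{2/N_f}\,\cos(Wh + b)$ with $W \sim \mathcal{N}(0, l_s^{-2})$ and $b \sim \mathcal{U}(0, 2\pi)$ and compares $\phi(h)^\top \phi(h')$ with the exact RBF kernel. The function names, dimensions and the length scale are illustrative assumptions and are not taken from the released code.

```python
import numpy as np

def make_rff(dim_in, num_features, length_scale, rng):
    """Draw the fixed (non-trainable) hidden weights of the Fourier feature layer."""
    W = rng.normal(0.0, 1.0 / length_scale, size=(num_features, dim_in))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return W, b

def rff(h, W, b):
    """Map hidden features h of shape (n, dim_in) to Fourier features (n, num_features)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(h @ W.T + b)

def rbf_kernel(a, c, length_scale):
    """Exact RBF kernel matrix between the rows of a and c."""
    sq_dist = ((a[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))                 # stand-in for extractor outputs
W, b = make_rff(dim_in=8, num_features=4096, length_scale=2.0, rng=rng)
phi = rff(h, W, b)

# The inner product of Fourier features approximates the kernel.
max_err = np.abs(phi @ phi.T - rbf_kernel(h, h, 2.0)).max()

# The GP output layer then acts as a linear model on phi with trainable weights.
theta_gp = rng.normal(size=(4096, 3))       # 3 hypothetical output classes
logits = phi @ theta_gp
```

The approximation error shrinks as the number of Fourier features grows, which is the sense in which the approximation is asymptotically exact.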

3.3 Regularizing SNGPs

There are two problems prohibiting the direct application of the PIL approach to composite last layer kernel approximations like the SNGP. First, there is no existing continual learning method for kernels that does not require rehearsal memories or parameter expansions (see Sec. 2). Second, estimating the posterior parameter distribution of the feature extractor (e.g. via a Laplace approximation or variational inference) contradicts the motivation for the last layer kernel approximation regarding compute-efficiency.

We tackle the first problem by leveraging the Fourier feature approximation of the RBF kernel of the GP. The posterior distribution of the last layer parameters at task $i$ can be made tractable through a Laplace approximation [?], that is, we assume

\[ p(\theta_{\text{GP}} \mid D_{1:i}) \approx \mathcal{N}(\theta_{\text{GP}}; \theta^{*}_{\text{GP},i}, \Sigma^{-1}_{\text{GP},i}), \]

given a maximum a posteriori (MAP) estimate $\theta^{*}_{\text{GP},i}$ at task $i$. Similar to online EWC [?], $\theta^{*}_{\text{GP},i}$ can be obtained by minimizing

\[ -\log p_{\theta_{\text{GP}}}(y_i \mid x_i) + \frac{\lambda_{\text{GP}}}{2} (\theta_{\text{GP}} - \theta^{*}_{\text{GP},i-1})^{\top} \Sigma^{-1}_{\text{GP},i-1} (\theta_{\text{GP}} - \theta^{*}_{\text{GP},i-1}) \tag{2} \]

with respect to $\theta_{\text{GP}}$, where the precision $\Sigma^{-1}_{\text{GP},i}$ is approximated by the sum of the Hessian at the MAP estimate and a scaled precision at task $i-1$, that is,

\[ \Sigma^{-1}_{\text{GP},i} \approx H_{\text{GP},i}(\theta^{*}_{\text{GP},i}) + \gamma_{\text{GP}} \Sigma^{-1}_{\text{GP},i-1}. \]

Above, $\lambda_{\text{GP}} > 0$ is a temperature parameter that scales the importance of the previous task [?], and $0 < \gamma_{\text{GP}} \leq 1$ is a decay parameter that allows for more plasticity over very long task sequences [?]. In contrast to online EWC, we can cheaply compute the Hessian using moving averages [?] instead of using a Fisher matrix approximation. In the first task $i = 1$, we use an uninformative zero-mean, unit-variance prior $\pi_0$, which amounts to a simple $\mathcal{L}_2$-regularization.
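A minimal sketch of the two ingredients described here, the quadratic penalty on the last layer weights and the recursive precision update, is given below. It assumes a single-output logistic last layer, for which the Hessian of the summed negative log-likelihood has a closed form; it computes this Hessian in one batch rather than with moving averages. All function names are hypothetical.

```python
import numpy as np

def quadratic_penalty(theta, theta_prev, precision_prev, lam):
    """lam/2 * (theta - theta_prev)^T Precision_prev (theta - theta_prev)."""
    d = theta - theta_prev
    return 0.5 * lam * d @ precision_prev @ d

def logistic_hessian(phi, theta):
    """Hessian of the summed binary NLL of a linear-logistic layer (assumed likelihood)."""
    p = 1.0 / (1.0 + np.exp(-(phi @ theta)))
    return (phi * (p * (1.0 - p))[:, None]).T @ phi

def update_precision(hessian, precision_prev, gamma):
    """Precision_i = H_i(theta_i*) + gamma * Precision_{i-1}."""
    return hessian + gamma * precision_prev

rng = np.random.default_rng(0)
phi = rng.normal(size=(50, 4))        # stand-in for Fourier features of 50 samples
theta_map = rng.normal(size=4)        # stand-in for the MAP estimate of task 1
prior_precision = np.eye(4)           # unit-variance prior pi_0 in the first task

# Laplace precision after task 1, to be reused as informative prior in task 2.
precision_1 = update_precision(logistic_hessian(phi, theta_map),
                               prior_precision, gamma=1.0)

# Penalty paid in task 2 for moving the weights away from the task 1 estimate.
penalty = quadratic_penalty(theta_map + 0.1, theta_map, precision_1, lam=1.0 / 50)
```

With the identity as precision and a zero mean, the penalty reduces to a plain squared-norm term, which matches the $\mathcal{L}_2$-regularization used in the first task.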

To tackle the second problem and regularize the feature extractor, we approximate the precision $\Sigma^{-1}_{\text{NN},i-1}$ with the identity matrix $\mathbb{I}$. This implies a simple $\mathcal{L}_2$-regularization for the MAP estimates $\theta^{*}_{\text{NN},i}$ obtained by minimizing

\[ -\log p_{\theta_{\text{NN}}}(y_i \mid x_i) + \frac{\lambda_{\text{NN}}}{2} \lVert \theta_{\text{NN}} - \theta^{*}_{\text{NN},i-1} \rVert_2^2 \]

with respect to $\theta_{\text{NN}}$, where $\lambda_{\text{NN}}$ is the extractor-specific temperature parameter. This idea is conceptually simple, but should be sufficient, since the representation learned in the knowledge tasks should be suitable downstream due to the close relation between tasks.

As a result, the complete model $f_\theta : \mathcal{X} \to \mathcal{Y}$, parameterized by $\theta = \{\theta_{\text{NN}}, \theta_{\text{GP}}\}$, can be effectively regularized and used in the PIL approach, as visualized in Fig. 1. Our method introduces three hyperparameters $\{\lambda_{\text{GP}}, \gamma_{\text{GP}}, \lambda_{\text{NN}}\}$. Like online EWC [?], it only requires the parameters of the previous task in memory and has little computational overhead.
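The complete regularized objective can be sketched as follows, assuming flattened parameter vectors and a task loss `nll` that has already been computed. The container `TaskPrior` and the function `informed_loss` are hypothetical names; the sketch mainly illustrates that only the previous MAP estimates and the last layer precision need to be kept in memory.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class TaskPrior:
    """Quantities carried over from task i-1 (everything else can be discarded)."""
    theta_nn: np.ndarray      # MAP estimate of the feature extractor weights
    theta_gp: np.ndarray      # MAP estimate of the last layer weights
    precision_gp: np.ndarray  # Laplace precision of the last layer weights

def informed_loss(nll, theta_nn, theta_gp, prior, lam_gp, lam_nn):
    """Task NLL plus the last layer quadratic penalty and the extractor L2 penalty."""
    d_gp = theta_gp - prior.theta_gp
    d_nn = theta_nn - prior.theta_nn
    gp_term = 0.5 * lam_gp * d_gp @ prior.precision_gp @ d_gp
    nn_term = 0.5 * lam_nn * d_nn @ d_nn   # identity precision for the extractor
    return nll + gp_term + nn_term

prior = TaskPrior(theta_nn=np.zeros(3), theta_gp=np.zeros(2), precision_gp=np.eye(2))
loss = informed_loss(1.0, np.ones(3), np.ones(2), prior, lam_gp=2.0, lam_nn=4.0)
```

At the previous MAP estimates both penalties vanish and the objective equals the task loss, so the prior only acts once training moves the parameters away from what was learned on the knowledge task.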

4 Application to Trajectory Prediction

4.1 Problem Definition

We limit ourselves to the single-agent trajectory prediction problem [?]. An autonomous driving system is assumed to observe the states in the state space $\mathcal{Y}$ of all agents $\mathcal{A}$ present in a scene on the road. Let $y^{(t)} \in \mathcal{Y}$ denote the state of target agent $a \in \mathcal{A}$ at time $t$ and let $y^{(t-T_o\,:\,t)} = \big(y^{(t-T_o)}, y^{(t-T_o+\delta t)}, \ldots, y^{(t)}\big)$ be its observed trajectory over an observation history $T_o$ with sampling period $\delta t$. Additionally, we assume access to agent-centered maps $\mathcal{M}$, which include semantic annotations such as the drivable area. Map and states make up the scene context of agent $a$, denoted as $x = (\{y_j^{(t-T_o\,:\,t)}\}_{j=1}^{|\mathcal{A}|}, \mathcal{M})$. Given $x$, the goal is to predict the distribution of $a$’s future trajectories $p(y^{(t+\delta t\,:\,t+T_h)} \mid x)$ over the prediction horizon $T_h$, where $y^{(t+\delta t\,:\,t+T_h)} = \big(y^{(t+\delta t)}, y^{(t+2\delta t)}, \ldots, y^{(t+T_h)}\big)$.

4.2 CoverNet-SNGP

CoverNet [?] approaches the single-agent trajectory problem by considering a birds-eye-view RGB rendering of the scene context $x$ and the current state $y^{(t)}$ of the target agent $a$ as inputs. The RGB rendering is processed by a computer-vision backbone, before being concatenated with the target’s current state and processed by another dense layer. The output is represented as a set $\mathcal{K}$ of $K$ candidate trajectories $y_k^{(t+\delta t\,:\,t+T_h)}$. Doing so reduces the prediction problem to a classification problem, where each trajectory in the set $\mathcal{K}$ is treated as a sample of the predictive distribution $p(y^{(t+\delta t\,:\,t+T_h)} \mid x)$ and only the conditional probability of each sample is required. In principle, any space-filling heuristic may be used to define $\mathcal{K}$, for example, by using a dynamical model that integrates physical limitations [?], which could be applied in combination with the PIL approach. Here, we follow Phan-Minh’s [?] definition of a fixed set $\mathcal{K}$ by solving a set-covering problem over a subsample of observed trajectories in the training split, using a greedy algorithm given a coverage bound $\epsilon$, which determines the total number of candidates $K$. Further details are given in our supplemental; see also Chapter 35.3 of Cormen [?] regarding set-covering problems in general.
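The greedy construction of the candidate set can be sketched as follows. This is a generic greedy set-cover heuristic, not the authors' implementation; in particular, measuring the distance between two trajectories by their maximum point-wise Euclidean distance is an assumption, and the random-walk trajectories are synthetic stand-ins.

```python
import numpy as np

def greedy_trajectory_set(trajectories, eps):
    """Pick indices of candidates so that every trajectory lies within eps of one.

    trajectories: array of shape (n, T, 2) with n trajectories of T way-points.
    """
    # Pairwise distance: maximum point-wise Euclidean distance (assumed metric).
    diff = trajectories[:, None] - trajectories[None, :]      # (n, n, T, 2)
    dist = np.linalg.norm(diff, axis=-1).max(axis=-1)         # (n, n)
    covers = dist <= eps                                      # covers[a, b]: a covers b
    uncovered = np.ones(len(trajectories), dtype=bool)
    chosen = []
    while uncovered.any():
        # Select the trajectory covering the most still-uncovered trajectories.
        gains = (covers & uncovered[None, :]).sum(axis=1)
        best = int(gains.argmax())
        chosen.append(best)
        uncovered &= ~covers[best]
    return chosen

rng = np.random.default_rng(0)
trajs = np.cumsum(rng.normal(size=(200, 12, 2)), axis=1)      # synthetic random walks
candidates = greedy_trajectory_set(trajs, eps=4.0)
```

A smaller coverage bound forces more candidates into the set, which is how $\epsilon$ controls $K$.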

The modification of CoverNet with SNGP is straightforward if a convolutional neural network (CNN) is used as backbone. In that case, a spectral normalization can be directly applied to the architecture while the last layer is replaced with a Gaussian process, approximated by Fourier features as described in Sec. 3.2.
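As an illustration of the spectral normalization step, the sketch below estimates the largest singular value of a weight matrix by power iteration and rescales the matrix only if the estimate exceeds the upper bound. A dense matrix stands in for a backbone layer, the bound value is arbitrary, and the function name is hypothetical; how the normalization is applied to convolutional kernels in the released code is not shown here.

```python
import numpy as np

def spectral_normalize(W, bound, u, n_power_iterations=1):
    """Rescale W so that its estimated spectral norm does not exceed `bound`.

    u is the running estimate of the leading left singular vector; it is
    returned so that it can be reused in the next training step.
    """
    for _ in range(n_power_iterations):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v                      # estimate of the largest singular value
    if sigma > bound:
        W = W * (bound / sigma)
    return W, u

rng = np.random.default_rng(0)
W = 5.0 * rng.normal(size=(6, 4))
u = rng.normal(size=6)
for _ in range(100):                       # one power iteration per "training step"
    W_sn, u = spectral_normalize(W, bound=0.95, u=u, n_power_iterations=1)
```

Carrying `u` across steps is what makes a single power iteration per step sufficient in practice, since the estimate is refined incrementally while the weights change slowly.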

4.3 Integrating Prior Drivability Knowledge

The PIL approach is applied sequentially on two consecutive tasks as follows. In task $i$, we integrate the prior drivability knowledge that trajectories are likely to stay on-road. To this end, we derive new training labels (see Sec. 3.1), where all candidate trajectories in $\mathcal{K}$ with way-points inside the drivable area for a given training scene $x$ are labeled as positive [?]. We then train in a multi-label classification with a binary cross-entropy loss on these labels. In task $i+1$, the closest candidate trajectory in $\mathcal{K}$ to the observed ground truth is labeled as positive. We train in a multi-class classification with a sparse categorical cross-entropy loss (using softmax normalized logit transformations) on these labels [?]. In effect, the consecutive tasks differ only in the labels and loss functions used. Applying our method described in Sec. 3.3, we first train our CoverNet-SNGP model on task $i$ and then regularize its training on task $i+1$, as exemplified in Fig. 1. We denote the resulting informed CoverNet-SNGP as CoverNet-SNGP$_{\textbf{I}}$, as opposed to the non-informed version CoverNet-SNGP$_{\textbf{U}}$, which is trained on task $i+1$ only without integration of prior knowledge from task $i$.
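The two labeling schemes can be sketched as follows. The sketch assumes that a candidate counts as drivable only if all of its way-points lie inside the drivable area, that closeness is measured by the mean point-wise Euclidean distance, and that the map is queried through a hypothetical `is_drivable(x, y)` callback; the toy road and trajectories are illustrative only.

```python
import numpy as np

def drivability_labels(candidates, is_drivable):
    """Task i: multi-hot labels, 1.0 if all way-points of a candidate are drivable."""
    return np.array([float(all(is_drivable(x, y) for x, y in traj))
                     for traj in candidates])

def closest_candidate_label(candidates, ground_truth):
    """Task i+1: index of the candidate closest to the observed trajectory."""
    dist = np.linalg.norm(candidates - ground_truth[None], axis=-1).mean(axis=-1)
    return int(dist.argmin())

# Toy scene: straight candidates at lateral offsets -3, 0, 3 and 6 metres.
xs = np.linspace(1.0, 12.0, 12)
candidates = np.stack([np.stack([xs, np.full_like(xs, y)], axis=-1)
                       for y in (-3.0, 0.0, 3.0, 6.0)])        # (4, 12, 2)

def is_drivable(x, y):
    """Hypothetical road occupying the band |y| <= 3.5."""
    return abs(y) <= 3.5

ground_truth = np.stack([xs, np.full_like(xs, 2.5)], axis=-1)

task_i_labels = drivability_labels(candidates, is_drivable)
task_next_label = closest_candidate_label(candidates, ground_truth)
```

The same inputs and candidate set serve both tasks; only the label vector (multi-hot versus a single class index) and the matching loss change.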

5 Experimental Design

5.1 Datasets

We use the public NuScenes [?] and Argoverse2 [?] datasets. We replicate the NuScenes data split by Phan-Minh [?] on Argoverse2, only considering vehicle targets (excluding pedestrians and cyclists not driving on-road), as summarized in Tab. 1. For the RGB rendering, we consider each scene with a one-second history ($T_o = 1\,\text{s}$). For the candidate trajectories in $\mathcal{K}$, we consider a six-second prediction horizon ($T_h = 6\,\text{s}$), sampled at $2\,\text{Hz}$ in NuScenes and $10\,\text{Hz}$ in Argoverse2. Both datasets include drivable areas in the semantic map data, allowing us to define the first task as described in Sec. 4.3.

Table 1: Numbers and percentages of samples across location subsets of both NuScenes and Argoverse2.

| Dataset    | Subset        | Train split # (%) | Train-val split # (%) | Val split # (%) |
|------------|---------------|-------------------|-----------------------|-----------------|
| NuScenes   | Total         | 32186 (100.0)     | 8560 (100.0)          | 9041 (100.0)    |
|            | Boston        | 19629 (60.99)     | 5855 (68.40)          | 5138 (56.84)    |
|            | Singapore     | 12557 (39.01)     | 2705 (31.60)          | 3903 (43.16)    |
| Argoverse2 | Total         | 161379 (100.0)    | 22992 (100.0)         | 23113 (100.0)   |
|            | Miami         | 42214 (26.16)     | 5983 (26.02)          | 5984 (25.89)    |
|            | Austin        | 34681 (21.49)     | 4968 (21.57)          | 4985 (21.57)    |
|            | Pittsburgh    | 33391 (20.69)     | 4823 (20.98)          | 4803 (20.78)    |
|            | Dearborn      | 20579 (12.75)     | 2933 (12.79)          | 3001 (12.98)    |
|            | Washington-DC | 20546 (12.73)     | 2883 (12.54)          | 2976 (12.88)    |
|            | Palo-Alto     | 9968 (6.18)       | 1402 (6.10)           | 1364 (5.90)     |

5.2 Baselines

We consider the unmodified CoverNet as baseline, once as non-informed Base-CoverNet [?] and once as Transfer-CoverNet. The Transfer-CoverNet baseline, pretrained on task $i$ and then trained on the current task $i+1$, has previously been proposed by Boulton [?]. We can also understand it as an ablation-type baseline to the PIL approach without regularization. In addition, we compare to GVCL-Det-CoverNet proposed by Schlauch [?], since it also needs only a single inference pass. However, GVCL-Det-CoverNet requires the computationally extremely expensive training of a GVCL-CoverNet model. For example, in our setting, training until convergence on a single Nvidia RTX A5000 GPU with $10\%$ of the NuScenes data needs around 120 hours for GVCL-CoverNet, in contrast to 8 hours for CoverNet-SNGP$_{\textbf{I}}$ and 6 hours for Base-CoverNet.

5.3 Metrics

We measure the average displacement error minADE$_1$ and final displacement error minFDE$_1$, evaluating the quality of the most likely trajectory, and the minADE$_5$, which considers the five most likely trajectories [?]. The minADE$_5$ depends on the probability-based ordering and, thus, indirectly on the calibration. We also consider the drivable area compliance (DAC) to evaluate the extent to which predictions align with our prior drivability knowledge.
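For reference, the displacement metrics can be sketched for a single scene as follows, assuming the usual definitions: among the $k$ most likely candidates, minADE$_k$ is the smallest mean point-wise Euclidean error and minFDE$_k$ the smallest error at the final way-point. The function name and the toy trajectories are illustrative.

```python
import numpy as np

def min_ade_fde(candidates, probs, ground_truth, k):
    """Return (minADE_k, minFDE_k) for one scene.

    candidates: (K, T, 2) trajectory set, probs: (K,) predicted probabilities,
    ground_truth: (T, 2) observed future trajectory.
    """
    top_k = np.argsort(-probs)[:k]                       # k most likely candidates
    err = np.linalg.norm(candidates[top_k] - ground_truth[None], axis=-1)  # (k, T)
    return err.mean(axis=1).min(), err[:, -1].min()

# Toy scene: straight candidates at lateral offsets -3, 0 and 3 metres.
xs = np.linspace(1.0, 12.0, 12)
candidates = np.stack([np.stack([xs, np.full_like(xs, y)], axis=-1)
                       for y in (-3.0, 0.0, 3.0)])
ground_truth = np.stack([xs, np.full_like(xs, 1.0)], axis=-1)
probs = np.array([0.1, 0.3, 0.6])

ade1, fde1 = min_ade_fde(candidates, probs, ground_truth, k=1)
ade2, fde2 = min_ade_fde(candidates, probs, ground_truth, k=2)
```

In this toy scene the most likely candidate is not the closest one, so the error drops once the second most likely candidate is admitted, which illustrates how these metrics depend on the probability-based ordering.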

Since observed ground truth trajectories may not be part of the trajectory set, $y_{\text{true}}^{(t+\delta t\,:\,t+T_h)} \notin \mathcal{K}$, the CoverNet model exhibits an irreducible approximation error. To more clearly assess the impact of our method, we also consider the classification-based negative log likelihood (NLL) and the rank of the positively labeled trajectory (RNK), both directly depending on the calibration, and the Top1-accuracy (ACC).
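
The classification-based metrics can be sketched as follows, assuming the model outputs one logit per candidate in the trajectory set and each sample has exactly one positively labeled candidate. The function name is hypothetical, and the rank convention used here (1 for the best position, ties resolved in favor of the positive candidate) is an assumption, since the convention is not spelled out in this section.

```python
import numpy as np

def classification_metrics(logits, labels):
    """Return (NLL, mean rank, top-1 accuracy) for logits (N, K) and labels (N,)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)          # stabilize the log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    true_lp = log_probs[np.arange(len(labels)), labels]           # log-probability of the positive
    nll = -true_lp.mean()
    # Rank = 1 + number of candidates scored strictly higher than the positive one.
    rank = 1 + (log_probs > true_lp[:, None]).sum(axis=1)
    acc = (rank == 1).mean()
    return nll, rank.mean(), acc
```

NLL and the rank depend on the full predicted distribution, which is why they react more directly to calibration than the displacement errors do.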

5.4 Implementation Details

We use the output representation described in Sec. 4 with a coverage bound $\epsilon = 4\,\text{m}$, using $K_{\text{Nusc}} = 415$ candidates for NuScenes and $K_{\text{Argo}} = 518$ candidates for Argoverse2. We employ a ResNet-50 as backbone and SGD as optimizer. For the CoverNet-SNGPs, we fix the number of power iterations $N_p$ to one and the number of Fourier features $N_f$ to 1024, following Liu [?]. The spectral normalization's upper bound $s$ and the kernel length scale $l_s$ are treated as additional hyperparameters. We tune the hyperparameters of each model on the respective tasks with 100% of the data using the validation NLL (configurations are available in our supplemental material and on GitHub). The exception is CoverNet-SNGP$_{\text{I}}$, which uses the same settings as CoverNet-SNGP$_{\text{U}}$ on task $i+1$. We also fix both temperature parameters $\lambda_{\text{NN}}$ and $\lambda_{\text{GP}}$ ad hoc to the inverse of the effective dataset size to keep tuning costs low. The decay parameter $\gamma_{\text{GP}}$ is mostly relevant for very long task sequences (see Sec. 3), so we set $\gamma_{\text{GP}} = 1$.
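
To illustrate the roles of $N_p$, $s$, $N_f$ and $l_s$, the following NumPy sketch shows a generic spectral normalization step based on power iteration and a random Fourier feature map for an RBF kernel. It is a simplified stand-in under stated assumptions, not the training code used for the experiments: the function names, the dense weight matrix, and the fixed seeds are all hypothetical, and an actual SNGP applies the normalization to the backbone's layers during training.

```python
import numpy as np

def spectral_normalize(weight, s, n_power_iter=1, rng=None):
    """Rescale `weight` if its estimated spectral norm exceeds the bound `s`.

    Assumes n_power_iter >= 1 and a non-zero weight matrix.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.normal(size=weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_power_iter):
        v = weight.T @ u
        v /= np.linalg.norm(v)
        u = weight @ v
        u /= np.linalg.norm(u)
    sigma = u @ weight @ v            # estimate of the largest singular value
    return weight * (s / sigma) if sigma > s else weight

def random_fourier_features(h, n_features=1024, length_scale=1.0, rng=None):
    """Map hidden features h (N, D) to random features approximating an RBF kernel."""
    rng = np.random.default_rng(0) if rng is None else rng
    w = rng.normal(scale=1.0 / length_scale, size=(h.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(h @ w + b)
```

With a single power iteration the singular value is only estimated, and a power-iteration estimate never exceeds the true value, so the bound $s$ acts as a soft constraint in this sketch. Inner products of the feature map approximate $\exp(-\lVert h - h' \rVert^2 / (2 l_s^2))$, with an error that shrinks as the number of features grows.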

6 Results

We study the performance of our CoverNet-SNGP$_{\text{I}}$ against the baselines in two sets of experiments. First, we investigate the performance under increasingly smaller subsets of the observational training data, which sheds light on data-efficiency. These subsets are randomly subsampled once and then kept fixed across models and repetitions. In this set, we also consider GVCL-Det-CoverNet on NuScenes, with results for 100% reported from Schlauch [?] and results for 10% and 3% replicated with only three independent repetitions due to the long training times. Second, we test the performance by training and testing on location-specific subsets, gaining insights into the robustness to location-transfers, which is often implicitly assumed in the state of the art [?]. Unless stated otherwise, the reported results are the average performance and standard deviation of five independent runs for each experiment.
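
One way to draw each subset once and keep it fixed across models and repetitions is to derive it from a fixed seed, as in the sketch below. The function name, the seed, and the placeholder dataset size of 10,000 samples are hypothetical; whether the subsets in the experiments are nested is not stated here, and this sketch does not enforce nesting.

```python
import numpy as np

def fixed_subsample(n_samples, fraction, seed=0):
    """Return a sorted, reproducible subset of indices covering `fraction` of the data."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(n_samples * fraction)))
    return np.sort(rng.choice(n_samples, size=n_keep, replace=False))

# Drawn once per data regime, then shared by every model and repetition.
subsets = {pct: fixed_subsample(10_000, pct / 100)
           for pct in (100, 50, 30, 10, 5, 3, 1)}
```

Because the indices depend only on the seed and the fraction, the variation across repetitions then comes from training randomness rather than from different data draws.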

6.1 Effect of Available Training Data

Table 2: Average performance and standard deviation of 5 independent repetitions over decreasing subsamples of NuScenes.
Data (in %)  Model             minADE$_1$   minADE$_5$   minFDE$_1$    NLL          RNK           ACC (in %)    DAC (in %)
100          Base              4.92 ± 0.15  2.34 ± 0.05  10.94 ± 0.27  3.47 ± 0.06  15.55 ± 0.73  13.94 ± 1.10  89.26 ± 1.13
             Transfer          4.60 ± 0.04  2.18 ± 0.02  9.94 ± 0.08   3.21 ± 0.01  11.79 ± 0.18  15.19 ± 0.43  95.73 ± 0.29
             GVCL-Det*         4.55 ± 0.11  2.26 ± 0.05  9.93 ± 0.39   3.60 ± 0.08  11.85 ± 0.48  14.88 ± 0.94  90.94 ± 2.25
             SNGP$_{\text{U}}$  4.53 ± 0.09  2.25 ± 0.04  10.31 ± 0.27  3.23 ± 0.01  13.25 ± 0.19  17.04 ± 0.68  91.19 ± 0.61
             SNGP$_{\text{I}}$  4.45 ± 0.04  2.21 ± 0.01  10.09 ± 0.12  3.19 ± 0.01  12.44 ± 0.14  17.36 ± 0.59  91.65 ± 0.59
50           Base              5.15 ± 0.23  2.37 ± 0.11  11.46 ± 0.60  3.52 ± 0.06  17.21 ± 1.33  13.55 ± 0.62  86.68 ± 4.72
             Transfer          4.86 ± 0.04  2.26 ± 0.01  10.38 ± 0.06  3.35 ± 0.01  13.46 ± 0.21  14.37 ± 0.09  95.66 ± 0.28
             SNGP$_{\text{U}}$  4.57 ± 0.05  2.26 ± 0.04  10.40 ± 0.15  3.30 ± 0.02  14.62 ± 0.17  16.83 ± 0.59  90.09 ± 0.56
             SNGP$_{\text{I}}$  4.48 ± 0.07  2.22 ± 0.04  10.13 ± 0.13  3.25 ± 0.02  13.39 ± 0.31  16.72 ± 0.76  91.10 ± 0.72
30           Base              5.40 ± 0.03  2.44 ± 0.07  12.01 ± 0.20  3.68 ± 0.04  19.80 ± 1.03  12.70 ± 0.81  86.58 ± 2.54
             Transfer          5.08 ± 0.03  2.34 ± 0.02  10.80 ± 0.07  3.47 ± 0.01  15.00 ± 0.05  13.38 ± 0.31  96.07 ± 0.32
             SNGP$_{\text{U}}$  4.68 ± 0.09  2.29 ± 0.04  10.61 ± 0.26  3.37 ± 0.01  16.02 ± 0.22  16.69 ± 0.62  89.22 ± 0.30
             SNGP$_{\text{I}}$  4.58 ± 0.03  2.30 ± 0.02  10.35 ± 0.08  3.31 ± 0.02  14.67 ± 0.20  17.10 ± 0.34  90.41 ± 0.49
10           Base              5.89 ± 0.28  2.72 ± 0.11  12.88 ± 0.63  3.99 ± 0.06  32.74 ± 1.48  12.38 ± 0.96  86.38 ± 2.64
             Transfer          6.09 ± 0.03  2.65 ± 0.02  12.60 ± 0.06  3.89 ± 0.01  24.82 ± 0.13  10.35 ± 0.15  95.54 ± 0.23
             GVCL-Det*         5.27 ± 0.27  2.53 ± 0.09  12.03 ± 0.58  4.05 ± 0.07  24.78 ± 0.45  12.95 ± 0.80  91.52 ± 1.54
             SNGP$_{\text{U}}$  5.00 ± 0.04  2.52 ± 0.03  11.36 ± 0.22  3.60 ± 0.02  25.19 ± 0.30  15.73 ± 0.22  88.62 ± 0.56
             SNGP$_{\text{I}}$  4.96 ± 0.05  2.47 ± 0.04  11.25 ± 0.16  3.52 ± 0.03  20.94 ± 0.59  15.39 ± 0.29  89.53 ± 1.11
5            Base              5.90 ± 0.17  2.82 ± 0.06  12.81 ± 0.38  4.26 ± 0.03  42.55 ± 1.92  10.17 ± 1.26  86.89 ± 1.96
             Transfer          6.62 ± 0.04  2.89 ± 0.01  13.41 ± 0.09  4.30 ± 0.01  29.74 ± 0.44  8.70 ± 0.14   97.46 ± 0.07
             SNGP$_{\text{U}}$  5.07 ± 0.05  2.58 ± 0.02  11.63 ± 0.13  3.90 ± 0.03  31.77 ± 0.85  14.29 ± 0.30  86.31 ± 0.82
             SNGP$_{\text{I}}$  5.01 ± 0.04  2.53 ± 0.04  11.43 ± 0.10  3.72 ± 0.03  25.99 ± 0.65  14.32 ± 0.59  86.85 ± 0.94
3            Base              6.23 ± 0.16  3.11 ± 0.11  13.32 ± 0.28  4.53 ± 0.03  59.34 ± 3.76  10.42 ± 0.71  84.83 ± 2.00
             Transfer          7.52 ± 0.09  3.35 ± 0.07  14.71 ± 0.14  4.61 ± 0.01  36.62 ± 0.60  7.33 ± 0.10   97.80 ± 0.08
             GVCL-Det*         6.12 ± 0.11  2.86 ± 0.09  13.25 ± 0.31  4.26 ± 0.05  31.96 ± 3.01  10.87 ± 0.49  93.05 ± 1.21
             SNGP$_{\text{U}}$  5.56 ± 0.09  2.85 ± 0.07  12.64 ± 0.15  4.61 ± 0.02  46.82 ± 1.37  13.37 ± 0.09  86.00 ± 0.95
             SNGP$_{\text{I}}$  5.44 ± 0.13  2.74 ± 0.06  12.38 ± 0.29  3.90 ± 0.01  27.88 ± 1.54  12.62 ± 0.64  86.18 ± 0.77
1            Base              8.39 ± 1.16  3.44 ± 0.30  16.25 ± 2.27  5.23 ± 0.10  83.20 ± 3.84  5.39 ± 1.92   81.48 ± 4.34
             Transfer          8.44 ± 0.07  4.18 ± 0.08  15.71 ± 0.27  5.52 ± 0.01  52.92 ± 0.11  4.53 ± 0.15   98.29 ± 0.27
             SNGP$_{\text{U}}$  6.33 ± 0.64  2.88 ± 0.07  12.76 ± 1.07  5.48 ± 0.01  77.64 ± 3.38  8.48 ± 0.92   70.42 ± 3.97
             SNGP$_{\text{I}}$  5.39 ± 0.28  2.68 ± 0.07  12.27 ± 0.53  4.40 ± 0.01  50.19 ± 1.15  9.94 ± 0.56   79.34 ± 2.54

Table 3: Average performance and standard deviation of 5 independent repetitions over decreasing subsamples of Argoverse2.
Data (in %)  Model             minADE$_1$   minADE$_5$   minFDE$_1$    NLL          RNK           ACC (in %)    DAC (in %)
100          Base              3.57 ± 0.07  1.84 ± 0.04  8.96 ± 0.15   2.73 ± 0.01  7.87 ± 0.58   24.46 ± 0.59  94.77 ± 0.32
             Transfer          3.60 ± 0.04  1.76 ± 0.02  8.78 ± 0.08   2.68 ± 0.01  7.31 ± 0.18   24.61 ± 0.33  96.91 ± 0.09
             SNGP$_{\text{U}}$  3.60 ± 0.04  1.86 ± 0.03  9.00 ± 0.15   2.74 ± 0.03  8.19 ± 0.28   25.24 ± 0.29  95.00 ± 0.47
             SNGP$_{\text{I}}$  3.51 ± 0.06  1.82 ± 0.03  8.73 ± 0.07   2.69 ± 0.01  7.69 ± 0.17   25.56 ± 0.42  95.01 ± 0.12
50           Base              3.93 ± 0.10  1.97 ± 0.07  9.78 ± 0.31   2.98 ± 0.07  9.89 ± 0.53   21.02 ± 1.17  93.99 ± 1.58
             Transfer          3.80 ± 0.01  1.83 ± 0.02  9.36 ± 0.05   2.80 ± 0.01  8.16 ± 0.02   23.41 ± 0.29  97.04 ± 0.22
             SNGP$_{\text{U}}$  3.84 ± 0.04  2.01 ± 0.02  9.67 ± 0.15   2.89 ± 0.02  9.95 ± 0.28   23.53 ± 0.40  94.93 ± 0.19
             SNGP$_{\text{I}}$  3.76 ± 0.02  1.95 ± 0.02  9.38 ± 0.06   2.84 ± 0.02  9.27 ± 0.15   23.81 ± 0.76  94.92 ± 0.64
30           Base              4.22 ± 0.10  2.04 ± 0.01  10.41 ± 0.28  3.07 ± 0.06  11.18 ± 1.33  19.76 ± 0.49  94.37 ± 0.71
             Transfer          3.99 ± 0.02  1.89 ± 0.02  9.76 ± 0.05   2.91 ± 0.01  9.07 ± 0.04   22.02 ± 0.23  97.15 ± 0.23
             SNGP$_{\text{U}}$  3.98 ± 0.03  2.09 ± 0.04  9.95 ± 0.09   2.99 ± 0.01  11.40 ± 0.30  22.79 ± 0.22  94.73 ± 0.56
             SNGP$_{\text{I}}$  3.96 ± 0.04  2.04 ± 0.02  9.88 ± 0.12   2.95 ± 0.02  10.54 ± 0.18  22.60 ± 0.30  94.94 ± 0.58
10           Base              4.70 ± 0.10  2.25 ± 0.02  11.43 ± 0.16  3.42 ± 0.06  17.17 ± 0.28  16.91 ± 0.38  93.63 ± 0.52
             Transfer          4.49 ± 0.02  2.07 ± 0.02  10.76 ± 0.05  3.21 ± 0.01  12.40 ± 0.07  18.46 ± 0.27  97.48 ± 0.55
             SNGP$_{\text{U}}$  4.26 ± 0.03  2.23 ± 0.03  10.49 ± 0.22  3.19 ± 0.02  14.99 ± 0.13  20.19 ± 0.29  94.22 ± 0.89
             SNGP$_{\text{I}}$  4.23 ± 0.08  2.22 ± 0.05  10.47 ± 0.20  3.15 ± 0.02  13.85 ± 0.38  20.90 ± 0.16  94.35 ± 0.65
5            Base              5.04 ± 0.09  2.41 ± 0.06  12.33 ± 0.23  3.67 ± 0.02  23.73 ± 0.87  15.05 ± 0.80  90.79 ± 1.79
             Transfer          4.94 ± 0.01  2.25 ± 0.01  11.49 ± 0.02  3.50 ± 0.01  16.80 ± 0.03  15.86 ± 0.16  97.12 ± 0.38
             SNGP$_{\text{U}}$  4.43 ± 0.04  2.31 ± 0.02  11.06 ± 0.08  3.36 ± 0.02  18.92 ± 0.29  19.32 ± 0.29  91.60 ± 0.82
             SNGP$_{\text{I}}$  4.41 ± 0.01  2.24 ± 0.02  10.92 ± 0.11  3.28 ± 0.01  16.30 ± 0.30  19.36 ± 0.48  93.17 ± 1.20
3            Base              5.41 ± 0.09  2.48 ± 0.11  12.99 ± 0.47  3.88 ± 0.03  28.85 ± 1.35  13.55 ± 0.47  90.96 ± 0.50
             Transfer          5.44 ± 0.01  2.44 ± 0.07  12.35 ± 0.04  3.73 ± 0.01  20.89 ± 0.04  13.81 ± 0.07  97.16 ± 0.33
             SNGP$_{\text{U}}$  4.54 ± 0.04  2.34 ± 0.02  11.31 ± 0.13  3.50 ± 0.01  21.96 ± 0.34  17.78 ± 0.22  91.28 ± 0.82
             SNGP$_{\text{I}}$  4.51 ± 0.05  2.33 ± 0.04  11.06 ± 0.14  3.41 ± 0.01  18.04 ± 0.33  17.94 ± 0.20  92.49 ± 0.79
1            Base              5.96 ± 0.26  2.75 ± 0.04  14.15 ± 0.62  4.46 ± 0.01  50.60 ± 0.99  11.33 ± 0.96  87.43 ± 3.49
             Transfer          6.52 ± 0.03  2.95 ± 0.01  14.30 ± 0.06  4.28 ± 0.01  33.31 ± 0.12  10.02 ± 0.02  98.70 ± 0.03
             SNGP$_{\text{U}}$  5.02 ± 0.05  2.53 ± 0.02  12.26 ± 0.13  3.96 ± 0.01  40.58 ± 0.50  15.14 ± 0.50  89.11 ± 0.83
             SNGP$_{\text{I}}$  5.00 ± 0.09  2.50 ± 0.02  12.14 ± 0.19  3.75 ± 0.02  26.63 ± 0.90  15.12 ± 0.39  90.34 ± 0.64

Figure 2: Average performance and standard deviation in NLL, minFDE$_1$ and DAC of five repetitions for the informed and non-informed CoverNet-SNGP over decreasing subsamples of NuScenes.

Tab. 2 and Tab. 3 show the performance of our CoverNet-SNGP$_{\text{I}}$ in comparison to the baselines on NuScenes and Argoverse2, respectively. We observe that the prior drivability knowledge leads to notable performance benefits in our CoverNet-SNGP$_{\text{I}}$ and the informed baselines (Transfer-CoverNet, GVCL-Det-CoverNet) across most metrics. The benefits from the prior drivability knowledge are most substantial in the calibration-sensitive metrics (RNK and notably NLL, e.g., as seen in Fig. 2), which directly benefit from the optimization in the knowledge tasks. The drivability knowledge is less helpful in discerning the best candidate among the remaining drivable candidate trajectories, leading to smaller benefits in the respective metrics (minADE$_1$, minFDE$_1$, ACC).
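
As a small worked example of this comparison, the relative error reduction of the informed over the non-informed CoverNet-SNGP can be computed from the mean values reported in Tab. 2 for 10% and 1% of NuScenes. The helper name is hypothetical, and the calculation uses only the reported means, so it ignores the standard deviations.

```python
# (non-informed SNGP_U, informed SNGP_I) means from Tab. 2; lower is better for all three.
table2_means = {
    "10%": {"minADE_1": (5.00, 4.96), "NLL": (3.60, 3.52), "RNK": (25.19, 20.94)},
    "1%": {"minADE_1": (6.33, 5.39), "NLL": (5.48, 4.40), "RNK": (77.64, 50.19)},
}

def relative_reduction(uninformed, informed):
    """Relative reduction (in %) of an error metric achieved by the informed model."""
    return 100.0 * (uninformed - informed) / uninformed

gains = {
    regime: {metric: relative_reduction(u, i) for metric, (u, i) in rows.items()}
    for regime, rows in table2_means.items()
}

for regime, rows in gains.items():
    print(regime, {metric: round(value, 1) for metric, value in rows.items()})
```

In both regimes the reduction in RNK is the largest of the three and the reduction in minADE$_1$ the smallest.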

We also observe that Transfer-CoverNet's benefits are limited to higher data regimes. In low data regimes, Transfer-CoverNet can even perform substantially worse than Base-CoverNet across all metrics (except DAC). In these low data regimes, Transfer-CoverNet may converge to less adequate minima because its weight initialization is overly biased towards drivability (illustrated by the rising DAC). In contrast, GVCL-Det-CoverNet and our CoverNet-SNGP$_{\text{I}}$ never decrease performance, with consistent benefits especially in low data regimes. This highlights a principal advantage of the PIL approach, where the informative prior helps to shape the complete loss landscape during training.

In comparison to GVCL-Det-CoverNet, our CoverNet-SNGP$_{\text{I}}$ shows benefits across most metrics, especially in low data regimes, even though both are trained using the PIL approach. The advantage is most visible in the metrics concerning the most likely trajectory (minADE$_1$, ACC). CoverNet-SNGP$_{\text{I}}$ also shows more stable results with lower standard deviations. Here, our CoverNet-SNGP$_{\text{I}}$ profits from using the full information of the posterior distribution at inference.

6.2 Effect of Location-Specific Training

Table 4: Average performance and standard deviation of 5 independent repetitions trained on Singapore and Boston locations from NuScenes.
Train Location  Model             Test Location  minADE$_1$   minADE$_5$   minFDE$_1$    NLL          RNK           ACC           DAC
Singapore       Base              Singapore      5.33 ± 0.40  2.37 ± 0.06  11.54 ± 0.80  3.69 ± 0.06  19.79 ± 0.71  12.97 ± 1.89  84.94 ± 1.35
                                  Boston         5.83 ± 0.22  2.64 ± 0.04  12.76 ± 0.41  3.93 ± 0.07  24.98 ± 1.08  10.18 ± 0.80  89.79 ± 2.13
                Transfer          Singapore      5.47 ± 0.07  2.35 ± 0.03  11.41 ± 0.14  3.49 ± 0.02  13.79 ± 0.18  11.49 ± 0.50  94.29 ± 0.59
                                  Boston         6.65 ± 0.10  2.94 ± 0.03  14.26 ± 0.20  4.09 ± 0.01  24.09 ± 0.31  8.55 ± 0.40   96.09 ± 0.20
                SNGP$_{\text{U}}$  Singapore      4.48 ± 0.06  2.26 ± 0.02  10.05 ± 0.16  3.38 ± 0.03  15.06 ± 0.31  15.85 ± 0.46  85.30 ± 1.01
                                  Boston         5.38 ± 0.15  2.71 ± 0.05  12.20 ± 0.35  3.65 ± 0.02  20.81 ± 0.56  13.15 ± 0.73  90.37 ± 0.95
                SNGP$_{\text{I}}$  Singapore      4.43 ± 0.07  2.20 ± 0.06  9.83 ± 0.15   3.31 ± 0.06  13.64 ± 1.31  15.84 ± 1.05  86.56 ± 0.50
                                  Boston         5.36 ± 0.09  2.68 ± 0.08  12.18 ± 0.26  3.65 ± 0.01  21.56 ± 1.40  12.95 ± 0.74  90.68 ± 0.79
Boston          Base              Boston         5.02 ± 0.20  2.32 ± 0.09  11.18 ± 0.49  3.57 ± 0.08  18.10 ± 1.31  13.18 ± 1.23  90.18 ± 2.13
                                  Singapore      5.69 ± 0.28  2.73 ± 0.15  12.77 ± 0.78  3.88 ± 0.07  23.42 ± 0.47  11.37 ± 0.98  82.03 ± 2.92
                Transfer          Boston         4.78 ± 0.06  2.19 ± 0.01  10.21 ± 0.12  3.41 ± 0.01  14.39 ± 0.19  14.02 ± 0.56  96.50 ± 0.74
                                  Singapore      5.63 ± 0.06  2.64 ± 0.04  12.17 ± 0.20  3.70 ± 0.01  18.77 ± 0.16  11.40 ± 0.61  93.10 ± 1.15
                SNGP$_{\text{U}}$  Boston         4.62 ± 0.10  2.23 ± 0.02  10.46 ± 0.25  3.32 ± 0.01  14.83 ± 0.40  16.57 ± 0.64  93.31 ± 0.40
                                  Singapore      4.94 ± 0.07  2.61 ± 0.11  11.27 ± 0.17  3.58 ± 0.03  19.48 ± 0.16  14.93 ± 1.06  83.28 ± 0.60
                SNGP$_{\text{I}}$  Boston         4.50 ± 0.04  2.19 ± 0.02  10.13 ± 0.11  3.26 ± 0.02  12.97 ± 0.33  16.94 ± 0.60  94.01 ± 0.27
                                  Singapore      4.82 ± 0.07  2.60 ± 0.07  10.95 ± 0.18  3.52 ± 0.04  18.39 ± 0.66  15.60 ± 0.70  85.36 ± 1.18

Table 5: Average performance and standard deviation of 5 independent repetitions trained on Palo-Alto and Miami locations from Argoverse2.
Train Location  Model             Test Location  minADE$_1$   minADE$_5$   minFDE$_1$    NLL          RNK           ACC           DAC
Palo-Alto       Base              Palo-Alto      4.94 ± 0.12  2.35 ± 0.05  12.13 ± 0.20  3.45 ± 0.07  17.41 ± 0.21  14.72 ± 1.01  92.94 ± 1.41
                                  Ex-Palo-Alto   5.02 ± 0.42  2.51 ± 0.23  12.24 ± 0.51  3.65 ± 0.12  22.18 ± 1.20  14.18 ± 0.79  91.90 ± 1.53
                Transfer          Palo-Alto      4.91 ± 0.05  2.19 ± 0.01  11.32 ± 0.13  3.27 ± 0.01  13.75 ± 0.13  18.66 ± 0.43  95.92 ± 0.38
                                  Ex-Palo-Alto   5.33 ± 0.03  2.44 ± 0.01  12.39 ± 0.90  3.63 ± 0.01  18.30 ± 0.13  13.68 ± 0.34  95.92 ± 0.46
                SNGP$_{\text{U}}$  Palo-Alto      4.23 ± 0.06  2.20 ± 0.01  10.63 ± 0.19  3.11 ± 0.03  15.03 ± 0.56  23.42 ± 0.35  92.02 ± 1.74
                                  Ex-Palo-Alto   4.55 ± 0.05  2.38 ± 0.02  11.35 ± 0.13  3.37 ± 0.02  18.66 ± 0.55  18.58 ± 0.68  92.06 ± 1.55
                SNGP$_{\text{I}}$  Palo-Alto      4.23 ± 0.05  2.19 ± 0.04  10.40 ± 0.22  3.06 ± 0.03  13.72 ± 0.54  22.38 ± 0.47  91.74 ± 3.01
                                  Ex-Palo-Alto   4.57 ± 0.11  2.37 ± 0.04  11.30 ± 0.32  3.35 ± 0.01  17.43 ± 0.54  18.09 ± 0.74  91.88 ± 2.25
Miami           Base              Miami          4.02 ± 0.21  2.22 ± 0.11  10.28 ± 0.35  3.45 ± 0.06  14.12 ± 0.78  18.97 ± 0.31  95.20 ± 0.98
                                  Ex-Miami       4.29 ± 0.22  2.31 ± 0.13  11.01 ± 0.39  3.47 ± 0.09  16.18 ± 0.92  17.92 ± 0.79  94.99 ± 1.12
                Transfer          Miami          3.91 ± 0.01  1.85 ± 0.01  9.52 ± 0.02   2.94 ± 0.01  9.17 ± 0.04   21.33 ± 0.29  97.42 ± 0.72
                                  Ex-Miami       4.31 ± 0.02  2.07 ± 0.01  10.47 ± 0.05  3.10 ± 0.01  10.62 ± 0.04  19.65 ± 0.35  97.41 ± 0.98
                SNGP$_{\text{U}}$  Miami          3.88 ± 0.04  2.03 ± 0.02  9.74 ± 0.11   3.00 ± 0.01  11.48 ± 0.16  22.07 ± 0.41  95.58 ± 0.40
                                  Ex-Miami       4.15 ± 0.04  2.21 ± 0.02  10.44 ± 0.13  3.11 ± 0.01  13.56 ± 0.21  21.50 ± 0.51  94.81 ± 0.35
                SNGP$_{\text{I}}$  Miami          3.88 ± 0.05  1.99 ± 0.02  9.65 ± 0.15   2.99 ± 0.01  10.75 ± 0.21  21.71 ± 0.53  95.21 ± 0.46
                                  Ex-Miami       4.17 ± 0.05  2.20 ± 0.03  10.42 ± 0.15  3.09 ± 0.02  12.68 ± 0.31  21.25 ± 0.59  94.26 ± 0.58

Figure 3: Average performance and standard deviation of the informed and non-informed CoverNet-SNGP on Boston and Singapore test data, with (a) models trained on Singapore training data and (b) models trained on Boston training data (five repetitions).

Tab. 4 and Tab. 5 show location-specific performances of our CoverNet-SNGP$_{\text{I}}$ in comparison to the baselines on NuScenes and Argoverse2, respectively. We observe that the performance generally and substantially deteriorates in locations that are not included in the training data. This sensitivity of trajectory prediction models to location-transfers can be a major limitation to their practical use.
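
The size of this deterioration can be read off the tables as a relative transfer gap. The sketch below does so for the minADE$_1$ means of the Boston-trained models in Tab. 4; the helper name is hypothetical, and only the reported means are used.

```python
# minADE_1 means of Boston-trained models from Tab. 4:
# (tested on Boston, tested on Singapore).
boston_trained = {
    "Base": (5.02, 5.69),
    "Transfer": (4.78, 5.63),
    "SNGP_U": (4.62, 4.94),
    "SNGP_I": (4.50, 4.82),
}

def transfer_gap(same_location, other_location):
    """Relative increase (in %) of an error metric on the unseen test location."""
    return 100.0 * (other_location - same_location) / same_location

gaps = {model: transfer_gap(*means) for model, means in boston_trained.items()}

for model, gap in gaps.items():
    print(f"{model}: +{gap:.1f}% minADE_1 on the unseen location")
```

All four gaps are positive, meaning every model in this comparison has a higher error on the location it was not trained on.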

We also observe that our CoverNet-SNGP$_{\text{I}}$ can help to alleviate this issue by consistently improving the generalization over location-transfers. This is most visible in the comparison of the Boston-trained models on NuScenes (see Fig. 3) and the Palo-Alto-trained models on Argoverse2, where we see a better performance across most metrics in same-location and location-transfer tests. The Transfer-CoverNet baseline performs even worse than Base-CoverNet in these cases, pointing to the same limitation we see in Sec. 6.1 regarding its bias. In the other two comparisons, CoverNet-SNGP$_{\text{I}}$ still shows advantages (notably in NLL). However, in the case of Miami in Argoverse2, more training data is available (compare Sec. 6.1), and in the case of Singapore in NuScenes, the drivability knowledge might be less useful (see Fig. 3), since all models achieve a lower DAC.

7 Conclusion

Our work introduces a novel regularization-based continual learning method for the SNGP model. We apply this method in a PIL approach for trajectory prediction in autonomous driving, deriving a compute-efficient informed CoverNet-SNGP model that integrates prior drivability knowledge. We demonstrate on two public datasets that our informed CoverNet-SNGP increases data-efficiency and robustness to location-transfers, outperforming informed and non-informed baselines in low data regimes. Thus, we show that our proposed continual learning method is a feasible way to regularize SNGPs using informative priors. In future work, we plan to apply informed SNGPs to more recent transformer-based prediction models using self-supervised learning and to investigate robustness against adversarial attacks and outliers.

Acknowledgments

The research leading to these results is funded by the German Federal Ministry for Economic Affairs and Climate Action within the project “KI Wissen – Entwicklung von Methoden für die Einbindung von Wissen in maschinelles Lernen”. The authors would like to thank the consortium for the successful cooperation.

References

  • [Bagus and Gepperth, 2021] Benedikt Bagus and Alexander Gepperth. An investigation of replay-based approaches for continual learning. In International Joint Conference on Neural Networks, IJCNN 2021. IEEE, 2021.
  • [Bahari et al., 2021] Mohammadhossein Bahari, Ismail Nejjar, and Alexandre Alahi. Injecting Knowledge in Data-driven Vehicle Trajectory Predictors. Transportation Research Part C: Emerging Technologies, 2021.
  • [Boulton et al., 2021] Freddy A. Boulton, Elena Corina Grigore, and Eric M. Wolff. Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge. arXiv preprint, https://arxiv.org/abs/2006.04767, 2021.
  • [Caesar et al., 2020] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 2020.
  • [Charpentier et al., 2023] Bertrand Charpentier, Chenxiang Zhang, and Stephan Günnemann. Training, architecture, and prior for deterministic uncertainty methods. arXiv preprint, https://arxiv.org/abs/2303.05796, 2023.
  • [Cormen et al., 2009] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009.
  • [Cui et al., 2020] Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, Jeff Schneider, David Bradley, and Nemanja Djuric. Deep kinematic models for kinematically feasible vehicle trajectory predictions. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, 2020.
  • [De Lange et al., 2022] Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory G. Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  • [Derakhshani et al., 2021] Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, and Cees Snoek. Kernel continual learning. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Proceedings of Machine Learning Research. PMLR, 2021.
  • [Freiesleben and Grote, 2023] Timo Freiesleben and Thomas Grote. Beyond generalization: a theory of robustness in machine learning. Synthese, 2023.
  • [Huang et al., 2022] Yanjun Huang, Jiatong Du, Ziru Yang, Zewei Zhou, Lin Zhang, and Hong Chen. A Survey on Trajectory-Prediction Methods for Autonomous Driving. IEEE Transactions on Intelligent Vehicles, 2022.
  • [Kirkpatrick et al., 2017] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 2017.
  • [Kristiadi et al., 2020] Agustinus Kristiadi, Matthias Hein, and Philipp Hennig. Being Bayesian, even just a bit, fixes overconfidence in ReLU networks. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Proceedings of Machine Learning Research. PMLR, 2020.
  • [Liu et al., 2020] Jeremiah Z. Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, and Balaji Lakshminarayanan. Simple and principled uncertainty estimation with deterministic deep learning via distance awareness. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 2020.
  • [Loo et al., 2021] Noel Loo, Siddharth Swaroop, and Richard E. Turner. Generalized variational continual learning. In 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net, 2021.
  • [Makansi et al., 2022] Osama Makansi, Julius von Kügelgen, Francesco Locatello, Peter Vincent Gehler, Dominik Janzing, Thomas Brox, and Bernhard Schölkopf. You mostly walk alone: Analyzing feature attribution in trajectory prediction. In The Tenth International Conference on Learning Representations, ICLR 2022. OpenReview.net, 2022.
  • [Malinin et al., 2021] Andrey Malinin, Neil Band, Yarin Gal, Mark J. F. Gales, Alexander Ganshin, German Chesnokov, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Denis Roginskiy, Mariya Shmatova, Panagiotis Tigas, and Boris Yangel. Shifts: A dataset of real distributional shift across multiple large-scale tasks. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, 2021.
  • [Parisi et al., 2019] German Ignacio Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 2019.
  • [Phan-Minh et al., 2020] Tung Phan-Minh, Elena Corina Grigore, Freddy A. Boulton, Oscar Beijbom, and Eric M. Wolff. CoverNet: Multimodal behavior prediction using trajectory sets. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 2020.
  • [Postels et al., 2022] Janis Postels, Mattia Segù, Tao Sun, Luca Daniel Sieber, Luc Van Gool, Fisher Yu, and Federico Tombari. On the practicality of deterministic epistemic uncertainty. In Proceedings of the 39th International Conference on Machine Learning, ICML 2022, Proceedings of Machine Learning Research. PMLR, 2022.
  • [Rahimi and Recht, 2007] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20: Annual Conference on Neural Information Processing Systems 2007, NeurIPS 2007. Curran Associates, Inc., 2007.
  • [Schlauch et al., 2023] Christian Schlauch, Christian Wirth, and Nadja Klein. Informed priors for knowledge integration in trajectory prediction. In Machine Learning and Knowledge Discovery in Databases: Research Track - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023, Turin, Italy. Springer, 2023.
  • [Schwarz et al., 2018] Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Proceedings of Machine Learning Research. PMLR, 2018.
  • [Shwartz-Ziv et al., 2022] Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann LeCun, and Andrew Gordon Wilson. Pre-train your loss: Easy Bayesian transfer learning with informative priors. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, 2022.
  • [Titsias et al., 2020] Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, and Yee Whye Teh. Functional regularisation for continual learning with Gaussian processes. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net, 2020.
  • [van Amersfoort et al., 2021] Joost R. van Amersfoort, Lewis Smith, Andrew Jesson, Oscar Key, and Yarin Gal. On feature collapse and deep kernel learning for single forward pass uncertainty. 2021.
  • [von Rueden et al., 2021] Laura von Rueden, Sebastian Mayer, Katharina Beckh, Bogdan Georgiev, Sven Giesselbach, Raoul Heese, Birgit Kirsch, Julius Pfrommer, Annika Pick, Rajkumar Ramamurthy, Michal Walczak, Jochen Garcke, Christian Bauckhage, and Jannis Schuecker. Informed Machine Learning – A Taxonomy and Survey of Integrating Knowledge into Learning Systems. IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [Wilson et al., 2023] Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, and James Hays. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint, https://arxiv.org/abs/2301.00493, 2023.
  • [Wörmann et al., 2022] Julian Wörmann, Daniel Bogdoll, Etienne Bührle, Han Chen, Evaristus Fuh Chuo, Kostadin Cvejoski, Ludger van Elst, Tobias Gleißner, Philip Gottschall, Stefan Griesche, Christian Hellert, Christian Hesels, Sebastian Houben, Tim Joseph, Niklas Keil, Johann Kelsch, Hendrik Königshof, Erwin Kraft, Leonie Kreuser, Kevin Krone, Tobias Latka, Denny Mattern, Stefan Matthes, Mohsin Munir, Moritz Nekolla, Adrian Paschke, Maximilian Alexander Pintz, Tianming Qiu, Faraz Qureishi, Syed Tahseen Raza Rizvi, Jörg Reichardt, Laura von Rueden, Stefan Rudolph, Alexander Sagel, Gerhard Schunk, Hao Shen, Hendrik Stapelbroek, Vera Stehr, Gurucharan Srinivas, Anh Tuan Tran, Abhishek Vivekanandan, Ya Wang, Florian Wasserrab, Tino Werner, Christian Wirth, and Stefan Zwicklbauer. Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey. arXiv preprint, https://arxiv.org/abs/2205.04712, 2022.