-
Phonetic Richness for Improved Automatic Speaker Verification
Authors:
Nicholas Klein,
Ganesh Sivaraman,
Elie Khoury
Abstract:
When it comes to authentication in speaker verification systems, not all utterances are created equal. It is essential to estimate the quality of test utterances in order to account for varying acoustic conditions. In addition to the net-speech duration of an utterance, it is observed in this paper that phonetic richness is also a key indicator of utterance quality, playing a significant role in accurate speaker verification. Several phonetic histogram based formulations of phonetic richness are explored using transcripts obtained from an automatic speech recognition system. The proposed phonetic richness measure is found to be positively correlated with voice authentication scores across evaluation benchmarks. Additionally, the proposed measure in combination with net speech helps in calibrating the speaker verification scores, obtaining a relative EER improvement of 5.8% on the VoxCeleb1 evaluation protocol. The proposed phonetic richness based calibration provides higher benefit for short utterances with repeated words.
Submitted 10 July, 2024;
originally announced July 2024.
-
Source Tracing of Audio Deepfake Systems
Authors:
Nicholas Klein,
Tianxiang Chen,
Hemlata Tak,
Ricardo Casal,
Elie Khoury
Abstract:
Recent progress in generative AI technology has made audio deepfakes remarkably more realistic. While current research on anti-spoofing systems primarily focuses on assessing whether a given audio sample is fake or genuine, limited attention has been paid to discerning the specific techniques used to create audio deepfakes. Algorithms commonly used in audio deepfake generation, like text-to-speech (TTS) and voice conversion (VC), undergo distinct stages including input processing, acoustic modeling, and waveform generation. In this work, we introduce a system designed to classify various spoofing attributes, capturing the distinctive features of individual modules throughout the entire generation pipeline. We evaluate our system on two datasets: the ASVspoof 2019 Logical Access and the Multi-Language Audio Anti-Spoofing Dataset (MLAAD). Results from both experiments demonstrate the robustness of the system in identifying the different spoofing attributes of deepfake generation systems.
Submitted 10 July, 2024;
originally announced July 2024.
-
From Counting Stations to City-Wide Estimates: Data-Driven Bicycle Volume Extrapolation
Authors:
Silke K. Kaiser,
Nadja Klein,
Lynn H. Kaack
Abstract:
Shifting to cycling in urban areas reduces greenhouse gas emissions and improves public health. Street-level bicycle volume information would aid cities in planning targeted infrastructure improvements to encourage cycling and provide civil society with evidence to advocate for cyclists' needs. Yet, the data currently available to cities and citizens often only comes from sparsely located counting stations. This paper extrapolates bicycle volume beyond these few locations to estimate bicycle volume for the entire city of Berlin. We predict daily and average annual daily street-level bicycle volumes using machine-learning techniques and various public data sources. These include app-based crowdsourced data, infrastructure, bike-sharing, motorized traffic, socioeconomic indicators, weather, and holiday data. Our analysis reveals that the best-performing model is XGBoost, and crowdsourced cycling and infrastructure data are most important for the prediction. We further simulate how collecting short-term counts at predicted locations improves performance. By providing ten days of such sample counts for each predicted location to the model, we are able to halve the error and greatly reduce the variability in performance among predicted locations.
Submitted 26 June, 2024;
originally announced June 2024.
-
Enhanced variable selection for boosting sparser and less complex models in distributional copula regression
Authors:
Annika Strömer,
Nadja Klein,
Christian Staerk,
Florian Faschingbauer,
Hannah Klinkhammer,
Andreas Mayr
Abstract:
Structured additive distributional copula regression allows modelling the joint distribution of multivariate outcomes by relating all distribution parameters to covariates. Estimation via statistical boosting enables accounting for high-dimensional data and incorporating data-driven variable selection, both of which are useful given the complexity of the model class. However, as known from univariate (distributional) regression, the standard boosting algorithm tends to select too many variables with minor importance, particularly in settings with large sample sizes, leading to complex models with difficult interpretation. To counteract this behavior and to avoid selecting base-learners with only a negligible impact, we combined the ideas of probing, stability selection and a new deselection approach with statistical boosting for distributional copula regression. In a simulation study and an application to the joint modelling of weight and length of newborns, we found that all proposed methods enhance variable selection by reducing the number of false positives. However, only stability selection and the deselection approach yielded similar predictive performance to classical boosting. Finally, the deselection approach scales better to larger datasets and led to a competitive predictive performance, which we further illustrated in a genomic cohort study from the UK Biobank by modelling the joint genetic predisposition for two phenotypes.
Submitted 6 June, 2024;
originally announced June 2024.
-
Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
Authors:
Pallavi Mitra,
Gesina Schwalbe,
Nadja Klein
Abstract:
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks. However, high computational and storage demands hinder their deployment into resource-constrained environments, such as embedded devices. Model pruning helps to meet these restrictions by reducing the model size, while maintaining superior performance. Meanwhile, safety-critical applications pose more than just resource and performance constraints. In particular, predictions must not be overly confident, i.e., provide properly calibrated uncertainty estimations (proper uncertainty calibration), and CNNs must be robust against corruptions like naturally occurring input perturbations (natural corruption robustness). This work investigates the important trade-off between uncertainty calibration, natural corruption robustness, and performance for current state-of-research post-hoc CNN pruning techniques in the context of image classification tasks. Our study reveals that post-hoc pruning substantially improves the model's uncertainty calibration, performance, and natural corruption robustness, sparking hope for safe and robust embedded CNNs. Furthermore, uncertainty calibration and natural corruption robustness are not mutually exclusive targets under pruning, as evidenced by the improved safety aspects obtained by post-hoc unstructured pruning with increasing compression.
Submitted 31 May, 2024;
originally announced May 2024.
-
Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection
Authors:
Moussa Kassem Sbeyti,
Michelle Karg,
Christian Wirth,
Nadja Klein,
Sahin Albayrak
Abstract:
Object detectors in real-world applications often fail to detect objects due to varying factors such as weather conditions and noisy input. Therefore, a process that mitigates false detections is crucial for both safety and accuracy. While uncertainty-based thresholding shows promise, previous works demonstrate an imperfect correlation between uncertainty and detection errors. This hinders ideal thresholding, prompting us to further investigate the correlation and associated cost with different types of uncertainty. We therefore propose a cost-sensitive framework for object detection tailored to user-defined budgets on the two types of errors, missing and false detections. We derive minimum thresholding requirements to prevent performance degradation and define metrics to assess the applicability of uncertainty for failure recognition. Furthermore, we automate and optimize the thresholding process to maximize the failure recognition rate w.r.t. the specified budget. Evaluation on three autonomous driving datasets demonstrates that our approach significantly enhances safety, particularly in challenging scenarios. Leveraging localization aleatoric uncertainty and softmax-based entropy only, our method boosts the failure recognition rate by 36-60% compared to conventional approaches. Code is available at https://mos-ks.github.io/publications.
Submitted 26 April, 2024;
originally announced April 2024.
-
Sparse Explanations of Neural Networks Using Pruned Layer-Wise Relevance Propagation
Authors:
Paulo Yanez Sarmiento,
Simon Witzke,
Nadja Klein,
Bernhard Y. Renard
Abstract:
Explainability is a key component in many applications involving deep neural networks (DNNs). However, current explanation methods for DNNs commonly leave it to the human observer to distinguish relevant explanations from spurious noise. This is no longer feasible when going from easily human-accessible data such as images to more complex data such as genome sequences. To facilitate the accessibility of DNN outputs from such complex data and to increase explainability, we present a modification of the widely used explanation method layer-wise relevance propagation. Our approach enforces sparsity directly by pruning the relevance propagation for the different layers. Thereby, we achieve sparser relevance attributions for the input features as well as for the intermediate layers. As the relevance propagation is input-specific, we aim to prune the relevance propagation rather than the underlying model architecture. This allows pruning different neurons for different inputs and hence might be more appropriate to the local nature of explanation methods. To demonstrate the efficacy of our method, we evaluate it on two types of data, images and genomic sequences. We show that our modification indeed leads to noise reduction and concentrates relevance on the most important features compared to the baseline.
Submitted 22 April, 2024;
originally announced April 2024.
-
Optimizing Josephson Junction Reproducibility in 30 kV E-beam Lithography: Analysis of Backscattered Electron Distribution
Authors:
A. M. Rebello,
L. M. Ruela,
G. Moreto,
N. Y. Klein,
E. Martins,
I. S. Oliveira,
F. Rouxinol,
J. P. Sinnecker
Abstract:
This paper explores methods to enhance the reproducibility of Josephson junctions, crucial elements in superconducting quantum technologies, when employing the Dolan technique in 30 kV e-beam processes. The study explores the influence of dose distribution along the bridge area on reproducibility, addressing challenges related to fabrication sensitivity. Experimental methods include E-beam lithography, with electron trajectory simulations shedding light on backscattered electron behavior. We demonstrate the fabrication of different junction geometries, revealing that some geometries significantly improve reproducibility by resulting in a more homogeneous dose distribution over the junction area.
Submitted 28 March, 2024;
originally announced March 2024.
-
Informed Spectral Normalized Gaussian Processes for Trajectory Prediction
Authors:
Christian Schlauch,
Christian Wirth,
Nadja Klein
Abstract:
Prior parameter distributions provide an elegant way to represent prior expert and world knowledge for informed learning. Previous work has shown that using such informative priors to regularize probabilistic deep learning (DL) models increases their performance and data-efficiency. However, commonly used sampling-based approximations for probabilistic DL models can be computationally expensive, requiring multiple inference passes and longer training times. Promising alternatives are compute-efficient last layer kernel approximations like spectral normalized Gaussian processes (SNGPs). We propose a novel regularization-based continual learning method for SNGPs, which enables the use of informative priors that represent prior knowledge learned from previous tasks. Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion. We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge. On two public datasets, we investigate its performance under diminishing training data and across locations, and thereby demonstrate an increase in data-efficiency and robustness to location-transfers over non-informed and informed baselines.
Submitted 18 March, 2024;
originally announced March 2024.
-
Boosting Distributional Copula Regression for Bivariate Binary, Discrete and Mixed Responses
Authors:
Guillermo Briseño Sanchez,
Nadja Klein,
Hannah Klinkhammer,
Andreas Mayr
Abstract:
Motivated by challenges in the analysis of biomedical data and observational studies, we develop statistical boosting for the general class of bivariate distributional copula regression with arbitrary marginal distributions, which is suited to model binary, count, continuous or mixed outcomes. In our framework, the joint distribution of arbitrary, bivariate responses is modelled through a parametric copula. To arrive at a model for the entire conditional distribution, not only the marginal distribution parameters but also the copula parameters are related to covariates through additive predictors. We suggest efficient and scalable estimation by means of an adapted component-wise gradient boosting algorithm with statistical models as base-learners. A key benefit of boosting as opposed to classical likelihood or Bayesian estimation is the implicit data-driven variable selection mechanism as well as shrinkage without additional input or assumptions from the analyst. To the best of our knowledge, our implementation is the only one that combines a wide range of covariate effects, marginal distributions, copula functions, and implicit data-driven variable selection. We showcase the versatility of our approach on data from genetic epidemiology, healthcare utilization and childhood undernutrition. Our developments are implemented in the R package gamboostLSS, fostering transparent and reproducible research.
Submitted 4 March, 2024;
originally announced March 2024.
-
Regression Copulas for Multivariate Responses
Authors:
Nadja Klein,
Michael Stanley Smith,
David Nott,
Ryan Chisholm
Abstract:
We propose a novel distributional regression model for a multivariate response vector based on a copula process over the covariate space. It uses the implicit copula of a Gaussian multivariate regression, which we call a "regression copula". To allow for large covariate vectors, their coefficients are regularized using a novel multivariate extension of the horseshoe prior. Bayesian inference and distributional predictions are evaluated using efficient variational inference methods, allowing application to large datasets. An advantage of the approach is that the marginal distributions of the response vector can be estimated separately and accurately, resulting in predictive distributions that are marginally-calibrated. Two substantive applications of the methodology highlight its efficacy in multivariate modeling. The first is the econometric modeling and prediction of half-hourly regional Australian electricity prices. Here, our approach produces more accurate distributional forecasts than leading benchmark methods. The second is the evaluation of multivariate posteriors in likelihood-free inference (LFI) of a model for tree species abundance data, extending a previous univariate regression copula LFI method. In both applications, we demonstrate that our new approach exhibits a desirable marginal calibration property.
Submitted 5 March, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Boosting Causal Additive Models
Authors:
Maximilian Kertel,
Nadja Klein
Abstract:
We present a boosting-based method to learn additive Structural Equation Models (SEMs) from observational data, with a focus on the theoretical aspects of determining the causal order among variables. We introduce a family of score functions based on arbitrary regression techniques, for which we establish necessary conditions to consistently favor the true causal ordering. Our analysis reveals that boosting with early stopping meets these criteria and thus offers a consistent score function for causal orderings. To address the challenges posed by high-dimensional data sets, we adapt our approach through a component-wise gradient descent in the space of additive SEMs. Our simulation study underlines our theoretical results for lower dimensions and demonstrates that our high-dimensional adaptation is competitive with state-of-the-art methods. In addition, it exhibits robustness with respect to the choice of the hyperparameters, making the procedure easy to tune.
Submitted 12 January, 2024;
originally announced January 2024.
-
Density regression via Dirichlet process mixtures of normal structured additive regression models
Authors:
María Xosé Rodríguez-Álvarez,
Vanda Inácio,
Nadja Klein
Abstract:
Within Bayesian nonparametrics, dependent Dirichlet process mixture models provide a highly flexible approach for conducting inference about the conditional density function. However, several formulations of this class make either rather restrictive modelling assumptions or involve intricate algorithms for posterior inference, thus preventing their widespread use. In response to these challenges, we present a flexible, versatile, and computationally tractable model for density regression based on a single-weights dependent Dirichlet process mixture of normal distributions model for univariate continuous responses. We assume an additive structure for the mean of each mixture component and incorporate the effects of continuous covariates through smooth nonlinear functions. The key components of our modelling approach are penalised B-splines and their bivariate tensor product extension. Our proposed method also seamlessly accommodates parametric effects of categorical covariates, linear effects of continuous covariates, interactions between categorical and/or continuous covariates, varying coefficient terms, and random effects, which is why we refer to our model as a Dirichlet process mixture of normal structured additive regression models. A noteworthy feature of our method is its efficiency in posterior simulation through Gibbs sampling, as closed-form full conditional distributions for all model parameters are available. Results from a simulation study demonstrate that our approach successfully recovers true conditional densities and other regression functionals in various challenging scenarios. Applications to a toxicology, disease diagnosis, and agricultural study are provided and further underpin the broad applicability of our modelling framework. An R package, DDPstar, implementing the proposed method is publicly available at https://bitbucket.org/mxrodriguez/ddpstar.
Submitted 13 May, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Bayesian Effect Selection in Additive Models with an Application to Time-to-Event Data
Authors:
Paul Bach,
Nadja Klein
Abstract:
Accurately selecting and estimating smooth functional effects in additive models with potentially many functions is a challenging task. We introduce a novel Demmler-Reinsch basis expansion to model the functional effects that allows us to orthogonally decompose an effect into its linear and nonlinear parts. We show that our representation allows both parts to be estimated consistently, as opposed to commonly employed mixed model representations. Equipping the reparameterized regression coefficients with normal beta prime spike and slab priors allows us to determine whether a continuous covariate has a linear, a nonlinear or no effect at all. We provide new theoretical results for the prior and a compelling explanation for its superior Markov chain Monte Carlo mixing performance compared to the spike-and-slab group lasso. We establish an efficient posterior estimation scheme and illustrate our approach along effect selection on the hazard rate of a time-to-event response in the geoadditive Cox regression model in simulations and data on survival with leukemia.
Submitted 1 January, 2024;
originally announced January 2024.
-
Modeling the Ratio of Correlated Biomarkers Using Copula Regression
Authors:
Moritz Berger,
Nadja Klein,
Michael Wagner,
Matthias Schmid
Abstract:
Modeling the ratio of two dependent components as a function of covariates is a frequently pursued objective in observational research. Despite the high relevance of this topic in medical studies, where biomarker ratios are often used as surrogate endpoints for specific diseases, existing models are based on oversimplified assumptions, assuming e.g. independence or strictly positive associations between the components. In this paper, we close this gap in the literature and propose a regression model where the marginal distributions of the two components are linked by a Frank copula. A key feature of our model is that it allows for both positive and negative correlations between the components, with one of the model parameters being directly interpretable in terms of Kendall's rank correlation coefficient. We study our method theoretically, evaluate finite sample properties in a simulation study and demonstrate its efficacy in an application to diagnosis of Alzheimer's disease via ratios of amyloid-beta and total tau protein biomarkers.
Submitted 1 December, 2023;
originally announced December 2023.
-
Ghost Value Augmentation for $k$-Edge-Connectivity
Authors:
D Ellis Hershkowitz,
Nathan Klein,
Rico Zenklusen
Abstract:
We give a poly-time algorithm for the $k$-edge-connected spanning subgraph ($k$-ECSS) problem that returns a solution of cost no greater than the cheapest $(k+10)$-ECSS on the same graph. Our approach enhances the iterative relaxation framework with a new ingredient, which we call ghost values, that allows for high sparsity in intermediate problems.
Our guarantees improve upon the best-known approximation factor of $2$ for $k$-ECSS whenever the optimal value of $(k+10)$-ECSS is close to that of $k$-ECSS. This is a property that holds for the closely related problem $k$-edge-connected spanning multi-subgraph ($k$-ECSM), which is identical to $k$-ECSS except edges can be selected multiple times at the same cost. As a consequence, we obtain a $\left(1+O\left(\frac{1}{k}\right)\right)$-approximation algorithm for $k$-ECSM, which resolves a conjecture of Pritchard and improves upon a recent $\left(1+O\left(\frac{1}{\sqrt{k}}\right)\right)$-approximation algorithm of Karlin, Klein, Oveis Gharan, and Zhang. Moreover, we present a matching lower bound for $k$-ECSM, showing that our approximation ratio is tight up to the constant factor in $O\left(\frac{1}{k}\right)$, unless $P=NP$.
Submitted 24 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
From Trees to Polynomials and Back Again: New Capacity Bounds with Applications to TSP
Authors:
Leonid Gurvits,
Nathan Klein,
Jonathan Leake
Abstract:
We give simply exponential lower bounds on the probabilities of a given strongly Rayleigh distribution, depending only on its expectation. This resolves a weak version of a problem left open by Karlin-Klein-Oveis Gharan in their recent breakthrough work on metric TSP, and this resolution leads to a minor improvement of their approximation factor for metric TSP. Our results also allow for a more streamlined analysis of the algorithm.
To achieve these new bounds, we build upon the work of Gurvits-Leake on the use of the productization technique for bounding the capacity of a real stable polynomial. This technique allows one to reduce certain inequalities for real stable polynomials to products of affine linear forms, which have an underlying matrix structure. In this paper, we push this technique further by characterizing the worst-case polynomials via bipartitioned forests. This rigid combinatorial structure yields a clean induction argument, which implies our stronger bounds.
In general, we believe the results of this paper will lead to further improvement and simplification of the analysis of various combinatorial and probabilistic bounds and algorithms.
Submitted 9 May, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Scalable Estimation for Structured Additive Distributional Regression Through Variational Inference
Authors:
Jana Kleinemeier,
Nadja Klein
Abstract:
Structured additive distributional regression models offer a versatile framework for estimating complete conditional distributions by relating all parameters of a parametric distribution to covariates. Although these models efficiently leverage information in vast and intricate data sets, they often result in highly-parameterized models with many unknowns. Standard estimation methods, like Bayesian approaches based on Markov chain Monte Carlo methods, face challenges in estimating these models due to their complexity and costliness. To overcome these issues, we suggest a fast and scalable alternative based on variational inference. Our approach combines a parsimonious parametric approximation for the posteriors of regression coefficients, with the exact conditional posterior for hyperparameters. For optimization, we use a stochastic gradient ascent method combined with an efficient strategy to reduce the variance of estimators. We provide theoretical properties and investigate global and local annealing to enhance robustness, particularly against data outliers. Our implementation is very general, allowing us to include various functional effects like penalized splines or complex tensor product interactions. In a simulation study, we demonstrate the efficacy of our approach in terms of accuracy and computation time. Lastly, we present two real examples illustrating the modeling of infectious COVID-19 outbreaks and outlier detection in brain activity.
Submitted 13 November, 2023;
originally announced November 2023.
-
Deep mixture of linear mixed models for complex longitudinal data
Authors:
Lucas Kock,
Nadja Klein,
David J. Nott
Abstract:
Mixtures of linear mixed models are widely used for modelling longitudinal data for which observation times differ between subjects. In typical applications, temporal trends are described using a basis expansion, with basis coefficients treated as random effects varying by subject. Additional random effects can describe variation between mixture components, or other known sources of variation in complex experimental designs. A key advantage of these models is that they provide a natural mechanism for clustering, which can be helpful for interpretation in many applications. Current versions of mixtures of linear mixed models are not specifically designed for the case where there are many observations per subject and a complex temporal trend, which requires a large number of basis functions to capture. In this case, the subject-specific basis coefficients are a high-dimensional random effects vector, for which the covariance matrix is hard to specify and estimate, especially if it varies between mixture components. To address this issue, we consider the use of recently-developed deep mixture of factor analyzers models as the prior for the random effects. The resulting deep mixture of linear mixed models is well-suited to high-dimensional settings, and we describe an efficient variational inference approach to posterior computation. The efficacy of the method is demonstrated on both real and simulated data.
Submitted 13 November, 2023;
originally announced November 2023.
-
A Lower Bound for the Max Entropy Algorithm for TSP
Authors:
Billy Jin,
Nathan Klein,
David P. Williamson
Abstract:
One of the most famous conjectures in combinatorial optimization is the four-thirds conjecture, which states that the integrality gap of the subtour LP relaxation of the TSP is equal to $\frac43$. For 40 years, the best known upper bound was 1.5, due to Wolsey (1980). Recently, Karlin, Klein, and Oveis Gharan (2022) showed that the max entropy algorithm for the TSP gives an improved bound of $1.5 - 10^{-36}$. In this paper, we show that the approximation ratio of the max entropy algorithm is at least 1.375, even for graphic TSP. Thus the max entropy algorithm does not appear to be the algorithm that will ultimately resolve the four-thirds conjecture in the affirmative, should that be possible.
Submitted 3 November, 2023;
originally announced November 2023.
-
A Better-Than-1.6-Approximation for Prize-Collecting TSP
Authors:
Jannis Blauth,
Nathan Klein,
Martin Nägele
Abstract:
Prize-Collecting TSP is a variant of the traveling salesperson problem where one may drop vertices from the tour at the cost of vertex-dependent penalties. The quality of a solution is then measured by adding the length of the tour and the sum of all penalties of vertices that are not visited. We present a polynomial-time approximation algorithm with an approximation guarantee slightly below $1.6$, where the guarantee is with respect to the natural linear programming relaxation of the problem. This improves upon the previous best-known approximation ratio of $1.774$. Our approach is based on a known decomposition for solutions of this linear relaxation into rooted trees. Our algorithm takes a tree from this decomposition and then performs a pruning step before doing parity correction on the remainder. Using a simple analysis, we bound the approximation guarantee of the proposed algorithm by $(1+\sqrt{5})/2 \approx 1.618$, the golden ratio. With some additional technical care we further improve it to $1.599$.
Submitted 14 February, 2024; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Heat increases experienced racial segregation in the United States
Authors:
Till Baldenius,
Nicolas Koch,
Hannah Klauber,
Nadja Klein
Abstract:
Segregation on the basis of ethnic groups stands as a pervasive and persistent social challenge in many cities across the globe. Public spaces provide opportunities for diverse encounters but recent research suggests individuals adjust their time spent in such places to cope with extreme temperatures. We evaluate to what extent such adaptation affects racial segregation and thus shed light on a yet unexplored channel through which global warming might affect social welfare. We use large-scale foot traffic data for millions of places in 315 US cities between 2018 and 2020 to estimate an index of experienced isolation in daily visits between whites and other ethnic groups. We find that heat increases segregation. Results from panel regressions imply that a week with temperatures above 33°C in a city like Los Angeles induces an upward shift of visit isolation by 0.7 percentage points, which equals about 14% of the difference in the isolation index of Los Angeles to the more segregated city of Atlanta. The segregation-increasing effect is particularly strong for individuals living in lower-income areas and at places associated with leisure activities. Combining our estimates with climate model projections, we find that stringent mitigation policy can have significant co-benefits in terms of cushioning increases in racial segregation in the future.
Submitted 1 June, 2023;
originally announced June 2023.
-
Truly Multivariate Structured Additive Distributional Regression
Authors:
Lucas Kock,
Nadja Klein
Abstract:
Generalized additive models for location, scale and shape (GAMLSS) are a popular extension to mean regression models where each parameter of an arbitrary distribution is modelled through covariates. While such models have been developed for univariate and bivariate responses, the truly multivariate case remains extremely challenging for both computational and theoretical reasons. Alternative approaches to GAMLSS may allow for higher dimensional response vectors to be modelled jointly but often assume a fixed dependence structure not depending on covariates or are limited with respect to modelling flexibility or computational aspects. We contribute to this gap in the literature and propose a truly multivariate distributional model, which allows one to benefit from the flexibility of GAMLSS even when the response has dimension larger than two or three. Building on copula regression, we model the dependence structure of the response through a Gaussian copula, while the marginal distributions can vary across components. Our model is highly parameterized but estimation becomes feasible with Bayesian inference employing shrinkage priors. We demonstrate the competitiveness of our approach in a simulation study and illustrate how it complements existing models along the examples of childhood malnutrition and a yet unexplored data set on traffic detection in Berlin.
Submitted 5 June, 2023;
originally announced June 2023.
-
The Deep Promotion Time Cure Model
Authors:
Victor Medina-Olivares,
Stefan Lessmann,
Nadja Klein
Abstract:
We propose a novel method for predicting time-to-event in the presence of cure fractions based on flexible survival models integrated into a deep neural network framework. Our approach allows for non-linear relationships and high-dimensional interactions between covariates and survival and is suitable for large-scale applications. Furthermore, we allow the method to incorporate an identified predictor formed of an additive decomposition of interpretable linear and non-linear effects and add an orthogonalization layer to capture potential higher dimensional interactions. We demonstrate the usefulness and computational efficiency of our method via simulations and apply it to a large portfolio of US mortgage loans. Here, we find not only a better predictive performance of our framework but also a more realistic picture of covariate effects.
Submitted 19 May, 2023;
originally announced May 2023.
-
Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families
Authors:
Benedikt Lütke Schwienhorst,
Lucas Kock,
David J. Nott,
Nadja Klein
Abstract:
Even though dropout is a popular regularization technique, its theoretical properties are not fully understood. In this paper we study dropout regularization in extended generalized linear models based on double exponential families, for which the dispersion parameter can vary with the features. A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and dispersion, generalizing an earlier result for conventional generalized linear models. Training is performed using stochastic gradient descent with adaptive learning rate. To illustrate, we apply dropout to adaptive smoothing with B-splines, where both the mean and dispersion parameters are modelled flexibly. The important B-spline basis functions can be thought of as rare features, and we confirm in experiments that dropout is an effective form of regularization for mean and dispersion parameters that improves on a penalized maximum likelihood approach with an explicit smoothness penalty.
Submitted 11 May, 2023;
originally announced May 2023.
-
Holographic 6d co-dimension 2 defect solutions in M-theory
Authors:
Michael Gutperle,
Nicholas Klein,
Dikshant Rathore
Abstract:
We consider the uplift of co-dimension two defect solutions of seven dimensional gauged supergravity to eleven dimensions, previously found by two of the authors. The uplifted solutions are expressed as Lin-Lunin-Maldacena solutions and an infinite family of regular solutions describing holographic defects is found using the electrostatic formulation of LLM solutions.
Submitted 19 November, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Semi-supervised Learning of Pushforwards For Domain Translation & Adaptation
Authors:
Nishant Panda,
Natalie Klein,
Dominic Yang,
Patrick Gasda,
Diane Oyen
Abstract:
Given two probability densities on related data spaces, we seek a map pushing one density to the other while satisfying application-dependent constraints. For maps to have utility in a broad application space (including domain translation, domain adaptation, and generative modeling), the map must be available to apply on out-of-sample data points and should correspond to a probabilistic model over the two spaces. Unfortunately, existing approaches, which are primarily based on optimal transport, do not address these needs. In this paper, we introduce a novel pushforward map learning algorithm that utilizes normalizing flows to parameterize the map. We first re-formulate the classical optimal transport problem to be map-focused and propose a learning algorithm to select from all possible maps under the constraint that the map minimizes a probability distance and application-specific regularizers; thus, our method can be seen as solving a modified optimal transport problem. Once the map is learned, it can be used to map samples from a source domain to a target domain. In addition, because the map is parameterized as a composition of normalizing flows, it models the empirical distributions over the two data spaces and allows both sampling and likelihood evaluation for both data sets. We compare our method (parOT) to related optimal transport approaches in the context of domain adaptation and domain translation on benchmark data sets. Finally, to illustrate the impact of our work on applied problems, we apply parOT to a real scientific application: spectral calibration for high-dimensional measurements from two vastly different environments.
Submitted 17 April, 2023;
originally announced April 2023.
-
Thin trees for laminar families
Authors:
Nathan Klein,
Neil Olver
Abstract:
In the laminar-constrained spanning tree problem, the goal is to find a minimum-cost spanning tree which respects upper bounds on the number of times each cut in a given laminar family is crossed. This generalizes the well-studied degree-bounded spanning tree problem, as well as a previously studied setting where a chain of cuts is given. We give the first constant-factor approximation algorithm; in particular we show how to obtain a multiplicative violation of the crossing bounds of less than 22 while losing less than a factor of 5 in terms of cost.
Our result compares to the natural LP relaxation. As a consequence, our results show that given a $k$-edge-connected graph and a laminar family $\mathcal{L} \subseteq 2^V$ of cuts, there exists a spanning tree which contains only an $O(1/k)$ fraction of the edges across every cut in $\mathcal{L}$. This can be viewed as progress towards the Thin Tree Conjecture, which (in a strong form) states that this guarantee can be obtained for all cuts simultaneously.
Submitted 15 April, 2023;
originally announced April 2023.
-
Scalable Estimation for Structured Additive Distributional Regression
Authors:
Nikolaus Umlauf,
Johannes Seiler,
Mattias Wetscher,
Thorsten Simon,
Stefan Lang,
Nadja Klein
Abstract:
Recently, fitting probabilistic models has gained importance in many areas but estimation of such distributional models with very large data sets is a difficult task. In particular, the use of rather complex models can easily lead to memory-related efficiency problems that can make estimation infeasible even on high-performance computers. We therefore propose a novel backfitting algorithm, which is based on the ideas of stochastic gradient descent and can deal with virtually any amount of data on a conventional laptop. The algorithm performs automatic selection of variables and smoothing parameters, and its performance is in most cases superior or at least equivalent to other implementations for structured additive distributional regression, e.g., gradient boosting, while maintaining low computation time. Performance is evaluated using an extensive simulation study and an exceptionally challenging and unique example of lightning count prediction over Austria. A very large dataset with over 9 million observations and 80 covariates is used, so that a prediction model cannot be estimated with standard distributional regression methods but can be estimated with our new approach.
Submitted 13 January, 2023;
originally announced January 2023.
-
Generative structured normalizing flow Gaussian processes applied to spectroscopic data
Authors:
Natalie Klein,
Nishant Panda,
Patrick Gasda,
Diane Oyen
Abstract:
In this work, we propose a novel generative model for mapping inputs to structured, high-dimensional outputs using structured conditional normalizing flows and Gaussian process regression. The model is motivated by the need to characterize uncertainty in the input/output relationship when making inferences on new data. In particular, in the physical sciences, limited training data may not adequately characterize future observed data; it is critical that models adequately indicate uncertainty, particularly when they may be asked to extrapolate. In our proposed model, structured conditional normalizing flows provide parsimonious latent representations that relate to the inputs through a Gaussian process, providing exact likelihood calculations and uncertainty that naturally increases away from the training data inputs. We demonstrate the methodology on laser-induced breakdown spectroscopy data from the ChemCam instrument onboard the Mars rover Curiosity. ChemCam was designed to recover the chemical composition of rock and soil samples by measuring the spectral properties of plasma atomic emissions induced by a laser pulse. We show that our model can generate realistic spectra conditional on a given chemical composition and that we can use the model to perform uncertainty quantification of chemical compositions for new observed spectra. Based on our results, we anticipate that our proposed modeling approach may be useful in other scientific domains with high-dimensional, complex structure where it is important to quantify predictive uncertainty.
Submitted 14 December, 2022;
originally announced December 2022.
-
A (Slightly) Improved Deterministic Approximation Algorithm for Metric TSP
Authors:
Anna R. Karlin,
Nathan Klein,
Shayan Oveis Gharan
Abstract:
We show that the max entropy algorithm can be derandomized (with respect to a particular objective function) to give a deterministic $3/2-ε$ approximation algorithm for metric TSP for some $ε> 10^{-36}$.
To obtain our result, we apply the method of conditional expectation to an objective function constructed in prior work which was used to certify that the expected cost of the algorithm is at most $3/2-ε$ times the cost of an optimal solution to the subtour elimination LP. The proof in this work involves showing that the expected value of this objective function can be computed in polynomial time (at all stages of the algorithm's execution).
Submitted 12 December, 2022;
originally announced December 2022.
-
Accounting for Time Dependency in Meta-Analyses of Concordance Probability Estimates
Authors:
Matthias Schmid,
Tim Friede,
Nadja Klein,
Leonie Weinhold
Abstract:
Recent years have seen the development of many novel scoring tools for disease prognosis and prediction. To become accepted for use in clinical applications, these tools have to be validated on external data. In practice, validation is often hampered by logistical issues, resulting in multiple small-sized validation studies. It is therefore necessary to synthesize the results of these studies using techniques for meta-analysis. Here we consider strategies for meta-analyzing the concordance probability for time-to-event data ("C-index"), which has become a popular tool to evaluate the discriminatory power of prediction models with a right-censored outcome. We show that standard meta-analysis of the C-index may lead to biased results, as the magnitude of the concordance probability depends on the length of the time interval used for evaluation (defined e.g. by the follow-up time, which might differ considerably between studies). To address this issue, we propose a set of methods for random-effects meta-regression that incorporate time directly as covariate in the model equation. In addition to analyzing nonlinear time trends via fractional polynomial, spline, and exponential decay models, we provide recommendations on suitable transformations of the C-index before meta-regression. Our results suggest that the C-index is best meta-analyzed using fractional polynomial meta-regression with logit-transformed C-index values. Classical random-effects meta-analysis (not considering time as covariate) is demonstrated to be a suitable alternative when follow-up times are small. Our findings have implications for the reporting of C-index values in future studies, which should include information on the length of the time interval underlying the calculations.
Submitted 3 December, 2022;
originally announced December 2022.
-
Anisotropic multidimensional smoothing using Bayesian tensor product P-splines
Authors:
Paul Bach,
Nadja Klein
Abstract:
We introduce a highly efficient fully Bayesian approach for anisotropic multidimensional smoothing. The main challenge in this context is the Markov chain Monte Carlo update of the smoothing parameters as their full conditional posterior comprises a pseudo-determinant that appears to be intractable at first sight. As a consequence, most existing implementations are computationally feasible only for the estimation of two-dimensional tensor product smooths, which is, however, too restrictive for many applications. In this paper, we break this barrier and derive closed-form expressions for the log-pseudo-determinant and its first and second order partial derivatives. These expressions are valid for arbitrary dimension and very efficient to evaluate, which allows us to set up an efficient MCMC sampler with adaptive Metropolis-Hastings updates for the smoothing parameters. We investigate different priors for the smoothing parameters and discuss the efficient derivation of lower-dimensional effects such as one-dimensional main effects and two-dimensional interactions. We show that the suggested approach outperforms previous suggestions in the literature in terms of accuracy, scalability and computational cost and demonstrate its applicability by consideration of an illustrating temperature data example from spatio-temporal statistics.
Submitted 29 November, 2022;
originally announced November 2022.
-
A 4/3-Approximation Algorithm for Half-Integral Cycle Cut Instances of the TSP
Authors:
Billy Jin,
Nathan Klein,
David P. Williamson
Abstract:
A long-standing conjecture for the traveling salesman problem (TSP) states that the integrality gap of the standard linear programming relaxation of the TSP is at most 4/3. Despite significant efforts, the conjecture remains open.
We consider the half-integral case, in which the LP has solution values in $\{0, 1/2, 1\}$. Such instances have been conjectured to be the most difficult instances for the overall four-thirds conjecture. Karlin, Klein, and Oveis Gharan, in a breakthrough result, were able to show that in the half-integral case, the integrality gap is at most 1.49993. This result led to the first significant progress on the overall conjecture in decades; the same authors showed the integrality gap is at most $1.5- 10^{-36}$ in the non-half-integral case. For the half-integral case, the current best-known ratio is 1.4983, a result by Gupta et al.
With the improvements on the 3/2 bound remaining very incremental even in the half-integral case, we turn the question around and look for a large class of half-integral instances for which we can prove that the 4/3 conjecture is correct.
The previous works on the half-integral case perform induction on a hierarchy of critical tight sets in the support graph of the LP solution, in which some of the sets correspond to "cycle cuts" and the others to "degree cuts". We show that if all the sets in the hierarchy correspond to cycle cuts, then we can find a distribution of tours whose expected cost is at most 4/3 times the value of the half-integral LP solution; sampling from the distribution gives us a randomized 4/3-approximation algorithm. We note that the known bad cases for the integrality gap have a gap of 4/3 and have a half-integral LP solution in which all the critical tight sets in the hierarchy are cycle cuts; thus our result is tight.
Submitted 8 July, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Informed Priors for Knowledge Integration in Trajectory Prediction
Authors:
Christian Schlauch,
Nadja Klein,
Christian Wirth
Abstract:
Informed machine learning methods allow the integration of prior knowledge into learning systems. This can increase accuracy and robustness or reduce data needs. However, existing methods often assume hard constraining knowledge that does not require trading off prior knowledge against observations but can be used to directly reduce the problem space. Other approaches use specific architectural changes as a representation of prior knowledge, limiting applicability. We propose an informed machine learning method based on continual learning. This allows the integration of arbitrary prior knowledge, potentially from multiple sources, and does not require specific architectures. Furthermore, our approach enables probabilistic and multi-modal predictions that can improve predictive accuracy and robustness. We exemplify our approach by applying it to a state-of-the-art trajectory predictor for autonomous driving. This domain is especially dependent on informed learning approaches, as it is subject to an overwhelmingly large variety of possible environments and very rare events, while requiring robust and accurate predictions. We evaluate our model on a commonly used benchmark dataset, only using data already available in a conventional setup. We show that our method outperforms both non-informed and informed learning methods that are often used in the literature. Furthermore, we are able to compete with a conventional baseline, even using half as many observation examples.
Submitted 1 November, 2022;
originally announced November 2022.
-
Denoising neural networks for magnetic resonance spectroscopy
Authors:
Natalie Klein,
Amber J. Day,
Harris Mason,
Michael W. Malone,
Sinead A. Williamson
Abstract:
In many scientific applications, measured time series are corrupted by noise or distortions. Traditional denoising techniques often fail to recover the signal of interest, particularly when the signal-to-noise ratio is low or when certain assumptions on the signal and noise are violated. In this work, we demonstrate that deep learning-based denoising methods can outperform traditional techniques while exhibiting greater robustness to variation in noise and signal characteristics. Our motivating example is magnetic resonance spectroscopy, in which a primary goal is to detect the presence of short-duration, low-amplitude radio frequency signals that are often obscured by strong interference that can be difficult to separate from the signal using traditional methods. We explore various deep learning architecture choices to capture the inherently complex-valued nature of magnetic resonance signals. On both synthetic and experimental data, we show that our deep learning-based approaches can exceed the performance of traditional techniques, providing a powerful new class of methods for the analysis of scientific time series data.
Submitted 31 October, 2022;
originally announced November 2022.
-
Distributional Adaptive Soft Regression Trees
Authors:
Nikolaus Umlauf,
Nadja Klein
Abstract:
Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) while requiring only minimal tuning of hyperparameters. They are built via aggregation of multiple regression trees during training and are usually calculated recursively using hard splitting rules. Recently, regression forests have been incorporated into the framework of distributional regression, a now-popular regression approach that aims at estimating complete conditional distributions rather than relating only the mean of an output variable to input features, as is done classically. This article proposes a new type of distributional regression tree using a multivariate soft split rule. One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree while the complexity of the function is controlled adaptively by information criteria. Moreover, the search for the optimal split variable is obsolete. We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods, especially in the presence of complex non-linear feature interactions. Finally, we illustrate the usefulness of our approach with an example on probabilistic forecasts for the Sun's activity.
Submitted 19 October, 2022;
originally announced October 2022.
-
Boosting Multivariate Structured Additive Distributional Regression Models
Authors:
Annika Strömer,
Nadja Klein,
Christian Staerk,
Hannah Klinkhammer,
Andreas Mayr
Abstract:
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking various types of effects into account. As a special merit of our approach, it allows for modelling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyses two dimensions of childhood undernutrition in Nigeria as a bivariate response; we find that the correlation between the two undernutrition scores differs considerably depending on the child's age and the region the child lives in.
Submitted 18 July, 2022;
originally announced July 2022.
-
Actor Heterogeneity and Explained Variance in Network Models -- A Scalable Approach through Variational Approximations
Authors:
Nadja Klein,
Göran Kauermann
Abstract:
The analysis of network data has gained considerable interest in recent years. This also includes the analysis of large, high-dimensional networks with hundreds and thousands of nodes. While exponential random graph models serve as the workhorse for network data analyses, their applicability to very large networks is problematic via classical inference such as maximum likelihood or exact Bayesian estimation owing to scaling and instability issues. The latter stem from the fact that classical network statistics consider nodes as exchangeable, i.e., actors in the network are assumed to be homogeneous. This is often questionable. One way to circumvent the restrictive assumption is to include actor-specific random effects, which account for unobservable heterogeneity. However, this increases the number of unknowns considerably, thus making the model highly parameterized. As a solution even for very large networks, we propose a scalable approach based on variational approximations, which not only leads to numerically stable estimation but is also applicable to high-dimensional directed as well as undirected networks. We furthermore demonstrate that including node-specific covariates can reduce node heterogeneity, which we facilitate through versatile prior formulations and a new measure that we call posterior explained variance. We illustrate our approach in three diverse examples, covering network data from the Italian Parliament, international arms trading, and Facebook, and conduct detailed simulation studies.
Submitted 12 September, 2023; v1 submitted 29 April, 2022;
originally announced April 2022.
-
A note on co-dimension 2 defects in N=4,d=7 gauged supergravity
Authors:
Michael Gutperle,
Nicholas Klein
Abstract:
In this note we present a solution of $N=4,d=7$ gauged supergravity which is holographically dual to a co-dimension two defect living in a six-dimensional SCFT. The solution is obtained by double analytic continuation of a two-charge supersymmetric black hole solution. The condition that no conical deficits are present in the bulk and on the boundary is satisfied by a one-parameter family of solutions for which some holographic observables are computed.
Submitted 30 April, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Boosting Distributional Copula Regression
Authors:
Nicolai Hans,
Nadja Klein,
Florian Faschingbauer,
Michael Schneider,
Andreas Mayr
Abstract:
Capturing complex dependence structures between outcome variables (e.g., study endpoints) is of high relevance in contemporary biomedical data problems and medical research. Distributional copula regression provides a flexible tool to model the joint distribution of multiple outcome variables by disentangling the marginal response distributions and their dependence structure. In a regression setup, each parameter of the copula model, i.e., the marginal distribution parameters and the copula dependence parameters, can be related to covariates via structured additive predictors. We propose a framework to fit distributional copula regression models via a model-based boosting algorithm. Model-based boosting is a modern estimation technique that incorporates useful features like an intrinsic variable selection mechanism, parameter shrinkage and the capability to fit regression models in high-dimensional data settings, i.e., situations with more covariates than observations. Thus, model-based boosting not only complements existing Bayesian and maximum-likelihood based estimation frameworks for this model class but also enables unique intrinsic mechanisms that can be helpful in many applied problems. The performance of our boosting algorithm in the context of copula regression models with continuous margins is evaluated in simulation studies that cover low- and high-dimensional data settings and situations with and without dependence between the responses. Moreover, distributional copula boosting is used to jointly analyze and predict the length and the weight of newborns conditional on sonographic measurements of the fetus before delivery together with other clinical variables.
Submitted 25 February, 2022;
originally announced February 2022.
-
Deselection of Base-Learners for Statistical Boosting -- with an Application to Distributional Regression
Authors:
Annika Strömer,
Christian Staerk,
Nadja Klein,
Leonie Weinhold,
Stephanie Titze,
Andreas Mayr
Abstract:
We present a new procedure for enhanced variable selection for component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and allows regression models to be fitted in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models typically tend to include too many variables in some situations. This occurs particularly for low-dimensional data (p<n), where we observe a slow overfitting behavior of boosting. As a result, more variables get included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners with minor importance. We analyze the impact of the new approach on variable selection and prediction performance in comparison to alternative methods including boosting with earlier stopping as well as twin boosting. We illustrate our approach with data from an ongoing cohort study for chronic kidney disease patients, where the most influential predictors for the health-related quality of life measure are selected in a distributional regression approach based on beta regression.
Submitted 3 February, 2022;
originally announced February 2022.
-
Matroid Partition Property and the Secretary Problem
Authors:
Dorna Abdolazimi,
Anna R. Karlin,
Nathan Klein,
Shayan Oveis Gharan
Abstract:
A matroid $\mathcal{M}$ on a set $E$ of elements has the $α$-partition property, for some $α>0$, if it is possible to (randomly) construct a partition matroid $\mathcal{P}$ on (a subset of) elements of $\mathcal{M}$ such that every independent set of $\mathcal{P}$ is independent in $\mathcal{M}$ and for any weight function $w:E\to\mathbb{R}_{\geq 0}$, the expected value of the optimum of the matroid secretary problem on $\mathcal{P}$ is at least an $α$-fraction of the optimum on $\mathcal{M}$. We show that the complete binary matroid, ${\cal B}_d$ on $\mathbb{F}_2^d$ does not satisfy the $α$-partition property for any constant $α>0$ (independent of $d$).
Furthermore, we refute a recent conjecture of Bérczi, Schwarcz, and Yamaguchi by showing the same matroid is $2^d/d$-colorable but cannot be reduced to an $α2^d/d$-colorable partition matroid for any $α$ that is sublinear in $d$.
Submitted 24 November, 2021;
originally announced November 2021.
-
Marginally calibrated response distributions for end-to-end learning in autonomous driving
Authors:
Clara Hoffmann,
Nadja Klein
Abstract:
End-to-end learners for autonomous driving are deep neural networks that predict the instantaneous steering angle directly from images of the road ahead. These learners must provide reliable uncertainty estimates for their predictions in order to meet safety requirements and initiate a switch to manual control in areas of high uncertainty. Yet end-to-end learners typically only deliver point predictions, since distributional predictions are associated with large increases in training time or additional computational resources during prediction. To address this shortcoming, we investigate efficient and scalable approximate inference for the implicit copula neural linear model of Klein, Nott and Smith (2021) in order to quantify uncertainty for the predictions of end-to-end learners. The results are densities for the steering angle that are marginally calibrated, i.e., the average of the estimated densities equals the empirical distribution of steering angles. To ensure scalability to large-$n$ regimes, we develop efficient estimation based on variational inference as a fast alternative to computationally intensive, exact inference via Hamiltonian Monte Carlo. We demonstrate the accuracy and speed of the variational approach in comparison to Hamiltonian Monte Carlo on two end-to-end learners trained for highway driving using the comma2k19 data set. The implicit copula neural linear model delivers accurate calibration and high-quality prediction intervals, and allows overconfident learners to be identified. Our approach also contributes to the explainability of black-box end-to-end learners, since predictive densities can be used to understand which steering actions the end-to-end learner sees as valid.
Submitted 3 October, 2021;
originally announced October 2021.
-
Posterior Concentration Rates for Bayesian Penalized Splines
Authors:
Paul Bach,
Nadja Klein
Abstract:
Despite their widespread use in practice, the asymptotic properties of Bayesian penalized splines have not been investigated so far. We close this gap and study posterior concentration rates for Bayesian penalized splines in a Gaussian nonparametric regression model. A key feature of the approach is the hyperprior on the smoothing variance, which allows for adaptive smoothing in practice but complicates the theoretical analysis considerably as it destroys conjugacy and precludes analytic expressions for the posterior moments. To derive our theoretical results, we rely on several new concepts including a carefully defined proper version of the partially improper penalized splines prior as well as an innovative spline estimator that projects the observations onto the first basis functions of a Demmler-Reinsch basis. Our results show that posterior concentration at a near-optimal rate can be achieved if the hyperprior on the smoothing variance strikes a fine balance between oversmoothing and undersmoothing, which can for instance be met by a Weibull hyperprior with shape parameter 1/2. We complement our theoretical results with empirical evidence demonstrating the adaptivity of the hyperprior in practice.
Submitted 23 March, 2022; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Electrodeposited WS$_2$ Monolayers on Fabricated Graphene Electrodes
Authors:
Yasir J Noori,
Shibin Thomas,
Sami Ramadan,
Victoria K. Greenacre,
Nema M. Abdelazim,
Yisong Han,
Jiapei Zhang,
Richard Beanland,
Andrew L. Hector,
Norbert Klein,
Gillian Reid,
Philip Bartlett,
Cornelis H. de Groot
Abstract:
The development of scalable techniques to make 2D material heterostructures is a major obstacle that needs to be overcome before these materials can be implemented in device technologies industrially. Electrodeposition is an industrially compatible deposition technique that offers unique advantages in scaling 2D heterostructures. In this work, we demonstrate the electrodeposition of atomic layers of WS$_2$ over graphene electrodes using a single-source precursor. Using conventional microfabrication techniques, graphene was patterned to create micro-electrodes where WS$_2$ was site-selectively deposited to form 2D heterostructures. We used various characterisation techniques, including atomic force microscopy, transmission electron microscopy, Raman spectroscopy and X-ray photoelectron spectroscopy, to show that our electrodeposited WS$_2$ layers are highly uniform and can be grown over graphene at a controllable deposition rate. This technique to selectively deposit TMDCs over microfabricated graphene electrodes paves the way towards wafer-scale production of 2D material heterostructures for nanodevice applications.
Submitted 31 August, 2021;
originally announced September 2021.
-
Neural density estimation and uncertainty quantification for laser induced breakdown spectroscopy spectra
Authors:
Katiana Kontolati,
Natalie Klein,
Nishant Panda,
Diane Oyen
Abstract:
Constructing probability densities for inference in high-dimensional spectral data is often intractable. In this work, we use normalizing flows on structured spectral latent spaces to estimate such densities, enabling downstream inference tasks. In addition, we evaluate a method for uncertainty quantification when predicting unobserved state vectors associated with each spectrum. We demonstrate the capability of this approach on laser-induced breakdown spectroscopy data collected by the ChemCam instrument on the Mars rover Curiosity. Using our approach, we are able to generate realistic spectral samples and to accurately predict state vectors with associated well-calibrated uncertainties. We anticipate that this methodology will enable efficient probabilistic modeling of spectral data, leading to potential advances in several areas, including out-of-distribution detection and sensitivity analysis.
△ Less
Submitted 16 August, 2021;
originally announced August 2021.
-
A multivariate Gaussian random field prior against spatial confounding
Authors:
Isa Marques,
Thomas Kneib,
Nadja Klein
Abstract:
Spatial models are used in a variety of research areas, such as environmental sciences, epidemiology, or physics. A common phenomenon in many spatial regression models is spatial confounding. This phenomenon takes place when spatially indexed covariates modeling the mean of the response are correlated with the spatial random effect. As a result, estimates for regression coefficients of the covariates can be severely biased and interpretation of these is no longer valid. Recent literature has shown that typical solutions for reducing spatial confounding can lead to misleading and counterintuitive results. In this paper, we develop a computationally efficient spatial model in a Bayesian framework integrating a novel prior structure that reduces spatial confounding. Starting from the univariate case, we extend our prior structure to the case of multiple spatially confounded covariates. In a simulation study, we show that our novel model flexibly detects and reduces spatial confounding in spatial datasets, and it performs better than typically used methods such as restricted spatial regression. These results are promising for any applied researcher who wishes to interpret covariate effects in spatial regression models. As a real data illustration, we study the effect of elevation and temperature on the mean of daily precipitation in Germany.
Submitted 7 June, 2021;
originally announced June 2021.
-
A Quasipolynomial $(2+\varepsilon)$-Approximation for Planar Sparsest Cut
Authors:
Vincent Cohen-Addad,
Anupam Gupta,
Philip N. Klein,
Jason Li
Abstract:
The (non-uniform) sparsest cut problem is the following graph-partitioning problem: given a "supply" graph, and demands on pairs of vertices, delete some subset of supply edges to minimize the ratio of the supply edges cut to the total demand of the pairs separated by this deletion. Despite much effort, there are only a handful of nontrivial classes of supply graphs for which constant-factor approximations are known.
We consider the problem for planar graphs, and give a $(2+\varepsilon)$-approximation algorithm that runs in quasipolynomial time. Our approach defines a new structural decomposition of an optimal solution using a "patching" primitive. We combine this decomposition with a Sherali-Adams-style linear programming relaxation of the problem, which we then round. This should be compared with the polynomial-time approximation algorithm of Rao (1999), which uses the metric linear programming relaxation and $\ell_1$-embeddings, and achieves an $O(\sqrt{\log n})$-approximation in polynomial time.
Submitted 31 May, 2021;
originally announced May 2021.
-
Bayesian Effect Selection for Additive Quantile Regression with an Analysis to Air Pollution Thresholds
Authors:
Nadja Klein,
Jorge Mateu
Abstract:
Statistical techniques used in air pollution modelling usually do not reveal which predictors affect air pollution and in which functional form, and they cannot regress directly on exceedances over certain thresholds imposed by authorities. The latter naturally induce conditional quantiles and reflect the seriousness of particular events. In the present paper we focus on this important aspect by developing quantile regression models further. We propose a general Bayesian effect selection approach for additive quantile regression within a highly interpretable framework. We place separate normal beta prime spike and slab priors on the scalar importance parameters of effect parts and implement a fast Gibbs sampling scheme. Specifically, the approach enables the study of quantile-specific covariate effects, allows these covariates to be of general functional form using additive predictors, and facilitates the analysts' decision whether an effect should be included linearly, non-linearly or not at all in the quantiles of interest. In a detailed analysis of air pollution data in Madrid (Spain), we demonstrate the added value of modelling extreme nitrogen dioxide (NO2) concentrations and show how thresholds are driven differently by several climatological variables and traffic as a spatial proxy. Our results underpin the need for enhanced statistical models to support short-term decisions and enable local authorities to mitigate or even prevent exceedances of NO2 concentration limits.
Submitted 23 May, 2021;
originally announced May 2021.