Search | arXiv e-print repository

Protecting Onion Service Users Against Phishing

Authors: Benjamin Güldenring, Volker Roth

Abstract: Phishing websites are a common phenomenon among Tor onion services, and phishers exploit that it is tremendously difficult to distinguish phishing from authentic onion domain names. Operators of onion services devised several strategies to protect their users against phishing. But as we show in this work, none protect users against phishing without producing traces about visited services - somethi… ▽ More Phishing websites are a common phenomenon among Tor onion services, and phishers exploit that it is tremendously difficult to distinguish phishing from authentic onion domain names. Operators of onion services devised several strategies to protect their users against phishing. But as we show in this work, none protect users against phishing without producing traces about visited services - something that particularly vulnerable users might want to avoid. In search of a solution we review prior research addressing this problem, and find that only two known approaches, hash visualization and PAKE, are capable of solving this problem. Hash visualization requires users to recognize large hash values. In order to make hash visualization more practical we design a novel mechanism called recognizer, which substantially reduces the amount of information that users must recognize. We analyze the security and privacy properties of our system formally, and report on our prototype implementation as a browser extension for the Tor web browser. △ Less

Submitted 14 August, 2024; originally announced August 2024.

Comments: 17 pages, 3 figures

ACM Class: D.4.6; K.6.5; E.3; E.4

arXiv:2406.09116 [pdf, other]

Injective Flows for parametric hypersurfaces

Authors: Marcello Massimo Negri, Jonathan Aellen, Volker Roth

Abstract: Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for para… ▽ More Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for parametric hypersurfaces and show that for such manifolds we can compute the Jacobian determinant exactly and efficiently, with the same cost as NFs. Furthermore, we show that for the subclass of star-like manifolds we can extend the proposed framework to always allow for a Cartesian representation of the density. We showcase the relevance of modeling densities on hypersurfaces in two settings. Firstly, we introduce a novel Objective Bayesian approach to penalized likelihood models by interpreting level-sets of the penalty as star-like manifolds. Secondly, we consider Bayesian mixture models and introduce a general method for variational inference by defining the posterior of mixture weights on the probability simplex. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2308.09571 [pdf]

doi 10.1016/j.jocs.2024.102355

Physics-Informed Boundary Integral Networks (PIBI-Nets): A Data-Driven Approach for Solving Partial Differential Equations

Authors: Monika Nagy-Huber, Volker Roth

Abstract: Partial differential equations (PDEs) are widely used to describe relevant phenomena in dynamical systems. In real-world applications, we commonly need to combine formal PDE models with (potentially noisy) observations. This is especially relevant in settings where we lack information about boundary or initial conditions, or where we need to identify unknown model parameters. In recent years, Phys… ▽ More Partial differential equations (PDEs) are widely used to describe relevant phenomena in dynamical systems. In real-world applications, we commonly need to combine formal PDE models with (potentially noisy) observations. This is especially relevant in settings where we lack information about boundary or initial conditions, or where we need to identify unknown model parameters. In recent years, Physics-Informed Neural Networks (PINNs) have become a popular tool for this kind of problems. In high-dimensional settings, however, PINNs often suffer from computational problems because they usually require dense collocation points over the entire computational domain. To address this problem, we present Physics-Informed Boundary Integral Networks (PIBI-Nets) as a data-driven approach for solving PDEs in one dimension less than the original problem space. PIBI-Nets only require points at the computational domain boundary, while still achieving highly accurate results. Moreover, PIBI-Nets clearly outperform PINNs in several practical settings. Exploiting elementary properties of fundamental solutions of linear differential operators, we present a principled and simple way to handle point sources in inverse problems. We demonstrate the excellent performance of PIBI- Nets for the Laplace and Poisson equations, both on artificial datasets and within a real-world application concerning the reconstruction of groundwater flows. △ Less

Submitted 3 July, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Report number: volume 81, special issue "Machine Learning and Data Assimilation for Dynamical Systems II"

Journal ref: Journal of Computational Science, Elsevier, 2024

arXiv:2306.07255 [pdf, other]

Conditional Matrix Flows for Gaussian Graphical Models

Authors: Marcello Massimo Negri, F. Arend Torres, Volker Roth

Abstract: Studying conditional independence among many variables with few observations is a challenging task. Gaussian Graphical Models (GGMs) tackle this problem by encouraging sparsity in the precision matrix through $l_q$ regularization with $q\leq1$. However, most GMMs rely on the $l_1$ norm because the objective is highly non-convex for sub-$l_1$ pseudo-norms. In the frequentist formulation, the $l_1$… ▽ More Studying conditional independence among many variables with few observations is a challenging task. Gaussian Graphical Models (GGMs) tackle this problem by encouraging sparsity in the precision matrix through $l_q$ regularization with $q\leq1$. However, most GMMs rely on the $l_1$ norm because the objective is highly non-convex for sub-$l_1$ pseudo-norms. In the frequentist formulation, the $l_1$ norm relaxation provides the solution path as a function of the shrinkage parameter $λ$. In the Bayesian formulation, sparsity is instead encouraged through a Laplace prior, but posterior inference for different $λ$ requires repeated runs of expensive Gibbs samplers. Here we propose a general framework for variational inference with matrix-variate Normalizing Flow in GGMs, which unifies the benefits of frequentist and Bayesian frameworks. As a key improvement on previous work, we train with one flow a continuum of sparse regression models jointly for all regularization parameters $λ$ and all $l_q$ norms, including non-convex sub-$l_1$ pseudo-norms. Within one model we thus have access to (i) the evolution of the posterior for any $λ$ and any $l_q$ (pseudo-) norm, (ii) the marginal log-likelihood for model selection, and (iii) the frequentist solution paths through simulated annealing in the MAP limit. △ Less

Submitted 16 November, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: NeurIPS23 version

arXiv:2305.16846 [pdf, other]

Lagrangian Flow Networks for Conservation Laws

Authors: F. Arend Torres, Marcello Massimo Negri, Marco Inversi, Jonathan Aellen, Volker Roth

Abstract: We introduce Lagrangian Flow Networks (LFlows) for modeling fluid densities and velocities continuously in space and time. By construction, the proposed LFlows satisfy the continuity equation, a PDE describing mass conservation in its differentiable form. Our model is based on the insight that solutions to the continuity equation can be expressed as time-dependent density transformations via diffe… ▽ More We introduce Lagrangian Flow Networks (LFlows) for modeling fluid densities and velocities continuously in space and time. By construction, the proposed LFlows satisfy the continuity equation, a PDE describing mass conservation in its differentiable form. Our model is based on the insight that solutions to the continuity equation can be expressed as time-dependent density transformations via differentiable and invertible maps. This follows from classical theory of the existence and uniqueness of Lagrangian flows for smooth vector fields. Hence, we model fluid densities by transforming a base density with parameterized diffeomorphisms conditioned on time. The key benefit compared to methods relying on numerical ODE solvers or PINNs is that the analytic expression of the velocity is always consistent with changes in density. Furthermore, we require neither expensive numerical solvers, nor additional penalties to enforce the PDE. LFlows show higher predictive accuracy in density modeling tasks compared to competing models in 2D and 3D, while being computationally efficient. As a real-world application, we model bird migration based on sparse weather radar measurements. △ Less

Submitted 13 December, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2206.01545 [pdf, other]

Mesh-free Eulerian Physics-Informed Neural Networks

Authors: Fabricio Arend Torres, Marcello Massimo Negri, Monika Nagy-Huber, Maxim Samarin, Volker Roth

Abstract: Physics-informed Neural Networks (PINNs) have recently emerged as a principled way to include prior physical knowledge in form of partial differential equations (PDEs) into neural networks. Although PINNs are generally viewed as mesh-free, current approaches still rely on collocation points within a bounded region, even in settings with spatially sparse signals. Furthermore, if the boundaries are… ▽ More Physics-informed Neural Networks (PINNs) have recently emerged as a principled way to include prior physical knowledge in form of partial differential equations (PDEs) into neural networks. Although PINNs are generally viewed as mesh-free, current approaches still rely on collocation points within a bounded region, even in settings with spatially sparse signals. Furthermore, if the boundaries are not known, the selection of such a region is difficult and often results in a large proportion of collocation points being selected in areas of low relevance. To resolve this severe drawback of current methods, we present a mesh-free and adaptive approach termed particle-density PINN (pdPINN), which is inspired by the microscopic viewpoint of fluid dynamics. The method is based on the Eulerian formulation and, different from classical mesh-free method, does not require the introduction of Lagrangian updates. We propose to sample directly from the distribution over the particle positions, eliminating the need to introduce boundaries while adaptively focusing on the most relevant regions. This is achieved by interpreting a non-negative physical quantity (such as the density or temperature) as an unnormalized probability distribution from which we sample with dynamic Monte Carlo methods. The proposed method leads to higher sample efficiency and improved performance of PINNs. These advantages are demonstrated on various experiments based on the continuity equations, Fokker-Planck equations, and the heat equation. △ Less

Submitted 1 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Preprint

arXiv:2204.07009 [pdf, other]

Learning Invariances with Generalised Input-Convex Neural Networks

Authors: Vitali Nesterov, Fabricio Arend Torres, Monika Nagy-Huber, Maxim Samarin, Volker Roth

Abstract: Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain, which are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for different informed data exploration tasks, such as… ▽ More Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain, which are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for different informed data exploration tasks, such as building grid-based approximations, sampling points along the level curves, or finding trajectories on the manifold. However, global parameterisations can only exist if the level sets are connected. For this purpose, we introduce a novel and flexible class of neural networks that generalise input-convex networks. These networks represent functions that are guaranteed to have connected level sets forming smooth manifolds on the input space. We further show that global parameterisations of these level sets can be always found efficiently. Lastly, we demonstrate that our novel technique for characterising invariances is a powerful generative data exploration tool in real-world applications, such as computational chemistry. △ Less

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2111.13185 [pdf, other]

doi 10.1007/978-3-030-92659-5_24

Learning Conditional Invariance through Cycle Consistency

Authors: Maxim Samarin, Vitali Nesterov, Mario Wieser, Aleksander Wieczorek, Sonali Parbhoo, Volker Roth

Abstract: Identifying meaningful and independent factors of variation in a dataset is a challenging learning task frequently addressed by means of deep latent variable models. This task can be viewed as learning symmetry transformations preserving the value of a chosen property along latent dimensions. However, existing approaches exhibit severe drawbacks in enforcing the invariance property in the latent s… ▽ More Identifying meaningful and independent factors of variation in a dataset is a challenging learning task frequently addressed by means of deep latent variable models. This task can be viewed as learning symmetry transformations preserving the value of a chosen property along latent dimensions. However, existing approaches exhibit severe drawbacks in enforcing the invariance property in the latent space. We address these shortcomings with a novel approach to cycle consistency. Our method involves two separate latent subspaces for the target property and the remaining input information, respectively. In order to enforce invariance as well as sparsity in the latent space, we incorporate semantic knowledge by using cycle consistency constraints relying on property side information. The proposed method is based on the deep information bottleneck and, in contrast to other approaches, allows using continuous target properties and provides inherent model selection capabilities. We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models with improved invariance properties. △ Less

Submitted 25 November, 2021; originally announced November 2021.

Comments: 16 pages, 3 figures, published at the DAGM German Conference on Pattern Recognition, Sep. 28 - Oct. 1, 2021

arXiv:2010.06477 [pdf, other]

3DMolNet: A Generative Network for Molecular Structures

Authors: Vitali Nesterov, Mario Wieser, Volker Roth

Abstract: With the recent advances in machine learning for quantum chemistry, it is now possible to predict the chemical properties of compounds and to generate novel molecules. Existing generative models mostly use a string- or graph-based representation, but the precise three-dimensional coordinates of the atoms are usually not encoded. First attempts in this direction have been proposed, where autoregres… ▽ More With the recent advances in machine learning for quantum chemistry, it is now possible to predict the chemical properties of compounds and to generate novel molecules. Existing generative models mostly use a string- or graph-based representation, but the precise three-dimensional coordinates of the atoms are usually not encoded. First attempts in this direction have been proposed, where autoregressive or GAN-based models generate atom coordinates. Those either lack a latent space in the autoregressive setting, such that a smooth exploration of the compound space is not possible, or cannot generalize to varying chemical compositions. We propose a new approach to efficiently generate molecular structures that are not restricted to a fixed size or composition. Our model is based on the variational autoencoder which learns a translation-, rotation-, and permutation-invariant low-dimensional representation of molecules. Our experiments yield a mean reconstruction error below 0.05 Angstrom, outperforming the current state-of-the-art methods by a factor of four, and which is even lower than the spatial quantization error of most chemical descriptors. The compositional and structural validity of newly generated molecules has been confirmed by quantum chemical methods in a set of experiments. △ Less

Submitted 8 October, 2020; originally announced October 2020.

arXiv:2006.13645 [pdf, other]

On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures

Authors: Maxim Samarin, Volker Roth, David Belius

Abstract: The Neural Tangent Kernel (NTK) is an important milestone in the ongoing effort to build a theory for deep learning. Its prediction that sufficiently wide neural networks behave as kernel methods, or equivalently as random feature models, has been confirmed empirically for certain wide architectures. It remains an open question how well NTK theory models standard neural network architectures of wi… ▽ More The Neural Tangent Kernel (NTK) is an important milestone in the ongoing effort to build a theory for deep learning. Its prediction that sufficiently wide neural networks behave as kernel methods, or equivalently as random feature models, has been confirmed empirically for certain wide architectures. It remains an open question how well NTK theory models standard neural network architectures of widths common in practice, trained on complex datasets such as ImageNet. We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet, and find that their behavior deviates significantly from their finite-width NTK counterparts. For wider versions of these networks, where the number of channels and widths of fully-connected layers are increased, the deviation decreases. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: 10 pages

arXiv:2002.02782 [pdf, other]

Inverse Learning of Symmetries

Authors: Mario Wieser, Sonali Parbhoo, Aleksander Wieczorek, Volker Roth

Abstract: Symmetry transformations induce invariances which are frequently described with deep latent variable models. In many complex domains, such as the chemical space, invariances can be observed, yet the corresponding symmetry transformation cannot be formulated analytically. We propose to learn the symmetry transformation with a model consisting of two latent subspaces, where the first subspace captur… ▽ More Symmetry transformations induce invariances which are frequently described with deep latent variable models. In many complex domains, such as the chemical space, invariances can be observed, yet the corresponding symmetry transformation cannot be formulated analytically. We propose to learn the symmetry transformation with a model consisting of two latent subspaces, where the first subspace captures the target and the second subspace the remaining invariant information. Our approach is based on the deep information bottleneck in combination with a continuous mutual information regulariser. Unlike previous methods, we focus on the challenging task of minimising mutual information in continuous domains. To this end, we base the calculation of mutual information on correlation matrices in combination with a bijective variable transformation. Extensive experiments demonstrate that our model outperforms state-of-the-art methods on artificial and molecular datasets. △ Less

Submitted 22 October, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: Accepted for publication at NeurIPS 2020

arXiv:2002.00815 [pdf, other]

Learning Extremal Representations with Deep Archetypal Analysis

Authors: Sebastian Mathias Keller, Maxim Samarin, Fabricio Arend Torres, Mario Wieser, Volker Roth

Abstract: Archetypes are typical population representatives in an extremal sense, where typicality is understood as the most extreme manifestation of a trait or feature. In linear feature space, archetypes approximate the data convex hull allowing all data points to be expressed as convex mixtures of archetypes. However, it might not always be possible to identify meaningful archetypes in a given feature sp… ▽ More Archetypes are typical population representatives in an extremal sense, where typicality is understood as the most extreme manifestation of a trait or feature. In linear feature space, archetypes approximate the data convex hull allowing all data points to be expressed as convex mixtures of archetypes. However, it might not always be possible to identify meaningful archetypes in a given feature space. Learning an appropriate feature space and identifying suitable archetypes simultaneously addresses this problem. This paper introduces a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a variational autoencoder, and an optimal representation with respect to the unknown archetypes can be learned end-to-end. The reformulation of linear Archetypal Analysis as deep variational information bottleneck, allows the incorporation of arbitrarily complex side information during training. Furthermore, an alternative prior, based on a modified Dirichlet distribution, is proposed. The real-world applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions while using multi-rater based emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. In this experiment, it is demonstrated that exchanging the side information but keeping the same set of molecules, e. g. using as side information the heat capacity of each molecule instead of the band gap energy, will result in the identification of different archetypes. As an application, these learned representations of chemical space might reveal distinct starting points for de novo molecular design. △ Less

Submitted 3 February, 2020; originally announced February 2020.

Comments: Under review for publication at the International Journal of Computer Vision (IJCV). Extended version of our GCPR2019 paper "Deep Archetypal Analysis"

arXiv:1912.13480 [pdf, other]

doi 10.3390/e22020131

On the Difference Between the Information Bottleneck and the Deep Information Bottleneck

Authors: Aleksander Wieczorek, Volker Roth

Abstract: Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. The two assumed properties of the data $X$, $Y$ and their… ▽ More Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. The two assumed properties of the data $X$, $Y$ and their latent representation $T$ take the form of two Markov chains $T-X-Y$ and $X-T-Y$. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions $P(X,Y,T)$. We therefore show how to circumvent this limitation by optimising a lower bound for $I(T;Y)$ for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice and of two terms measuring how much the former requirement $T-X-Y$ is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model. △ Less

Submitted 31 December, 2019; originally announced December 2019.

arXiv:1908.05254 [pdf, other]

Optimizing for Interpretability in Deep Neural Networks with Tree Regularization

Authors: Mike Wu, Sonali Parbhoo, Michael C. Hughes, Volker Roth, Finale Doshi-Velez

Abstract: Deep models have advanced prediction in many domains, but their lack of interpretability remains a key barrier to the adoption in many real world applications. There exists a large body of work aiming to help humans understand these black box functions to varying levels of granularity -- for example, through distillation, gradients, or adversarial examples. These methods however, all tackle interp… ▽ More Deep models have advanced prediction in many domains, but their lack of interpretability remains a key barrier to the adoption in many real world applications. There exists a large body of work aiming to help humans understand these black box functions to varying levels of granularity -- for example, through distillation, gradients, or adversarial examples. These methods however, all tackle interpretability as a separate process after training. In this work, we take a different approach and explicitly regularize deep models so that they are well-approximated by processes that humans can step-through in little time. Specifically, we train several families of deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. The resulting axis-aligned decision functions uniquely make tree regularized models easy for humans to interpret. Moreover, for situations in which a single, global tree is a poor estimator, we introduce a regional tree regularizer that encourages the deep model to resemble a compact, axis-aligned decision tree in predefined, human-interpretable contexts. Using intuitive toy examples as well as medical tasks for patients in critical care and with HIV, we demonstrate that this new family of tree regularizers yield models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power. △ Less

Submitted 14 August, 2019; originally announced August 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1908.04494, arXiv:1711.06178

arXiv:1908.04494 [pdf, other]

Regional Tree Regularization for Interpretability in Black Box Models

Authors: Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez

Abstract: The lack of interpretability remains a barrier to the adoption of deep neural networks. Recently, tree regularization has been proposed to encourage deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. However, it may be unreasonable to expect that a single tree can predict well across all possible inputs. In this work, we propose regio… ▽ More The lack of interpretability remains a barrier to the adoption of deep neural networks. Recently, tree regularization has been proposed to encourage deep neural networks to resemble compact, axis-aligned decision trees without significant compromises in accuracy. However, it may be unreasonable to expect that a single tree can predict well across all possible inputs. In this work, we propose regional tree regularization, which encourages a deep model to be well-approximated by several separate decision trees specific to predefined regions of the input space. Practitioners can define regions based on domain knowledge of contexts where different decision-making logic is needed. Across many datasets, our approach delivers more accurate predictions than simply training separate decision trees for each region, while producing simpler explanations than other neural net regularization schemes without sacrificing predictive power. Two healthcare case studies in critical care and HIV demonstrate how experts can improve understanding of deep models via our approach. △ Less

Submitted 16 March, 2020; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: AAAI 2020 (Oral)

arXiv:1901.10799 [pdf, other]

Deep Archetypal Analysis

Authors: Sebastian Mathias Keller, Maxim Samarin, Mario Wieser, Volker Roth

Abstract: "Deep Archetypal Analysis" generates latent representations of high-dimensional datasets in terms of fractions of intuitively understandable basic entities called archetypes. The proposed method is an extension of linear "Archetypal Analysis" (AA), an unsupervised method to represent multivariate data points as sparse convex combinations of extremal elements of the dataset. Unlike the original for… ▽ More "Deep Archetypal Analysis" generates latent representations of high-dimensional datasets in terms of fractions of intuitively understandable basic entities called archetypes. The proposed method is an extension of linear "Archetypal Analysis" (AA), an unsupervised method to represent multivariate data points as sparse convex combinations of extremal elements of the dataset. Unlike the original formulation of AA, "Deep AA" can also handle side information and provides the ability for data-driven representation learning which reduces the dependence on expert knowledge. Our method is motivated by studies of evolutionary trade-offs in biology where archetypes are species highly adapted to a single task. Along these lines, we demonstrate that "Deep AA" also lends itself to the supervised exploration of chemical space, marking a distinct starting point for de novo molecular design. In the unsupervised setting we show how "Deep AA" is used on CelebA to identify archetypal faces. These can then be superimposed in order to generate new faces which inherit dominant traits of the archetypes they are based on. △ Less

Submitted 24 January, 2020; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: Published at the German Conference on Pattern Recognition 2019 (GCPR)

Journal ref: 41th German Conference on Pattern Recognition, GCPR 2019

arXiv:1812.06594 [pdf, other]

Computational EEG in Personalized Medicine: A study in Parkinson's Disease

Authors: Sebastian Mathias Keller, Maxim Samarin, Antonia Meyer, Vitalii Kosak, Ute Gschwandtner, Peter Fuhr, Volker Roth

Abstract: Recordings of electrical brain activity carry information about a person's cognitive health. For recording EEG signals, a very common setting is for a subject to be at rest with its eyes closed. Analysis of these recordings often involve a dimensionality reduction step in which electrodes are grouped into 10 or more regions (depending on the number of electrodes available). Then an average over ea… ▽ More Recordings of electrical brain activity carry information about a person's cognitive health. For recording EEG signals, a very common setting is for a subject to be at rest with its eyes closed. Analysis of these recordings often involve a dimensionality reduction step in which electrodes are grouped into 10 or more regions (depending on the number of electrodes available). Then an average over each group is taken which serves as a feature in subsequent evaluation. Currently, the most prominent features used in clinical practice are based on spectral power densities. In our work we consider a simplified grouping of electrodes into two regions only. In addition to spectral features we introduce a secondary, non-redundant view on brain activity through the lens of Tsallis Entropy $S_{q=2}$. We further take EEG measurements not only in an eyes closed (ec) but also in an eyes open (eo) state. For our cohort of healthy controls (HC) and individuals suffering from Parkinson's disease (PD), the question we are asking is the following: How well can one discriminate between HC and PD within this simplified, binary grouping? This question is motivated by the commercial availability of inexpensive and easy to use portable EEG devices. If enough information is retained in this binary grouping, then such simple devices could potentially be used as personal monitoring tools, as standard screening tools by general practitioners or as digital biomarkers for easy long term monitoring during neurological studies. △ Less

Submitted 2 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:811.07216

arXiv:1811.10347 [pdf, other]

Estimating Causal Effects With Partial Covariates For Clinical Interpretability

Authors: Sonali Parbhoo, Mario Wieser, Volker Roth

Abstract: Estimating the causal effects of an intervention in the presence of confounding is a frequently occurring problem in applications such as medicine. The task is challenging since there may be multiple confounding factors, some of which may be missing, and inferences must be made from high-dimensional, noisy measurements. In this paper, we propose a decision-theoretic approach to estimate the causal… ▽ More Estimating the causal effects of an intervention in the presence of confounding is a frequently occurring problem in applications such as medicine. The task is challenging since there may be multiple confounding factors, some of which may be missing, and inferences must be made from high-dimensional, noisy measurements. In this paper, we propose a decision-theoretic approach to estimate the causal effects of interventions where a subset of the covariates is unavailable for some patients during testing. Our approach uses the information bottleneck principle to perform a discrete, low-dimensional sufficient reduction of the covariate data to estimate a distribution over confounders. In doing so, we can estimate the causal effect of an intervention where only partial covariate information is available. Our results on a causal inference benchmark and a real application for treating sepsis show that our method achieves state-of-the-art performance, without sacrificing interpretability. △ Less

Submitted 26 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

arXiv:1811.07969 [pdf, other]

Informed MCMC with Bayesian Neural Networks for Facial Image Analysis

Authors: Adam Kortylewski, Mario Wieser, Andreas Morel-Forster, Aleksander Wieczorek, Sonali Parbhoo, Volker Roth, Thomas Vetter

Abstract: Computer vision tasks are difficult because of the large variability in the data that is induced by changes in light, background, partial occlusion as well as the varying pose, texture, and shape of objects. Generative approaches to computer vision allow us to overcome this difficulty by explicitly modeling the physical image formation process. Using generative object models, the analysis of an ob… ▽ More Computer vision tasks are difficult because of the large variability in the data that is induced by changes in light, background, partial occlusion as well as the varying pose, texture, and shape of objects. Generative approaches to computer vision allow us to overcome this difficulty by explicitly modeling the physical image formation process. Using generative object models, the analysis of an observed image is performed via Bayesian inference of the posterior distribution. This conceptually simple approach tends to fail in practice because of several difficulties stemming from sampling the posterior distribution: high-dimensionality and multi-modality of the posterior distribution as well as expensive simulation of the rendering process. The main difficulty of sampling approaches in a computer vision context is choosing the proposal distribution accurately so that maxima of the posterior are explored early and the algorithm quickly converges to a valid image interpretation. In this work, we propose to use a Bayesian Neural Network for estimating an image dependent proposal distribution. Compared to a standard Gaussian random walk proposal, this accelerates the sampler in finding regions of the posterior with high value. In this way, we can significantly reduce the number of samples needed to perform facial image analysis. △ Less

Submitted 29 November, 2018; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 2018

arXiv:1807.02326 [pdf, other]

Cause-Effect Deep Information Bottleneck For Systematically Missing Covariates

Authors: Sonali Parbhoo, Mario Wieser, Aleksander Wieczorek, Volker Roth

Abstract: Estimating the causal effects of an intervention from high-dimensional observational data is difficult due to the presence of confounding. The task is often complicated by the fact that we may have a systematic missingness in our data at test time. Our approach uses the information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of informat… ▽ More Estimating the causal effects of an intervention from high-dimensional observational data is difficult due to the presence of confounding. The task is often complicated by the fact that we may have a systematic missingness in our data at test time. Our approach uses the information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of information. Based on the sufficiently reduced covariate, we transfer the relevant information to cases where data is missing at test time, allowing us to reliably and accurately estimate the effects of an intervention, even where data is incomplete. Our results on causal inference benchmarks and a real application for treating sepsis show that our method achieves state-of-the art performance, without sacrificing interpretability. △ Less

Submitted 28 February, 2020; v1 submitted 6 July, 2018; originally announced July 2018.

arXiv:1804.06216 [pdf, other]

Learning Sparse Latent Representations with the Deep Copula Information Bottleneck

Authors: Aleksander Wieczorek, Mario Wieser, Damian Murezzan, Volker Roth

Abstract: Deep latent variable models are powerful tools for representation learning. In this paper, we adopt the deep information bottleneck model, identify its shortcomings and propose a model that circumvents them. To this end, we apply a copula transformation which, by restoring the invariance properties of the information bottleneck method, leads to disentanglement of the features in the latent space.… ▽ More Deep latent variable models are powerful tools for representation learning. In this paper, we adopt the deep information bottleneck model, identify its shortcomings and propose a model that circumvents them. To this end, we apply a copula transformation which, by restoring the invariance properties of the information bottleneck method, leads to disentanglement of the features in the latent space. Building on that, we show how this transformation translates to sparsity of the latent space in the new model. We evaluate our method on artificial and real data. △ Less

Submitted 19 April, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

Comments: Published as a conference paper at ICLR 2018. Aleksander Wieczorek and Mario Wieser contributed equally to this work

Journal ref: Conference track - ICLR 2018

arXiv:1801.00933 [pdf, other]

New Directions for Trust in the Certificate Authority Ecosystem

Authors: Jan-Ole Malchow, Benjamin Güldenring, Volker Roth

Abstract: Many of the benefits we derive from the Internet require trust in the authenticity of HTTPS connections. Unfortunately, the public key certification ecosystem that underwrites this trust has failed us on numerous occasions. Towards an exploration of the root causes we present an update to the common knowledge about the Certificate Authority (CA) ecosystem. Based on our findings the certificate eco… ▽ More Many of the benefits we derive from the Internet require trust in the authenticity of HTTPS connections. Unfortunately, the public key certification ecosystem that underwrites this trust has failed us on numerous occasions. Towards an exploration of the root causes we present an update to the common knowledge about the Certificate Authority (CA) ecosystem. Based on our findings the certificate ecosystem currently undergoes a drastic transformation. Big steps towards ubiquitous encryption were made, however, on the expense of trust for authentication of communication partners. Furthermore we describe systemic problems rooted in misaligned incentives between players in the ecosystem. We depict that proposed security extensions do not correctly realign these incentives. As such we argue that it is worth considering alternative methods of authentication. As a first step in this direction we propose an insurance-based mechanism and we demonstrate that it is technically feasible. △ Less

Submitted 3 January, 2018; originally announced January 2018.

arXiv:1711.06178 [pdf, other]

Beyond Sparsity: Tree Regularization of Deep Models for Interpretability

Authors: Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez

Abstract: The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with fe… ▽ More The lack of interpretability remains a key barrier to the adoption of deep models in many applications. In this work, we explicitly regularize deep models so human users might step through the process behind their predictions in little time. Specifically, we train deep time-series models so their class-probability predictions have high accuracy while being closely modeled by decision trees with few nodes. Using intuitive toy examples as well as medical tasks for treating sepsis and HIV, we demonstrate that this new tree regularization yields models that are easier for humans to simulate than simpler L1 or L2 penalties without sacrificing predictive power. △ Less

Submitted 16 November, 2017; originally announced November 2017.

Comments: To appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material

arXiv:1701.06171 [pdf, other]

Greedy Structure Learning of Hierarchical Compositional Models

Authors: Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, Thomas Vetter

Abstract: In this work, we consider the problem of learning a hierarchical generative model of an object from a set of images which show examples of the object in the presence of variable background clutter. Existing approaches to this problem are limited by making strong a-priori assumptions about the object's geometric structure and require segmented training data for learning. In this paper, we propose a… ▽ More In this work, we consider the problem of learning a hierarchical generative model of an object from a set of images which show examples of the object in the presence of variable background clutter. Existing approaches to this problem are limited by making strong a-priori assumptions about the object's geometric structure and require segmented training data for learning. In this paper, we propose a novel framework for learning hierarchical compositional models (HCMs) which do not suffer from the mentioned limitations. We present a generalized formulation of HCMs and describe a greedy structure learning framework that consists of two phases: Bottom-up part learning and top-down model composition. Our framework integrates the foreground-background segmentation problem into the structure learning task via a background model. As a result, we can jointly optimize for the number of layers in the hierarchy, the number of parts per layer and a foreground-background segmentation based on class labels only. We show that the learned HCMs are semantically meaningful and achieve competitive results when compared to other generative object models at object classification on a standard transfer learning dataset. △ Less

Submitted 14 April, 2019; v1 submitted 22 January, 2017; originally announced January 2017.

Comments: CVPR 2019

arXiv:1605.05856 [pdf, other]

doi 10.1145/2987443.2987457

Towards Better Internet Citizenship: Reducing the Footprint of Internet-wide Scans by Topology Aware Prefix Selection

Authors: Johannes Klick, Stephan Lau, Matthias Wählisch, Volker Roth

Abstract: Internet service discovery is an emerging topic to study the deployment of protocols. Towards this end, our community periodically scans the entire advertised IPv4 address space. In this paper, we question this principle. Being good Internet citizens means that we should limit scan traffic to what is necessary. We conducted a study of scan data, which shows that several prefixes do not accommodate… ▽ More Internet service discovery is an emerging topic to study the deployment of protocols. Towards this end, our community periodically scans the entire advertised IPv4 address space. In this paper, we question this principle. Being good Internet citizens means that we should limit scan traffic to what is necessary. We conducted a study of scan data, which shows that several prefixes do not accommodate any host of interest and the network topology is fairly stable. We argue that this allows us to collect representative data by scanning less. In our paper, we explore the idea to scan all prefixes once and then identify prefixes of interest for future scanning. Based on our analysis of the censys.io data set (4.1 TB data encompassing 28 full IPv4 scans within 6 months) we found that we can reduce scan traffic between 25-90% and miss only 1-10% of the hosts, depending on desired trade-offs and protocols. △ Less

Submitted 14 September, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: 7 pages, 6 figures, 1 table. Published in Proc. of ACM IMC, 2016

ACM Class: C.2.5; C.2.1; C.2.3

Journal ref: Proceedings of ACM Internet Measurement Conference (IMC) 2016

arXiv:1510.01485 [pdf, other]

Bayesian Markov Blanket Estimation

Authors: Dinu Kaufmann, Sonali Parbhoo, Aleksander Wieczorek, Sebastian Keller, David Adametz, Volker Roth

Abstract: This paper considers a Bayesian view for estimating a sub-network in a Markov random field. The sub-network corresponds to the Markov blanket of a set of query variables, where the set of potential neighbours here is big. We factorize the posterior such that the Markov blanket is conditionally independent of the network of the potential neighbours. By exploiting this blockwise decoupling, we deriv… ▽ More This paper considers a Bayesian view for estimating a sub-network in a Markov random field. The sub-network corresponds to the Markov blanket of a set of query variables, where the set of potential neighbours here is big. We factorize the posterior such that the Markov blanket is conditionally independent of the network of the potential neighbours. By exploiting this blockwise decoupling, we derive analytic expressions for posterior conditionals. Subsequently, we develop an inference scheme which makes use of the factorization. As a result, estimation of a sub-network is possible without inferring an entire network. Since the resulting Gibbs sampler scales linearly with the number of variables, it can handle relatively large neighbourhoods. The proposed scheme results in faster convergence and superior mixing of the Markov chain than existing Bayesian network estimation techniques. △ Less

Submitted 6 October, 2015; originally announced October 2015.

Comments: 16 pages, 5 figures

arXiv:1504.03701 [pdf, other]

Probabilistic Clustering of Time-Evolving Distance Data

Authors: Julia E. Vogt, Marius Kloft, Stefan Stark, Sudhir S. Raman, Sandhya Prabhakaran, Volker Roth, Gunnar Rätsch

Abstract: We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identi… ▽ More We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identification on the identities of the objects is needed. Further, the model does not require the number of clusters being specified in advance -- they are instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time. △ Less

Submitted 14 April, 2015; originally announced April 2015.

arXiv:1301.6263 [pdf, other]

A Secure Submission System for Online Whistleblowing Platforms

Authors: Volker Roth, Benjamin Güldenring, Eleanor Rieffel, Sven Dietrich, Lars Ries

Abstract: Whistleblower laws protect individuals who inform the public or an authority about governmental or corporate misconduct. Despite these laws, whistleblowers frequently risk reprisals and sites such as WikiLeaks emerged to provide a level of anonymity to these individuals. However, as countries increase their level of network surveillance and Internet protocol data retention, the mere act of using a… ▽ More Whistleblower laws protect individuals who inform the public or an authority about governmental or corporate misconduct. Despite these laws, whistleblowers frequently risk reprisals and sites such as WikiLeaks emerged to provide a level of anonymity to these individuals. However, as countries increase their level of network surveillance and Internet protocol data retention, the mere act of using anonymizing software such as Tor, or accessing a whistleblowing website through an SSL channel might be incriminating enough to lead to investigations and repercussions. As an alternative submission system we propose an online advertising network called AdLeaks. AdLeaks leverages the ubiquity of unsolicited online advertising to provide complete sender unobservability when submitting disclosures. AdLeaks ads compute a random function in a browser and submit the outcome to the AdLeaks infrastructure. Such a whistleblower's browser replaces the output with encrypted information so that the transmission is indistinguishable from that of a regular browser. Its back-end design assures that AdLeaks must process only a fraction of the resulting traffic in order to receive disclosures with high probability. We describe the design of AdLeaks and evaluate its performance through analysis and experimentation. △ Less

Submitted 26 January, 2013; originally announced January 2013.

Comments: An abridged version has been accepted for publication in the proceedings of Financial Cryptography and Data Security 2013

arXiv:1206.6433 [pdf]

Copula Mixture Model for Dependency-seeking Clustering

Authors: Melanie Rey, Volker Roth

Abstract: We introduce a copula mixture model to perform dependency-seeking clustering when co-occurring samples from different data sources are available. The model takes advantage of the great flexibility offered by the copulas framework to extend mixtures of Canonical Correlation Analysis to multivariate data with arbitrary continuous marginal densities. We formulate our model as a non-parametric Bayesia… ▽ More We introduce a copula mixture model to perform dependency-seeking clustering when co-occurring samples from different data sources are available. The model takes advantage of the great flexibility offered by the copulas framework to extend mixtures of Canonical Correlation Analysis to multivariate data with arbitrary continuous marginal densities. We formulate our model as a non-parametric Bayesian mixture, while providing efficient MCMC inference. Experiments on synthetic and real data demonstrate that the increased flexibility of the copula mixture significantly improves the clustering and the interpretability of the results. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1206.4632 [pdf]

A Complete Analysis of the l_1,p Group-Lasso

Authors: Julia Vogt, Volker Roth

Abstract: The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,\infty} version have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 <= p <= \infty, and we present a unified activ… ▽ More The Group-Lasso is a well-known tool for joint regularization in machine learning methods. While the l_{1,2} and the l_{1,\infty} version have been studied in detail and efficient algorithms exist, there are still open questions regarding other l_{1,p} variants. We characterize conditions for solutions of the l_{1,p} Group-Lasso for all p-norms with 1 <= p <= \infty, and we present a unified active set algorithm. For all p-norms, a highly efficient projected gradient algorithm is presented. This new algorithm enables us to compare the prediction performance of many variants of the Group-Lasso in a multi-task learning setting, where the aim is to solve many learning problems in parallel which are coupled via the Group-Lasso constraint. We conduct large-scale experiments on synthetic data and on two real-world data sets. In accordance with theoretical characterizations of the different norms we observe that the weak-coupling norms with p between 1.5 and 2 consistently outperform the strong-coupling norms with p >> 2. △ Less

Submitted 18 June, 2012; originally announced June 2012.

Comments: ICML2012

Showing 1–30 of 30 results for author: Roth, V