Explaining time series models using frequency masking

Authors: Thea Brüsch, Kristoffer K. Wickstrøm, Mikkel N. Schmidt, Tommy S. Alstrøm, Robert Jenssen

Abstract: Time series data is fundamentally important for describing many critical domains such as healthcare, finance, and climate, where explainable models are necessary for safe automated decision-making. Developing eXplainable AI (XAI) in these domains therefore implies explaining salient information in the time series. Current methods for obtaining saliency maps assume localized information in the raw input space. In this paper, we argue that the salient information of a number of time series is more likely to be localized in the frequency domain. We propose FreqRISE, which uses masking-based methods to produce explanations in the frequency and time-frequency domains and shows the best performance across a number of tasks.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Submitted to the Next Generation of AI Safety workshop at ICML 2024

arXiv:2312.04174 [pdf, other]

Coherent energy and force uncertainty in deep learning force fields

Authors: Peter Bjørn Jørgensen, Jonas Busk, Ole Winther, Mikkel N. Schmidt

Abstract: In machine learning energy potentials for atomic systems, forces are commonly obtained as the negative derivative of the energy function with respect to atomic positions. To quantify aleatoric uncertainty in the predicted energies, a widely used modeling approach involves predicting both a mean and variance for each energy value. However, this model is not differentiable under the usual white noise assumption, so energy uncertainty does not naturally translate to force uncertainty. In this work we propose a machine learning potential energy model in which energy and force aleatoric uncertainty are linked through a spatially correlated noise process. We demonstrate our approach on an equivariant message passing neural network potential trained on energies and forces on two out-of-equilibrium molecular datasets. Furthermore, we also show how to obtain epistemic uncertainties in this setting based on a Bayesian interpretation of deep ensemble models.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Presented at Advancing Molecular Machine Learning - Overcoming Limitations [ML4Molecules], ELLIS workshop, VIRTUAL, December 8, 2023, unofficial NeurIPS 2023 side-event

arXiv:2307.09614 [pdf, other]

Multi-view self-supervised learning for multivariate variable-channel time series

Authors: Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm

Abstract: Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.

Submitted 20 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: To appear in proceedings of 2023 IEEE International workshop on Machine Learning for Signal Processing

arXiv:2306.13263 [pdf, other]

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

Authors: Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

Abstract: In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.

Submitted 8 April, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: Accepted at TMLR

arXiv:2305.16325 [pdf, other]

Graph Neural Network Interatomic Potential Ensembles with Calibrated Aleatoric and Epistemic Uncertainty on Energy and Forces

Authors: Jonas Busk, Mikkel N. Schmidt, Ole Winther, Tejs Vegge, Peter Bjørn Jørgensen

Abstract: Inexpensive machine learning potentials are increasingly being used to speed up structural optimization and molecular dynamics simulations of materials by iteratively predicting and applying interatomic forces. In these settings, it is crucial to detect when predictions are unreliable to avoid wrong or misleading results. Here, we present a complete framework for training and recalibrating graph neural network ensemble models to produce accurate predictions of energy and forces with calibrated uncertainty estimates. The proposed method considers both epistemic and aleatoric uncertainty and the total uncertainties are recalibrated post hoc using a nonlinear scaling function to achieve good calibration on previously unseen data, without loss of predictive accuracy. The method is demonstrated and evaluated on two challenging, publicly available datasets, ANI-1x (Smith et al.) and Transition1x (Schreiner et al.), both containing diverse conformations far from equilibrium. A detailed analysis of the predictive performance and uncertainty calibration is provided. In all experiments, the proposed method achieved low prediction error and good uncertainty calibration, with predicted uncertainty correlating with expected error, on energy and forces. To the best of our knowledge, the method presented in this paper is the first to consider a complete framework for obtaining calibrated epistemic and aleatoric uncertainty predictions on both energy and forces in ML potentials.

Submitted 11 September, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2212.02191 [pdf, other]

On the effectiveness of partial variance reduction in federated learning with heterogeneous data

Authors: Bo Li, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

Abstract: Data heterogeneity across clients is a key challenge in federated learning. Prior works address this by either aligning client and server models or using control variates to correct client model drift. Although these methods achieve fast convergence in convex or simple non-convex problems, the performance in over-parameterized models such as deep neural networks is lacking. In this paper, we first revisit the widely used FedAvg algorithm in a deep neural network to understand how data heterogeneity influences the gradient updates across the neural network layers. We observe that while the feature extraction layers are learned efficiently by FedAvg, the substantial diversity of the final classification layers across clients impedes the performance. Motivated by this, we propose to correct model drift by variance reduction only on the final layers. We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost. We furthermore provide proof for the convergence rate of our algorithm.

Submitted 9 June, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: Accepted to CVPR 2023

arXiv:2211.14481 [pdf, other]

doi 10.1109/JLT.2023.3251660

End-to-End Learning for VCSEL-based Optical Interconnects: State-of-the-Art, Challenges, and Opportunities

Authors: Muralikrishnan Srinivasan, Jinxiang Song, Alexander Grabowski, Krzysztof Szczerba, Holger K. Iversen, Mikkel N. Schmidt, Darko Zibar, Jochen Schröder, Anders Larsson, Christian Häger, Henk Wymeersch

Abstract: Optical interconnects (OIs) based on vertical-cavity surface-emitting lasers (VCSELs) are the main workhorse within data centers, supercomputers, and even vehicles, providing low-cost, high-rate connectivity. VCSELs must operate under extremely harsh and time-varying conditions, thus requiring adaptive and flexible designs of the communication chain. Such designs can be built based on mathematical models (model-based design) or learned from data (machine learning (ML) based design). Various ML techniques have recently come to the forefront, replacing individual components in the transmitters and receivers with deep neural networks. Beyond such component-wise learning, end-to-end (E2E) autoencoder approaches can reach the ultimate performance through co-optimizing entire parameterized transmitters and receivers. This tutorial paper aims to provide an overview of ML for VCSEL-based OIs, with a focus on E2E approaches, dealing specifically with the unique challenges facing VCSELs, such as the wide temperature variations and complex models.

Submitted 25 November, 2022; originally announced November 2022.

arXiv:2202.12549 [pdf, other]

doi 10.1039/D2AN00403H

Raman Spectrum Matching with Contrastive Representation Learning

Authors: Bo Li, Mikkel N. Schmidt, Tommy S. Alstrøm

Abstract: Raman spectroscopy is an effective, low-cost, non-intrusive technique often used for chemical identification. Typical approaches are based on matching observations to a reference database, which requires careful preprocessing, or supervised machine learning, which requires a fairly large number of training observations from each class. We propose a new machine learning technique for Raman spectrum matching, based on contrastive representation learning, that requires no preprocessing and works with as little as a single reference spectrum from each class. On three datasets we demonstrate that our approach significantly improves or is on par with the state of the art in prediction accuracy, and we show how to compute conformal prediction sets with specified frequentist coverage. Based on our findings, we believe contrastive representation learning is a promising alternative to existing methods for Raman spectrum matching.

Submitted 25 February, 2022; originally announced February 2022.

Comments: Under review at Analytical Chemistry

arXiv:2201.06863 [pdf, other]

Programmatic Policy Extraction by Iterative Local Search

Authors: Rasmus Larsen, Mikkel Nørgaard Schmidt

Abstract: Reinforcement learning policies are often represented by neural networks, but programmatic policies are preferred in some cases because they are more interpretable, amenable to formal verification, or generalize better. While efficient algorithms for learning neural policies exist, learning programmatic policies is challenging. Combining imitation-projection and dataset aggregation with a local search heuristic, we present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy. After examining our local search heuristic on a programming-by-example problem, we demonstrate our programmatic policy extraction method on a pendulum swing-up problem. Whether trained using a hand-crafted expert policy or a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2107.06068 [pdf, ps, other]

Calibrated Uncertainty for Molecular Property Prediction using Ensembles of Message Passing Neural Networks

Authors: Jonas Busk, Peter Bjørn Jørgensen, Arghya Bhowmik, Mikkel N. Schmidt, Ole Winther, Tejs Vegge

Abstract: Data-driven methods based on machine learning have the potential to accelerate computational analysis of atomic structures. In this context, reliable uncertainty estimates are important for assessing confidence in predictions and enabling decision making. However, machine learning models can produce badly calibrated uncertainty estimates and it is therefore crucial to detect and handle uncertainty carefully. In this work we extend a message passing neural network designed specifically for predicting properties of molecules and materials with a calibrated probabilistic predictive distribution. The method presented in this paper differs from previous work by considering both aleatoric and epistemic uncertainty in a unified framework, and by recalibrating the predictive distribution on unseen data. Through computer experiments, we show that our approach results in accurate models for predicting molecular formation energies with well calibrated uncertainty in and out of the training data distribution on two public molecular benchmark datasets, QM9 and PC9. The proposed method provides a general framework for training and evaluating neural network ensemble models that are able to produce accurate predictions of properties of molecules with well calibrated uncertainty estimates.

Submitted 3 November, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:1806.08195 [pdf, other]

Probabilistic PARAFAC2

Authors: Philip J. H. Jørgensen, Søren F. V. Nielsen, Jesper L. Hinrich, Mikkel N. Schmidt, Kristoffer H. Madsen, Morten Mørup

Abstract: The PARAFAC2 is a multimodal factor analysis model suitable for analyzing multi-way data when one of the modes has incomparable observation units, for example because of differences in signal sampling or batch sizes. A fully probabilistic treatment of the PARAFAC2 is desirable in order to improve robustness to noise and provide a well-founded principle for determining the number of factors, but challenging because the factor loadings are constrained to be orthogonal. We develop two probabilistic formulations of the PARAFAC2 along with variational procedures for inference: in one approach, the mean values of the factor loadings are orthogonal, leading to closed-form variational updates, and in the other, the factor loadings themselves are orthogonal using a matrix von Mises-Fisher distribution. We contrast our probabilistic formulation to the conventional direct fitting algorithm based on maximum likelihood. On simulated data and real fluorescence spectroscopy and gas chromatography-mass spectrometry data, we compare our approach to the conventional PARAFAC2 model estimation and find that the probabilistic formulation is more robust to noise and model order misspecification. The probabilistic PARAFAC2 thus forms a promising framework for modeling multi-way data accounting for uncertainty.

Submitted 21 June, 2018; originally announced June 2018.

Comments: 16 pages (incl. 4 pages of supplemental material), 5 figures

arXiv:1806.03146 [pdf, other]

Neural Message Passing with Edge Updates for Predicting Properties of Molecules and Materials

Authors: Peter Bjørn Jørgensen, Karsten Wedel Jacobsen, Mikkel N. Schmidt

Abstract: Neural message passing on molecular graphs is one of the most promising methods for predicting formation energy and other properties of molecules and materials. In this work we extend the neural message passing model with an edge update network which allows the information exchanged between atoms to depend on the hidden state of the receiving atom. We benchmark the proposed model on three publicly available datasets (QM9, The Materials Project and OQMD) and show that the proposed model yields superior prediction of formation energies and other properties on all three datasets in comparison with the best published results. Furthermore we investigate different methods for constructing the graph used to represent crystalline structures and we find that using a graph based on K-nearest neighbors achieves better prediction accuracy than using maximum distance cutoff or the Voronoi tessellation graph.

Submitted 8 June, 2018; originally announced June 2018.

arXiv:1101.5097 [pdf, ps, other]

Infinite Multiple Membership Relational Modeling for Complex Networks

Authors: Morten Mørup, Mikkel N. Schmidt, Lars Kai Hansen

Abstract: Learning latent structure in complex networks has become an important problem fueled by many types of networked data originating from practically all fields of science. In this paper, we propose a new non-parametric Bayesian multiple-membership latent feature model for networks. Contrary to existing multiple-membership models that scale quadratically in the number of vertices, the proposed model scales linearly in the number of links, admitting multiple-membership analysis in large scale networks. We demonstrate a connection between the single membership relational model and multiple membership models and show on "real" size benchmark network data that accounting for multiple memberships improves the learning of latent structure as measured by link prediction, while explicitly accounting for multiple membership results in a more compact representation of the latent structure of networks.

Submitted 26 January, 2011; originally announced January 2011.

Comments: 8 pages, 4 figures

Showing 1–13 of 13 results for author: Schmidt, M N