Search | arXiv e-print repository

posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms

Authors: Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, Bob Carpenter, Aki Vehtari

Abstract: The generality and robustness of inference algorithms is critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem arises in evaluating its accuracy and efficiency across a range of represe… ▽ More The generality and robustness of inference algorithms is critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem arises in evaluating its accuracy and efficiency across a range of representative target models. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for model evaluation and comparison. To provide a wide range of realistic target densities, posteriordb currently comprises 120 representative models and has been instrumental in developing several general inference algorithms. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2309.12269 [pdf, other]

The Cambridge Law Corpus: A Dataset for Legal AI Research

Authors: Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

Abstract: We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases,… ▽ More We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases, done by legal experts. Using our annotated data, we have trained and evaluated case outcome extraction with GPT-3, GPT-4 and RoBERTa models to provide benchmarks. We include an extensive legal and ethical discussion to address the potentially sensitive nature of this material. As a consequence, the corpus will only be released for research purposes under certain restrictions. △ Less

Submitted 1 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Journal ref: Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2023

arXiv:2204.01846 [pdf, other]

Probabilistic Embeddings with Laplacian Graph Priors

Authors: Väinö Yrjänäinen, Måns Magnusson

Abstract: We introduce probabilistic embeddings using Laplacian priors (PELP). The proposed model enables incorporating graph side-information into static word embeddings. We theoretically show that the model unifies several previously proposed embedding methods under one umbrella. PELP generalises graph-enhanced, group, dynamic, and cross-lingual static word embeddings. PELP also enables any combination of… ▽ More We introduce probabilistic embeddings using Laplacian priors (PELP). The proposed model enables incorporating graph side-information into static word embeddings. We theoretically show that the model unifies several previously proposed embedding methods under one umbrella. PELP generalises graph-enhanced, group, dynamic, and cross-lingual static word embeddings. PELP also enables any combination of these previous models in a straightforward fashion. Furthermore, we empirically show that our model matches the performance of previous models as special cases. In addition, we demonstrate its flexibility by applying it to the comparison of political sociolects over time. Finally, we provide code as a TensorFlow implementation enabling flexible estimation in different settings. △ Less

Submitted 25 March, 2022; originally announced April 2022.

arXiv:2009.00666 [pdf, other]

Robust, Accurate Stochastic Optimization for Variational Inference

Authors: Akash Kumar Dhaka, Alejandro Catalina, Michael Riis Andersen, Måns Magnusson, Jonathan H. Huggins, Aki Vehtari

Abstract: We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution,(2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs… ▽ More We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution,(2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs to the assumed variational family, common stochastic optimization methods lead to poor variational approximations if the problem dimension is moderately large. We also demonstrate that these methods are not robust across diverse model types. Motivated by these findings, we develop a more robust and accurate stochastic optimization framework by viewing the underlying optimization algorithm as producing a Markov chain. Our approach is theoretically motivated and includes a diagnostic for convergence and a novel stopping rule, both of which are robust to noisy evaluations of the objective function. We show empirically that the proposed framework works well on a diverse set of models: it can automatically detect stochastic optimization failure or inaccurate variational approximation △ Less

Submitted 3 September, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

Journal ref: NeurIPS 2020

arXiv:2008.10859 [pdf, other]

Unbiased estimator for the variance of the leave-one-out cross-validation estimator for a Bayesian normal model with fixed variance

Authors: Tuomas Sivula, Måns Magnusson, Aki Vehtari

Abstract: When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. Considering the uncertainty is important, as the variability of the estimate can be high in some cases. An important result by Bengio and Grandvalet (2004) states that no general unbiased variance estimator can be… ▽ More When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. Considering the uncertainty is important, as the variability of the estimate can be high in some cases. An important result by Bengio and Grandvalet (2004) states that no general unbiased variance estimator can be constructed, that would apply for any utility or loss measure and any model. We show that it is possible to construct an unbiased estimator considering a specific predictive performance measure and model. We demonstrate an unbiased sampling distribution variance estimator for the Bayesian normal model with fixed model variance using the expected log pointwise predictive density (elpd) utility score. This example demonstrates that it is possible to obtain improved, problem-specific, unbiased estimators for assessing the uncertainty in LOO-CV estimation. △ Less

Submitted 15 February, 2022; v1 submitted 25 August, 2020; originally announced August 2020.

Comments: 21 pages, 1 figure. Communications in Statistics - Theory and Methods (2022)

arXiv:2008.10296 [pdf, other]

Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison

Authors: Tuomas Sivula, Måns Magnusson, Asael Alonzo Matamoros, Aki Vehtari

Abstract: Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. As leave-one-out cross-validation is based on finite observed data, there is uncertainty about the expected predictive performance on new data. By modeling this uncertainty when comparing two models, we can compute the probability that one… ▽ More Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. As leave-one-out cross-validation is based on finite observed data, there is uncertainty about the expected predictive performance on new data. By modeling this uncertainty when comparing two models, we can compute the probability that one model has a better predictive performance than the other. Modeling this uncertainty well is not trivial, and for example, it is known that the commonly used standard error estimate is often too small. We study the properties of the Bayesian LOO-CV estimator and the related uncertainty estimates when comparing two models. We provide new results of the properties both theoretically in the linear regression case and empirically for multiple different models and discuss the challenges of modeling the uncertainty. We show that problematic cases include: comparing models with similar predictions, misspecified models, and small data. In these cases, there is a weak connection in the skewness of the individual leave-one-out terms and the distribution of the error of the Bayesian LOO-CV estimator. We show that it is possible that the problematic skewness of the error distribution, which occurs when the models make similar predictions, does not fade away when the data size grows to infinity in certain situations. Based on the results, we also provide practical recommendations for the users of Bayesian LOO-CV for model comparison. △ Less

Submitted 21 October, 2023; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: 96 pages, 24 figures. Update 2023-10-20: Adding references and clarifying structure

arXiv:2001.00980 [pdf, other]

Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data

Authors: Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Abstract: Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method for estimating differences in predictive performa… ▽ More Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method for estimating differences in predictive performance by combining fast approximate LOO surrogates with exact LOO subsampling using the difference estimator and supply proofs with regards to scaling characteristics. The resulting approach can be orders of magnitude more efficient than previous approaches, as well as being better suited to model comparison. △ Less

Submitted 3 January, 2020; originally announced January 2020.

Journal ref: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351, 2020

arXiv:1909.01459 [pdf, other]

Interpretable Word Embeddings via Informative Priors

Authors: Miriam Hurtado Bodell, Martin Arvidsson, Måns Magnusson

Abstract: Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensibl… ▽ More Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensible priors can capture latent semantic concepts better than or on-par with the current state of the art, while retaining the simplicity and generalizability of using priors. △ Less

Submitted 3 September, 2019; originally announced September 2019.

Comments: 10 pages, 2 figures, EMNLP 2019

MSC Class: 68T50 (Primary) 62P25 (Secondary) ACM Class: I.2.7

arXiv:1906.02416 [pdf, other]

Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

Authors: Alexander Terenin, Måns Magnusson, Leif Jonsson

Abstract: To scale non-parametric extensions of probabilistic topic models such as Latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly spars… ▽ More To scale non-parametric extensions of probabilistic topic models such as Latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler utilizes all available sources of sparsity found in natural language - an important way to make computation efficient. We benchmark our method on a well-known corpus (PubMed) with 8m documents and 768m tokens, using a single multi-core machine in under four days. △ Less

Submitted 6 October, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

Journal ref: Conference on Empirical Methods in Natural Language Processing, 2020

arXiv:1904.10679 [pdf, other]

Bayesian leave-one-out cross-validation for large data

Authors: Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Abstract: Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probability-proportional-to-size-sampl… ▽ More Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probability-proportional-to-size-sampling (PPS) for fast LOO model evaluation for large datasets. We provide both theoretical and empirical results showing good properties for large data. △ Less

Submitted 24 April, 2019; originally announced April 2019.

Comments: Accepted to ICML 2019. This version is the submitted paper

Journal ref: Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253, 2019

arXiv:1704.03581 [pdf, other]

doi 10.1109/TPAMI.2018.2832641

Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

Authors: Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper

Abstract: Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct… ▽ More Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that - contrary to popular belief in the topic modeling literature - partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora. △ Less

Submitted 22 October, 2020; v1 submitted 11 April, 2017; originally announced April 2017.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 41(7):1709-1719, 2019

arXiv:1602.00260 [pdf, other]

DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

Authors: Måns Magnusson, Leif Jonsson, Mattias Villani

Abstract: Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (… ▽ More Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (DO) probit model (Johndrow et al., 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model's predictive accuracy on two datasets and demonstrate DOLDA's advantage in interpreting the generated predictions. △ Less

Submitted 20 October, 2016; v1 submitted 31 January, 2016; originally announced February 2016.

arXiv:1506.03784 [pdf, other]

Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Authors: Måns Magnusson, Leif Jonsson, Mattias Villani, David Broman

Abstract: Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-kno… ▽ More Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. △ Less

Submitted 15 August, 2017; v1 submitted 11 June, 2015; originally announced June 2015.

Comments: Accepted for publication in Journal of Computational and Graphical Statistics

Showing 1–13 of 13 results for author: Magnusson, M