-
posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms
Authors:
Måns Magnusson,
Jakob Torgander,
Paul-Christian Bürkner,
Lu Zhang,
Bob Carpenter,
Aki Vehtari
Abstract:
The generality and robustness of inference algorithms is critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem arises in evaluating its accuracy and efficiency across a range of represe…
▽ More
The generality and robustness of inference algorithms is critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem arises in evaluating its accuracy and efficiency across a range of representative target models. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for model evaluation and comparison. To provide a wide range of realistic target densities, posteriordb currently comprises 120 representative models and has been instrumental in developing several general inference algorithms.
△ Less
Submitted 6 July, 2024;
originally announced July 2024.
-
The Cambridge Law Corpus: A Dataset for Legal AI Research
Authors:
Andreas Östling,
Holli Sargeant,
Huiyuan Xie,
Ludwig Bull,
Alexander Terenin,
Leif Jonsson,
Måns Magnusson,
Felix Steffek
Abstract:
We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases,…
▽ More
We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases, done by legal experts. Using our annotated data, we have trained and evaluated case outcome extraction with GPT-3, GPT-4 and RoBERTa models to provide benchmarks. We include an extensive legal and ethical discussion to address the potentially sensitive nature of this material. As a consequence, the corpus will only be released for research purposes under certain restrictions.
△ Less
Submitted 1 January, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Probabilistic Embeddings with Laplacian Graph Priors
Authors:
Väinö Yrjänäinen,
Måns Magnusson
Abstract:
We introduce probabilistic embeddings using Laplacian priors (PELP). The proposed model enables incorporating graph side-information into static word embeddings. We theoretically show that the model unifies several previously proposed embedding methods under one umbrella. PELP generalises graph-enhanced, group, dynamic, and cross-lingual static word embeddings. PELP also enables any combination of…
▽ More
We introduce probabilistic embeddings using Laplacian priors (PELP). The proposed model enables incorporating graph side-information into static word embeddings. We theoretically show that the model unifies several previously proposed embedding methods under one umbrella. PELP generalises graph-enhanced, group, dynamic, and cross-lingual static word embeddings. PELP also enables any combination of these previous models in a straightforward fashion. Furthermore, we empirically show that our model matches the performance of previous models as special cases. In addition, we demonstrate its flexibility by applying it to the comparison of political sociolects over time. Finally, we provide code as a TensorFlow implementation enabling flexible estimation in different settings.
△ Less
Submitted 25 March, 2022;
originally announced April 2022.
-
Robust, Accurate Stochastic Optimization for Variational Inference
Authors:
Akash Kumar Dhaka,
Alejandro Catalina,
Michael Riis Andersen,
Måns Magnusson,
Jonathan H. Huggins,
Aki Vehtari
Abstract:
We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution,(2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs…
▽ More
We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution,(2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs to the assumed variational family, common stochastic optimization methods lead to poor variational approximations if the problem dimension is moderately large. We also demonstrate that these methods are not robust across diverse model types. Motivated by these findings, we develop a more robust and accurate stochastic optimization framework by viewing the underlying optimization algorithm as producing a Markov chain. Our approach is theoretically motivated and includes a diagnostic for convergence and a novel stopping rule, both of which are robust to noisy evaluations of the objective function. We show empirically that the proposed framework works well on a diverse set of models: it can automatically detect stochastic optimization failure or inaccurate variational approximation
△ Less
Submitted 3 September, 2020; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Unbiased estimator for the variance of the leave-one-out cross-validation estimator for a Bayesian normal model with fixed variance
Authors:
Tuomas Sivula,
Måns Magnusson,
Aki Vehtari
Abstract:
When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. Considering the uncertainty is important, as the variability of the estimate can be high in some cases. An important result by Bengio and Grandvalet (2004) states that no general unbiased variance estimator can be…
▽ More
When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. Considering the uncertainty is important, as the variability of the estimate can be high in some cases. An important result by Bengio and Grandvalet (2004) states that no general unbiased variance estimator can be constructed, that would apply for any utility or loss measure and any model. We show that it is possible to construct an unbiased estimator considering a specific predictive performance measure and model. We demonstrate an unbiased sampling distribution variance estimator for the Bayesian normal model with fixed model variance using the expected log pointwise predictive density (elpd) utility score. This example demonstrates that it is possible to obtain improved, problem-specific, unbiased estimators for assessing the uncertainty in LOO-CV estimation.
△ Less
Submitted 15 February, 2022; v1 submitted 25 August, 2020;
originally announced August 2020.
-
Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison
Authors:
Tuomas Sivula,
Måns Magnusson,
Asael Alonzo Matamoros,
Aki Vehtari
Abstract:
Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. As leave-one-out cross-validation is based on finite observed data, there is uncertainty about the expected predictive performance on new data. By modeling this uncertainty when comparing two models, we can compute the probability that one…
▽ More
Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. As leave-one-out cross-validation is based on finite observed data, there is uncertainty about the expected predictive performance on new data. By modeling this uncertainty when comparing two models, we can compute the probability that one model has a better predictive performance than the other. Modeling this uncertainty well is not trivial, and for example, it is known that the commonly used standard error estimate is often too small. We study the properties of the Bayesian LOO-CV estimator and the related uncertainty estimates when comparing two models. We provide new results of the properties both theoretically in the linear regression case and empirically for multiple different models and discuss the challenges of modeling the uncertainty. We show that problematic cases include: comparing models with similar predictions, misspecified models, and small data. In these cases, there is a weak connection in the skewness of the individual leave-one-out terms and the distribution of the error of the Bayesian LOO-CV estimator. We show that it is possible that the problematic skewness of the error distribution, which occurs when the models make similar predictions, does not fade away when the data size grows to infinity in certain situations. Based on the results, we also provide practical recommendations for the users of Bayesian LOO-CV for model comparison.
△ Less
Submitted 21 October, 2023; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data
Authors:
Måns Magnusson,
Michael Riis Andersen,
Johan Jonasson,
Aki Vehtari
Abstract:
Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method for estimating differences in predictive performa…
▽ More
Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method for estimating differences in predictive performance by combining fast approximate LOO surrogates with exact LOO subsampling using the difference estimator and supply proofs with regards to scaling characteristics. The resulting approach can be orders of magnitude more efficient than previous approaches, as well as being better suited to model comparison.
△ Less
Submitted 3 January, 2020;
originally announced January 2020.
-
Interpretable Word Embeddings via Informative Priors
Authors:
Miriam Hurtado Bodell,
Martin Arvidsson,
Måns Magnusson
Abstract:
Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensibl…
▽ More
Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensible priors can capture latent semantic concepts better than or on-par with the current state of the art, while retaining the simplicity and generalizability of using priors.
△ Less
Submitted 3 September, 2019;
originally announced September 2019.
-
Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models
Authors:
Alexander Terenin,
Måns Magnusson,
Leif Jonsson
Abstract:
To scale non-parametric extensions of probabilistic topic models such as Latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly spars…
▽ More
To scale non-parametric extensions of probabilistic topic models such as Latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler utilizes all available sources of sparsity found in natural language - an important way to make computation efficient. We benchmark our method on a well-known corpus (PubMed) with 8m documents and 768m tokens, using a single multi-core machine in under four days.
△ Less
Submitted 6 October, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Bayesian leave-one-out cross-validation for large data
Authors:
Måns Magnusson,
Michael Riis Andersen,
Johan Jonasson,
Aki Vehtari
Abstract:
Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probability-proportional-to-size-sampl…
▽ More
Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probability-proportional-to-size-sampling (PPS) for fast LOO model evaluation for large datasets. We provide both theoretical and empirical results showing good properties for large data.
△ Less
Submitted 24 April, 2019;
originally announced April 2019.
-
Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler
Authors:
Alexander Terenin,
Måns Magnusson,
Leif Jonsson,
David Draper
Abstract:
Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct…
▽ More
Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that - contrary to popular belief in the topic modeling literature - partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.
△ Less
Submitted 22 October, 2020; v1 submitted 11 April, 2017;
originally announced April 2017.
-
DOLDA - a regularized supervised topic model for high-dimensional multi-class regression
Authors:
Måns Magnusson,
Leif Jonsson,
Mattias Villani
Abstract:
Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (…
▽ More
Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (DO) probit model (Johndrow et al., 2013) together with an efficient Horseshoe prior for variable selection/shrinkage (Carvalho et al., 2010). We propose a computationally efficient parallel Gibbs sampler for the new model. An important advantage of DOLDA is that learned topics are directly connected to individual classes without the need for a reference class. We evaluate the model's predictive accuracy on two datasets and demonstrate DOLDA's advantage in interpreting the generated predictions.
△ Less
Submitted 20 October, 2016; v1 submitted 31 January, 2016;
originally announced February 2016.
-
Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models
Authors:
Måns Magnusson,
Leif Jonsson,
Mattias Villani,
David Broman
Abstract:
Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-kno…
▽ More
Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler.
△ Less
Submitted 15 August, 2017; v1 submitted 11 June, 2015;
originally announced June 2015.