Showing 1–50 of 132 results for author: Nielsen, F

Searching in archive cs.
  1. arXiv:2408.12961  [pdf, other]

    cs.IT cs.LG

    Symplectic Bregman divergences

    Authors: Frank Nielsen

    Abstract: We present a generalization of Bregman divergences in symplectic vector spaces that we term symplectic Bregman divergences. Symplectic Bregman divergences are derived from a symplectic generalization of the Fenchel-Young inequality which relies on the notion of symplectic subdifferentials. The symplectic Fenchel-Young inequality is obtained using the symplectic Fenchel transform which is defined w…

    Submitted 28 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 14 pages, 3 figures

  2. arXiv:2408.04175  [pdf, other]

    cs.LG cs.CG cs.CV cs.IT

    pyBregMan: A Python library for Bregman Manifolds

    Authors: Frank Nielsen, Alexander Soen

    Abstract: A Bregman manifold is a synonym for a dually flat space in information geometry which admits as a canonical divergence a Bregman divergence. Bregman manifolds are induced by smooth strictly convex functions like the cumulant or partition functions of regular exponential families, the negative entropy of mixture families, or the characteristic functions of regular cones just to list a few such conv…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 28 pages

  3. arXiv:2406.10775  [pdf, other]

    cs.LG cs.AI stat.ML

    A Rate-Distortion View of Uncertainty Quantification

    Authors: Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski

    Abstract: In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching de…

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Journal ref: International Conference on Machine Learning, 2024

  4. Knowledge graphs for empirical concept retrieval

    Authors: Lenka Tětková, Teresa Karen Scheidt, Maria Mandrup Fogh, Ellen Marie Gaunby Jørgensen, Finn Årup Nielsen, Lars Kai Hansen

    Abstract: Concept-based explainable AI is promising as a tool to improve the understanding of complex models on the terms of a given user, viz. as a tool for personalized explainability. An important class of concept-based explainability methods is constructed with empirically defined concepts, indirectly defined through a set of positive and negative examples, as in the TCAV approach (Kim et al., 2018)…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Preprint. Accepted to The 2nd World Conference on eXplainable Artificial Intelligence

  5. arXiv:2403.10089  [pdf, other]

    cs.IT cs.CV cs.LG

    Approximation and bounding techniques for the Fisher-Rao distances between parametric statistical models

    Authors: Frank Nielsen

    Abstract: The Fisher-Rao distance between two probability distributions of a statistical model is defined as the Riemannian geodesic distance induced by the Fisher information metric. In order to calculate the Fisher-Rao distance in closed-form, we need (1) to elicit a formula for the Fisher-Rao geodesics, and (2) to integrate the Fisher length element along those geodesics. We consider several numerically…

    Submitted 21 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 47 pages

  6. arXiv:2402.04163  [pdf, ps, other]

    cs.LG

    Tempered Calculus for ML: Application to Hyperbolic Model Embedding

    Authors: Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

    Abstract: Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements. We start with a generalization of Riemann integration…

    Submitted 8 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    ACM Class: I.2.6

  7. Divergences induced by dual subtractive and divisive normalizations of exponential families and their convex deformations

    Authors: Frank Nielsen

    Abstract: Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning among others. An exponential family can either be normalized subtractively by its cumulant or free energy function or equivalently normalized divisively by its partition function. Both subtractive and divisive normalizers are strictly convex and smooth functions inducing pai…

    Submitted 17 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 19 pages, 3 figures

    Journal ref: Entropy 2024, 26(3), 193

  8. arXiv:2311.13459  [pdf, other]

    cs.LG stat.ML

    The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

    Authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

    Abstract: Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities. Calculus on TEMs relies on a deformed algebra of arithmetic operators induced by the deformed logarithms used to define the tempered entropy. In this work, we…

    Submitted 22 November, 2023; originally announced November 2023.

  9. arXiv:2309.04015  [pdf, other]

    cs.LG math.OC

    Optimal Transport with Tempered Exponential Measures

    Authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

    Abstract: In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à-la-Sinkhorn-Cuturi", which gets near-linear approximation algorithms but leads to maximally un-sparse plans. In this paper, we show that a…

    Submitted 16 February, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

  10. arXiv:2307.10644  [pdf, other]

    cs.LG stat.ML

    Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

    Authors: Frank Nielsen

    Abstract: Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joi…

    Submitted 9 June, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 38 pages, 11 figures

    Journal ref: 2nd Annual Topology, Algebra, and Geometry in Machine Learning Workshop, ICML TAG-ML, 2023

  11. arXiv:2303.15133  [pdf, other]

    cs.DL

    Synia: Displaying data from Wikibases

    Authors: Finn Årup Nielsen

    Abstract: I present an agile method and a tool to display data from Wikidata and other Wikibase instances via SPARQL queries. The work-in-progress combines ideas from the Scholia Web application and the Listeria tool.

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: 3 pages, 2 tables, 3 figures, submitted to Wiki Workshop (10th edition)

    ACM Class: H.5.4

  12. arXiv:2303.05910  [pdf, ps, other]

    stat.ML cs.LG

    Product Jacobi-Theta Boltzmann machines with score matching

    Authors: Andrea Pasquale, Daniel Krefl, Stefano Carrazza, Frank Nielsen

    Abstract: The estimation of probability density functions is a nontrivial task that in recent years has been tackled with machine learning techniques. Successful applications can be obtained using models inspired by the Boltzmann machine (BM) architecture. In this manuscript, the product Jacobi-Theta Boltzmann machine (pJTBM) is introduced as a restricted version of the Riemann-Theta Boltzmann machine…

    Submitted 12 January, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: 7 pages, 3 figures, ACAT22 proceedings

    Report number: TIF-UNIMI-2023-8

  13. arXiv:2302.09738  [pdf, other]

    stat.ML cs.LG

    Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning

    Authors: Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations. Here, we simplify such difficulties for a class of sparse or structured symmetric positive-definite matrices with the affine-invariant metric. We do so by proposing a generalized version of the Riem…

    Submitted 16 March, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: A long version of the ICML 2023 paper. Updated the main text to emphasize challenges of using existing Riemannian methods to estimate sparse and structured SPD matrices

  14. arXiv:2302.08175  [pdf, other]

    cs.IT cs.CV cs.LG

    A numerical approximation method for the Fisher-Rao distance between multivariate normal distributions

    Authors: Frank Nielsen

    Abstract: We present a simple method to approximate Rao's distance between multivariate normal distributions based on discretizing curves joining normal distributions and approximating Rao's distances between successive nearby normal distributions on the curves by the square root of Jeffreys divergence, the symmetrized Kullback-Leibler divergence. We consider experimentally the linear interpolation curves i…

    Submitted 27 March, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: 46 pages, 19 figures, 3 tables

    Journal ref: Entropy 25.4 (2023): 654
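
The discretization idea described in this abstract can be illustrated in a simpler setting. The following is a minimal sketch, not the paper's implementation: it assumes univariate normal distributions (for which a closed-form Fisher-Rao distance is known and can serve as a reference), a linear interpolation curve in the (mean, standard deviation) parameters, and hypothetical function names chosen for this example.

```python
import math

def jeffreys_normal(mu1, s1, mu2, s2):
    """Jeffreys divergence KL(p:q) + KL(q:p) between N(mu1, s1^2) and N(mu2, s2^2)."""
    d2 = (mu1 - mu2) ** 2
    return (s1 ** 2 + d2) / (2 * s2 ** 2) + (s2 ** 2 + d2) / (2 * s1 ** 2) - 1.0

def approx_curve_length(p, q, steps=1000):
    """Sum sqrt(Jeffreys) over consecutive points of a linearly interpolated curve.

    Assumption of this sketch: the curve is the straight segment joining
    p = (mu, sigma) and q = (mu, sigma) in parameter space.
    """
    (mu1, s1), (mu2, s2) = p, q
    total, prev = 0.0, p
    for i in range(1, steps + 1):
        t = i / steps
        cur = ((1 - t) * mu1 + t * mu2, (1 - t) * s1 + t * s2)
        # max(..., 0) guards against tiny negative values from rounding
        total += math.sqrt(max(jeffreys_normal(*prev, *cur), 0.0))
        prev = cur
    return total

def fisher_rao_normal(p, q):
    """Closed-form Fisher-Rao distance between univariate normals (reference value)."""
    (mu1, s1), (mu2, s2) = p, q
    arg = 1.0 + ((mu1 - mu2) ** 2 + 2 * (s1 - s2) ** 2) / (4 * s1 * s2)
    return math.sqrt(2) * math.acosh(arg)

if __name__ == "__main__":
    p, q = (0.0, 1.0), (3.0, 2.0)
    print("curve length estimate:", approx_curve_length(p, q))
    print("closed-form distance: ", fisher_rao_normal(p, q))
```

Because the straight segment is generally not a geodesic, the summed value estimates the length of that particular curve and so should sit at or above the closed-form distance; when the two means coincide the segment is a geodesic and the two values should nearly agree.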

  15. Partial k-means to avoid outliers, mathematical programming formulations, complexity results

    Authors: Nicolas Dupin, Frank Nielsen

    Abstract: A well-known bottleneck of Min-Sum-of-Square Clustering (MSSC, the celebrated $k$-means problem) is to tackle the presence of outliers. In this paper, we propose a Partial clustering variant termed PMSSC which considers a fixed number of outliers to remove. We solve PMSSC by Integer Programming formulations and complexity results extending the ones from MSSC are studied. PMSSC is NP-hard in Euclid…

    Submitted 31 May, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

  16. arXiv:2301.10980  [pdf, other]

    cs.IT

    Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry

    Authors: Frank Nielsen

    Abstract: We generalize quasi-arithmetic means beyond scalars by considering the gradient map of a Legendre type real-valued function. The gradient map of a Legendre type function is proven strictly comonotone with a global inverse. It thus yields a generalization of strictly monotone and differentiable functions generating scalar quasi-arithmetic means. Furthermore, the Legendre transformation gives rise…

    Submitted 1 July, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 21 pages

  17. arXiv:2212.06995  [pdf, other]

    q-bio.PE cs.SI physics.soc-ph

    Bifurcations in the Herd Immunity Threshold for Discrete-Time Models of Epidemic Spread

    Authors: Sinan A. Ozbay, Bjarke F. Nielsen, Maximilian M. Nguyen

    Abstract: We performed a thorough sensitivity analysis of the herd immunity threshold for discrete-time SIR compartmental models with a static network structure. We find unexpectedly that these models violate classical intuition which holds that the herd immunity threshold should monotonically increase with the transmission parameter. We find the existence of bifurcations in the herd immunity threshold in t…

    Submitted 24 February, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  18. arXiv:2209.07481  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding

    Authors: Rob Brekelmans, Frank Nielsen

    Abstract: Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior works have constructed annealing paths using quasi-arithmetic means, and interpreted the resulting…

    Submitted 6 February, 2024; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Published in Information Geometry (Info. Geo. 2024)

  19. arXiv:2208.14645  [pdf, other]

    cs.AR cs.DC eess.SY

    PaRTAA: A Real-time Multiprocessor for Mixed-Criticality Airborne Systems

    Authors: Shibarchi Majumder, Jens F D Nielsen, Thomas Bak

    Abstract: Mixed-criticality systems, where multiple systems with varying criticality-levels share a single hardware platform, require isolation between tasks with different criticality-levels. Isolation can be achieved with software-based solutions or can be enforced by a hardware level partitioning. An asymmetric multiprocessor architecture offers hardware-based isolation at the cost of underutilized hardw…

    Submitted 31 August, 2022; originally announced August 2022.

    Journal ref: in IEEE Transactions on Computers, vol. 69, no. 8, pp. 1221-1232, 1 Aug. 2020

  20. Ærø: A Platform Architecture for Mixed-Criticality Airborne Systems

    Authors: Shibarchi Majumder, Jens Frederik Dalsgaard Nielsen, Thomas Bak

    Abstract: Real-time embedded platforms with resource constraints can benefit from a mixed-criticality system where applications with different criticality levels share computational resources, with isolation in the temporal and spatial domains. A conventional software-based isolation mechanism adds overhead and requires certification with the highest level of criticality present in the system…

    Submitted 30 August, 2022; originally announced August 2022.

    Journal ref: in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 10, pp. 2307-2318, Oct. 2020

  21. Revisiting Chernoff Information with Likelihood Ratio Exponential Families

    Authors: Frank Nielsen

    Abstract: The Chernoff information between two probability measures is a statistical divergence measuring their deviation defined as their maximally skewed Bhattacharyya distance. Although the Chernoff information was originally introduced for bounding the Bayes error in statistical hypothesis testing, the divergence found many other applications due to its empirical robustness property found in application…

    Submitted 5 August, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 41 pages

    Journal ref: Entropy 2022, 24(10), 1400

  22. arXiv:2206.08598  [pdf, other]

    cs.LG stat.ML

    On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models

    Authors: Pascal Mattia Esser, Frank Nielsen

    Abstract: A common way to learn and analyze statistical models is to consider operations in the model parameter space. But what happens if we optimize in the parameter space and there is no one-to-one mapping between the parameter space and the underlying statistical model space? Such cases frequently occur for hierarchical models which include statistical mixtures or stochastic neural networks, and these m…

    Submitted 17 June, 2022; originally announced June 2022.

  23. arXiv:2205.13984  [pdf, other]

    cs.IT

    Information measures and geometry of the hyperbolic exponential families of Poincaré and hyperboloid distributions

    Authors: Frank Nielsen, Kazuki Okamura

    Abstract: We study various information-theoretic measures and the information geometry of the Poincaré distributions and the related hyperboloid distributions, and prove that their statistical mixture models are universal density estimators of smooth densities in hyperbolic spaces. The Poincaré and the hyperboloid distributions are two types of hyperbolic probability distributions defined using different mo…

    Submitted 25 May, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 43 pages

  24. A note on the $f$-divergences between multivariate location-scale families with either prescribed scale matrices or location parameters

    Authors: Frank Nielsen, Kazuki Okamura

    Abstract: We first extend the result of Ali and Silvey [Journal of the Royal Statistical Society: Series B, 28.1 (1966), 131-142] who first reported that any $f$-divergence between two isotropic multivariate Gaussian distributions amounts to a corresponding strictly increasing scalar function of their corresponding Mahalanobis distance. We report sufficient conditions on the standard probability density fun…

    Submitted 30 May, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: 17 pages, 1 table, 1 figure

    Journal ref: Statistics and Computing, Volume 34, article number 60 (2024)

  25. arXiv:2203.11434  [pdf, other]

    cs.LG

    Non-linear Embeddings in Hilbert Simplex Geometry

    Authors: Frank Nielsen, Ke Sun

    Abstract: A key technique of machine learning and computer vision is to embed discrete weighted graphs into continuous spaces for further downstream processing. Embedding discrete hierarchical structures in hyperbolic geometry has proven very successful since it was shown that any weighted tree can be embedded in that geometry with arbitrary low distortion. Various optimization methods for hyperbolic embedd…

    Submitted 16 August, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: 21 pages, 12 figures

    Journal ref: 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML, ICML23 workshop), 2023

  26. The duo Fenchel-Young divergence

    Authors: Frank Nielsen

    Abstract: By calculating the Kullback-Leibler divergence between two probability measures belonging to different exponential families, we end up with a formula that generalizes the ordinary Fenchel-Young divergence. Inspired by this formula, we define the duo Fenchel-Young divergence and report a majorization condition on its pair of generators which guarantees that this divergence is always non-negative.…

    Submitted 17 March, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: 21 pages, 7 figures

    Journal ref: Entropy 2022, 24(3), 421

  27. arXiv:2112.03734  [pdf, other]

    cs.LG

    Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds

    Authors: Pascal Mattia Esser, Frank Nielsen

    Abstract: When analyzing parametric statistical models, a useful approach consists in modeling geometrically the parameter space. However, even for very simple and commonly used hierarchical models like statistical mixtures or stochastic deep neural networks, the smoothness assumption of manifolds is violated at singular points which exhibit non-smooth neighborhoods in the parameter space. These singular mo…

    Submitted 7 December, 2021; originally announced December 2021.

    Comments: A preliminary version of this work was presented at NeurIPS 2021 as a Spotlight in the 13th Annual Workshop on Optimization for Machine Learning (OPT2021)

  28. On the Kullback-Leibler divergence between discrete normal distributions

    Authors: Frank Nielsen

    Abstract: Discrete normal distributions are defined as the distributions with prescribed means and covariance matrices which maximize entropy on the integer lattice support. The set of discrete normal distributions forms an exponential family with cumulant function related to the Riemann theta function. In this paper, we present several formulas for common statistical divergences between discrete normal distr…

    Submitted 18 October, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: 26 pages

    Journal ref: Journal of the Indian Institute of Science (2022)

  29. arXiv:2107.10884  [pdf, other]

    stat.ML cs.LG

    Structured second-order methods via natural gradient descent

    Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces. Natural-gradient descent is an attractive approach to design new algorithms in many settings such as gradient-free, adaptive-gradient, and second-order methods. Our structured methods not only enjoy a structural invar…

    Submitted 19 February, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Fixed some typos and added a new figure. ICML 2021 workshop paper. A short version of arXiv:2102.07405 with a focus on optimization tasks

  30. cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Distributions in the Elliptope

    Authors: Gautier Marti, Victor Goubet, Frank Nielsen

    Abstract: We propose a methodology to approximate conditional distributions in the elliptope of correlation matrices based on conditional generative adversarial networks. We illustrate the methodology with an application from quantitative finance: Monte Carlo simulations of correlated returns to compare risk-based portfolio construction methods. Finally, we discuss current limitations and advocate for…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: International Conference on Geometric Science of Information

    Journal ref: GSI 2021: Geometric Science of Information pp 613-620

  31. Fast approximations of the Jeffreys divergence between univariate Gaussian mixture models via exponential polynomial densities

    Authors: Frank Nielsen

    Abstract: The Jeffreys divergence is a renowned symmetrization of the oriented Kullback-Leibler divergence broadly used in information sciences. Since the Jeffreys divergence between Gaussian mixture models is not available in closed-form, various techniques with pros and cons have been proposed in the literature to either estimate, approximate, or lower and upper bound this divergence. In this paper, we prop…

    Submitted 22 November, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: 43 pages

    Journal ref: Entropy 2021, 23, 1417

  32. arXiv:2107.00745  [pdf, other]

    cs.LG cs.AI stat.ML

    q-Paths: Generalizing the Geometric Annealing Path using Power Means

    Authors: Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood

    Abstract: Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2012.07823

  33. arXiv:2104.13801  [pdf, other]

    cs.IT

    The analytic dually flat space of the mixture family of two prescribed distinct Cauchy distributions

    Authors: Frank Nielsen

    Abstract: A smooth and strictly convex function on an open convex domain induces both (1) a Hessian manifold with respect to the standard flat Euclidean connection, and (2) a dually flat space of information geometry. We first review these constructions and illustrate how to instantiate them for (a) full regular exponential families from their partition functions, (b) regular homogeneous cones from their ch…

    Submitted 6 January, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: 23 pages, 7 figures

  34. arXiv:2104.10548  [pdf, other]

    cs.IT

    A note on some information-theoretic divergences between Zeta distributions

    Authors: Frank Nielsen

    Abstract: We consider the zeta distributions which are discrete power law distributions that can be interpreted as the counterparts of the continuous Pareto distributions with unit scale. The family of zeta distributions forms a discrete exponential family with normalizing constants expressed using the Riemann zeta function. We report several information-theoretic measures between zeta distributions and stu…

    Submitted 23 June, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: 20 pages, 3 tables, 5 figures

  35. On a Variational Definition for the Jensen-Shannon Symmetrization of Distances based on the Information Radius

    Authors: Frank Nielsen

    Abstract: We generalize the Jensen-Shannon divergence by considering a variational definition with respect to a generic mean extending thereby the notion of Sibson's information radius. The variational definition applies to any arbitrary distance and yields another way to define a Jensen-Shannon symmetrization of distances. When the variational optimization is further constrained to belong to prescribed pro…

    Submitted 1 April, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: 28 pages, 2 figures

    Journal ref: Entropy 2021, 23, 464

  36. arXiv:2102.07405  [pdf, other]

    stat.ML cs.LG

    Tractable structured natural gradient descent using local parameterizations

    Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

    Abstract: Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations. We address this issue by using local-parameter coordinates to obtain a flexible and efficient NGD method that works well for a wide variety of structured parameterizations. We show four applications where our method (1) genera…

    Submitted 17 January, 2022; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: An extended version of the ICML 2021 paper. Note: A workshop (short) paper with a focus on optimization tasks can be found at arXiv:2107.10884

  37. On $f$-divergences between Cauchy distributions

    Authors: Frank Nielsen, Kazuki Okamura

    Abstract: We prove that the $f$-divergences between univariate Cauchy distributions are all symmetric, and can be expressed as strictly increasing scalar functions of the symmetric chi-squared divergence. We report the corresponding scalar functions for the total variation distance, the Kullback-Leibler divergence, the squared Hellinger divergence, and the Jensen-Shannon divergence among others. Next, we gi…

    Submitted 7 December, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: 64 pages, 1 figure, 1 table

    Journal ref: IEEE Transactions on Information Theory, 2022
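
The symmetry claim in this abstract can be checked numerically for one particular $f$-divergence. The sketch below is only an illustration under stated assumptions, not the paper's derivation: it estimates the Kullback-Leibler divergence between two univariate Cauchy densities by midpoint quadrature, and the function names and the substitution x = loc1 + scale1 * tan(theta) are choices made for this example.

```python
import math

def cauchy_pdf(x, loc, scale):
    """Density of the univariate Cauchy distribution with location loc and scale > 0."""
    return scale / (math.pi * ((x - loc) ** 2 + scale ** 2))

def kl_cauchy_numeric(loc1, scale1, loc2, scale2, n=4000):
    """Midpoint-rule estimate of KL(p1 : p2) for two Cauchy densities.

    With x = loc1 + scale1 * tan(theta), p1(x) dx becomes dtheta / pi,
    so the integral reduces to an average of log(p1/p2) over theta in (-pi/2, pi/2).
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2 + (i + 0.5) * h
        x = loc1 + scale1 * math.tan(theta)
        total += math.log(cauchy_pdf(x, loc1, scale1) / cauchy_pdf(x, loc2, scale2))
    return total * h / math.pi

if __name__ == "__main__":
    forward = kl_cauchy_numeric(0.0, 1.0, 2.0, 3.0)
    reverse = kl_cauchy_numeric(2.0, 3.0, 0.0, 1.0)
    print("KL(p1 : p2) =", forward)
    print("KL(p2 : p1) =", reverse)
```

If the symmetry stated in the abstract holds, the two printed values should agree up to quadrature and rounding error, whereas the same experiment with, say, two normal densities of different variances would generally give different values.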

  38. arXiv:2101.03839  [pdf, other]

    cs.IT

    On information projections between multivariate elliptical and location-scale families

    Authors: Frank Nielsen

    Abstract: We study information projections with respect to statistical $f$-divergences between any two location-scale families. We consider a multivariate generalization of the location-scale families which includes the elliptical and the spherical subfamilies. By using the action of the multivariate location-scale group, we show how to reduce the calculation of $f$-divergences between any two location-scal…

    Submitted 19 January, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: 23 pages, 2 figures

  39. arXiv:2012.15480  [pdf, other]

    cs.LG cs.IT stat.ML

    Likelihood Ratio Exponential Families

    Authors: Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg

    Abstract: The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling. Linking these two ideas, recent work has interpreted the geometric mixture path as an exponential family of distributions to analyze the thermod…

    Submitted 15 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry

  40. arXiv:2012.07823  [pdf, other]

    cs.LG

    Annealed Importance Sampling with q-Paths

    Authors: Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

    Abstract: Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award)

    Journal ref: Published at UAI 2021 https://arxiv.org/abs/2107.00745

  41. arXiv:2006.07020  [pdf, other]

    cs.CG cs.IT cs.LG

    On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds

    Authors: Frank Nielsen

    Abstract: We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square divergence, and a flat divergence derived from Tsallis' quadratic entropy related to the conformal flattening of the Fisher-Rao curved geometry. We prove that the Voronoi dia…

    Submitted 18 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 34 pages, 13 figures

  42. arXiv:2005.03521  [pdf, other]

    cs.CL

    The Danish Gigaword Project

    Authors: Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, Daniel Varab

    Abstract: Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialect…

    Submitted 12 May, 2021; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: Identical to the NoDaLiDa 2021 version

  43. Hilbert geometry of the Siegel disk: The Siegel-Klein disk model

    Authors: Frank Nielsen

    Abstract: We study the Hilbert geometry induced by the Siegel disk domain, an open bounded convex set of complex square matrices of operator norm strictly less than one. This Hilbert geometry yields a generalization of the Klein disk model of hyperbolic geometry, henceforth called the Siegel-Klein disk model to differentiate it from the classical Siegel upper plane and disk domains. In the Siegel-Klein disk…

    Submitted 10 September, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: 42 pages, 7 figures

    Journal ref: Entropy 2020, 22(9), 1019

  44. A note on Onicescu's informational energy and correlation coefficient in exponential families

    Authors: Frank Nielsen

    Abstract: The informational energy of Onicescu is a positive quantity that measures the amount of uncertainty of a random variable. But contrary to Shannon's entropy, the informational energy increases when randomness decreases. We report closed-form formulas for Onicescu's informational energy and its associated correlation coefficient when the probability distributions belong to an exponential family. We s…

    Submitted 4 February, 2022; v1 submitted 29 March, 2020; originally announced March 2020.

    Comments: 13 pages, 2 tables

    Journal ref: Foundations 2022, 2(2), 362-376

  45. arXiv:2003.02469  [pdf, other]

    math.ST cs.CV cs.IT cs.LG

    Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family

    Authors: Frank Nielsen, Richard Nock

    Abstract: It is well-known that the Bhattacharyya, Hellinger, Kullback-Leibler, $α$-divergences, and Jeffreys' divergences between densities belonging to the same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family. In this work, we report (dis)similarity formulas which bypass the explicit use of the cumu…

    Submitted 7 April, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: 33 pages

  46. arXiv:2002.08345   

    cs.LG stat.ML

    Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances

    Authors: Gaëtan Hadjeres, Frank Nielsen

    Abstract: Distances between probability distributions that take into account the geometry of their sample space, like the Wasserstein or the Maximum Mean Discrepancy (MMD) distances, have received a lot of attention in machine learning as they can, for instance, be used to compare probability distributions with disjoint supports. In this paper, we study a class of statistical Hilbert distances that we term th…

    Submitted 28 April, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Most results were already known. The distances described therein do not generalize MMD: it is an MMD with a distance-induced kernel (see [Sejdinovic et al. (2013)]).

  47. The α-divergences associated with a pair of strictly comparable quasi-arithmetic means

    Authors: Frank Nielsen

    Abstract: We generalize the family of $α$-divergences using a pair of strictly comparable weighted means. In particular, we obtain the $1$-divergence in the limit case $α\rightarrow 1$ (a generalization of the Kullback-Leibler divergence) and the $0$-divergence in the limit case $α\rightarrow 0$ (a generalization of the reverse Kullback-Leibler divergence). We state the condition for a pair of quasi-arithme…

    Submitted 17 February, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

    Comments: 18 pages

    Journal ref: Algorithms 2022, 15(11), 435

  48. arXiv:1912.00610  [pdf, other]

    cs.IT math.ST

    On a generalization of the Jensen-Shannon divergence

    Authors: Frank Nielsen

    Abstract: The Jensen-Shannon divergence is a renowned bounded symmetrization of the Kullback-Leibler divergence which does not require probability densities to have matching supports. In this paper, we introduce a vector-skew generalization of the scalar $α$-Jensen-Bregman divergences and derive thereof the vector-skew $α$-Jensen-Shannon divergences. We study the properties of these novel divergences and show…

    Submitted 19 December, 2019; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 19 pages, 3 figures

    Journal ref: Entropy 2020, 22(2), 221

  49. arXiv:1911.12463  [pdf, other]

    cs.LG stat.ML

    Information-Geometric Set Embeddings (IGSE): From Sets to Probability Distributions

    Authors: Ke Sun, Frank Nielsen

    Abstract: This letter introduces an abstract learning problem called the "set embedding": The objective is to map sets into probability distributions so as to lose less information. We relate set union and intersection operations with corresponding interpolations of probability distributions. We also demonstrate a preliminary solution with experimental results on toy set embedding examples.

    Submitted 11 December, 2019; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: To be presented at Sets & Partitions (NeurIPS 2019 workshop)

  50. arXiv:1910.03935  [pdf, other]

    cs.CG cs.IT cs.LG

    On geodesic triangles with right angles in a dually flat space

    Authors: Frank Nielsen

    Abstract: The dualistic structure of statistical manifolds in information geometry yields eight types of geodesic triangles passing through three given points, the triangle vertices. The interior angles of geodesic triangles can sum up to $π$ like in Euclidean/Mahalanobis flat geometry, or exhibit otherwise angle excesses or angle defects. In this paper, we initiate the study of geodesic triangles in dually…

    Submitted 10 May, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: 40 pages, 22 figures