Monte Carlo Information Geometry: The dually flat case

Abstract: Exponential families and mixture families are parametric probability models that can be geometrically studied as smooth statistical manifolds with respect to any statistical divergence like the Kullback-Leibler (KL) divergence or the Hellinger divergence. When equipping a statistical manifold with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters. In practice, the corresponding Bregman generators of mixture/exponential families require definite integral calculus that can either be too time-consuming (for the exponentially large discrete support case) or even not admit a closed-form formula (for the continuous support case). In these cases, the dually flat construction remains theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we consider performing stochastic Monte Carlo (MC) estimation of those integral-based mixture/exponential family Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries (MCIG), that increasingly finely approximate the intractable geometry. The advantage of this MCIG is that it allows a practical use of the Bregman algorithmic toolbox on a wide range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold.

Submitted 19 March, 2018; originally announced March 2018.

Comments: 25 pages

arXiv:1803.04349 [pdf, other]

Linking ImageNet WordNet Synsets with Wikidata

Authors: Finn Årup Nielsen

Abstract: The linkage of ImageNet WordNet synsets to Wikidata items will provide deep learning algorithms with access to a rich multilingual knowledge graph. Here I will describe our on-going efforts in linking the two resources and issues faced in matching the Wikidata and WordNet knowledge graphs. I show an example of how the linkage can be used in a deep learning setting with real-time image classification and labeling in a non-English language and discuss what opportunities lie ahead.

Submitted 5 March, 2018; originally announced March 2018.

Comments: 6 pages, Wiki Workshop 2018

arXiv:1802.05554 [pdf, other]

New way of second quantized theory of fermions with either Clifford or Grassmann coordinates and spin-charge-family theory

Authors: N. S. Mankoc Borstnik, H. B. F. Nielsen

Abstract: Fermions with the internal degrees of freedom described in Clifford space carry in any dimension a half integer spin. There are two kinds of spins in Clifford space. The spin-charge-family theory, assuming even d=13+1, uses one kind of spins to describe in d=3+1 spins and charges of quarks and leptons and antiquarks and antileptons, while the other kind is used to describe families. The new way of second quantization, suggested by the spin-charge-family theory, is presented. It is shown that the creation and annihilation operators of 1-fermion states, written as products of nilpotents and projectors of an odd Clifford character, fulfill the anticommutation relations as required in the second quantization procedure for fermions: 1-fermion states are in Clifford space already second quantized, the creation operators for any n-fermion second quantized vectors are products of one fermion creation operators, operating on the empty vacuum state. It is demonstrated that also in Grassmann space there exist the creation and annihilation operators of an odd Grassmann character, generating "fermions", which fulfill as well the anticommutation relations for fermions, representing correspondingly the second quantized 1-"fermion" states, in this case with integer spins. Grassmann space offers no families. We discuss the new second quantization procedure of the fields in both spaces. For the Grassmann case we present the action, basic states, solutions of the Weyl equation for free massless "fermions" and discrete symmetry operators.
A short overview of the achievements of the spin-charge-family theory is given, and open problems of this theory still waiting to be solved are presented. The Grassmann and the Clifford case are compared in order to better understand open questions in physics of elementary fermion and boson fields and in cosmology.

Submitted 31 August, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: 80 pages; This article is the revised version of the previous arxiv

arXiv:1710.04099 [pdf, other]

Wembedder: Wikidata entity embedding web service

Authors: Finn Årup Nielsen

Abstract: I present a web service for querying an embedding of entities in the Wikidata knowledge graph. The embedding is trained on the Wikidata dump using Gensim's Word2Vec implementation and a simple graph walk. A REST API is implemented. Together with the Wikidata API the web service exposes a multilingual resource for over 600'000 Wikidata items and properties.

Submitted 11 October, 2017; originally announced October 2017.

Comments: 3 pages, 2 figures

ACM Class: I.2.4; H.3.5

arXiv:1709.10498 [pdf, other]

A generalization of the Jensen divergence: The chord gap divergence

Authors: Frank Nielsen

Abstract: We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties. It follows a generalization of the celebrated statistical Bhattacharyya distance that is frequently met in applications. We report an iterative concave-convex procedure for computing centroids, and analyze the performance of the $k$-means++ clustering with respect to that new dissimilarity measure by introducing the Taylor-Lagrange remainder form of the skew Jensen divergences.

Submitted 12 November, 2017; v1 submitted 29 September, 2017; originally announced September 2017.

Comments: 13 pages, 2 figures

arXiv:1709.06404 [pdf, other]

Interactive Music Generation with Positional Constraints using Anticipation-RNNs

Authors: Gaëtan Hadjeres, Frank Nielsen

Abstract: Recurrent Neural Networks (RNNs) are now widely used on sequence generation tasks due to their ability to learn long-range dependencies and to generate sequences of arbitrary length. However, their left-to-right generation procedure only allows limited control by a potential user, which makes them unsuitable for interactive and creative usages such as interactive music generation. This paper introduces a novel architecture called Anticipation-RNN which possesses the assets of the RNN-based generative models while allowing user-defined positional constraints to be enforced. We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J.S. Bach chorale harmonizations. Sampling using the Anticipation-RNN is of the same order of complexity as sampling from the traditional RNN model. This fast and interactive generation of musical sequences opens ways to devise real-time systems that could be used for creative purposes.

Submitted 19 September, 2017; originally announced September 2017.

Comments: 9 pages, 7 figures

arXiv:1709.00740 [pdf, other]

Deep rank-based transposition-invariant distances on musical sequences

Authors: Gaëtan Hadjeres, Frank Nielsen

Abstract: Distances on symbolic musical sequences are needed for a variety of applications, from music retrieval to automatic music generation. These musical sequences belong to a given corpus (or style) and it is obvious that a good distance on musical sequences should take this information into account; being able to define a distance ex nihilo which could be applicable to all music styles seems implausible. A distance could also be invariant under some transformations, such as transpositions, so that it can be used as a distance between musical motives rather than musical sequences. However, to our knowledge, none of the approaches to devise musical distances seem to address these issues. This paper introduces a method to build transposition-invariant distances on symbolic musical sequences which are learned from data. It is a hybrid distance which combines learned feature representations of musical sequences with a handcrafted rank distance. This distance depends less on the musical encoding of the data than previous methods and gives perceptually good results. We demonstrate its efficiency on the dataset of chorale melodies by J.S. Bach.

Submitted 3 September, 2017; originally announced September 2017.

Comments: 13 pages, 7 figures

arXiv:1708.00568 [pdf, other]

On $w$-mixtures: Finite convex combinations of prescribed component distributions

Authors: Frank Nielsen, Richard Nock

Abstract: We consider the space of $w$-mixtures which is defined as the set of finite statistical mixtures sharing the same prescribed component distributions closed under convex combinations. The information geometry induced by the Bregman generator set to the Shannon negentropy on this space yields a dually flat space called the mixture family manifold. We show how the Kullback-Leibler (KL) divergence can be recovered from the corresponding Bregman divergence for the negentropy generator: That is, the KL divergence between two $w$-mixtures amounts to a Bregman Divergence (BD) induced by the Shannon negentropy generator. Thus the KL divergence between two Gaussian Mixture Models (GMMs) sharing the same Gaussian components is equivalent to a Bregman divergence. This KL-BD equivalence on a mixture family manifold implies that we can perform optimal KL-averaging aggregation of $w$-mixtures without information loss. More generally, we prove that the statistical skew Jensen-Shannon divergence between $w$-mixtures is equivalent to a skew Jensen divergence between their corresponding parameters. Finally, we state several properties, divergence identities, and inequalities relating to $w$-mixtures.

Submitted 8 June, 2021; v1 submitted 1 August, 2017; originally announced August 2017.

Comments: 34 pages, extend a preliminary paper (ICASSP 2018)
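
The KL-BD equivalence stated in the abstract above can be checked numerically in a small discrete setting. The following is a minimal sketch, not the authors' code: it assumes three hypothetical categorical component distributions on a three-point support, parameterizes a $w$-mixture by its first two weights (the last weight is one minus their sum), and compares the KL divergence between two mixtures with the Bregman divergence induced by the Shannon negentropy of the mixture. All function names and numeric values are illustrative assumptions.

```python
import math

def mixture(weights, components):
    """Return the mixture sum_i w_i * p_i as a list of probabilities."""
    n = len(components[0])
    return [sum(w * p[x] for w, p in zip(weights, components)) for x in range(n)]

def full_weights(theta):
    """Append the last weight so that all weights sum to one."""
    return list(theta) + [1.0 - sum(theta)]

def negentropy(theta, components):
    """Shannon negentropy F(theta) = sum_x m(x) log m(x) of the mixture."""
    m = mixture(full_weights(theta), components)
    return sum(v * math.log(v) for v in m)

def grad_negentropy(theta, components):
    """Gradient of F with respect to the free weights theta."""
    m = mixture(full_weights(theta), components)
    last = components[-1]
    return [
        sum((p[x] - last[x]) * (math.log(m[x]) + 1.0) for x in range(len(m)))
        for p in components[:-1]
    ]

def bregman(theta1, theta2, components):
    """Bregman divergence B_F(theta1 : theta2) for the negentropy generator."""
    g = grad_negentropy(theta2, components)
    inner = sum((a - b) * gi for a, b, gi in zip(theta1, theta2, g))
    return negentropy(theta1, components) - negentropy(theta2, components) - inner

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Hypothetical prescribed components and two weight vectors (illustrative only).
components = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
theta1, theta2 = [0.5, 0.3], [0.2, 0.2]
m1 = mixture(full_weights(theta1), components)
m2 = mixture(full_weights(theta2), components)
print(kl(m1, m2), bregman(theta1, theta2, components))
```

The two printed values should agree up to floating-point rounding. For continuous components such as Gaussians, the negentropy integral generally has no closed form, so this direct evaluation only works in the discrete case sketched here.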

arXiv:1707.04588 [pdf, other]

GLSR-VAE: Geodesic Latent Space Regularization for Variational AutoEncoder Architectures

Authors: Gaëtan Hadjeres, Frank Nielsen, François Pachet

Abstract: VAEs (Variational AutoEncoders) have proved to be powerful in the context of density modeling and have been used in a variety of contexts for creative purposes. In many settings, the data we model possesses continuous attributes that we would like to take into account at generation time. We propose in this paper GLSR-VAE, a Geodesic Latent Space Regularization for the Variational AutoEncoder architecture and its generalizations which allows a fine control on the embedding of the data into the latent space. When augmenting the VAE loss with this regularization, changes in the learned latent space reflect changes of the attributes of the data. This deeper understanding of the VAE latent space structure offers the possibility to modulate the attributes of the generated data in a continuous way. We demonstrate its efficiency on a monophonic music generation task where we manage to generate variations of discrete sequences in an intended and playful way.

Submitted 14 July, 2017; originally announced July 2017.

Comments: 11 pages

arXiv:1705.10749 [pdf, other]

F(750), We Miss You as a Bound State of 6 Top and 6 Antitop Quarks, Multiple Point Principle

Authors: Holger F. Bech Nielsen, D. L. Bennett, C. R. Das, C. D. Froggatt, L. V. Laperashvili

Abstract: We review our speculation that, in the pure Standard Model, the exchange of Higgses, including also the ones "eaten by $W^{\pm}$ and Z", and of gluons together binds 6 top plus 6 antitop quarks so strongly that the mass of the bound state gets down to about 1/3 of the collective mass 12 $m_t$ of the 12 constituent quarks. The true importance of this speculated bound state is that it makes it possible to uphold, even inside the Standard Model, our proposal for what is really a new law of nature saying that there are several phases of empty space, vacua, all having very small energy densities (of the order of the present energy density in the universe). The reason suggested for believing in this new law, called the "Multiple (Criticality) Point Principle", is that estimating the mass of the speculated bound state using the "Multiple Point Principle" leads to two consistent mass values; and they even agree with a crude bag-model-like estimate of the mass of this bound state. Very unfortunately, because our estimated bound state mass is just around 750 GeV, the statistical fluctuation so popular last year, when interpreted as the digamma resonance F(750), turned out not to be a real resonance.

Submitted 30 May, 2017; originally announced May 2017.

Comments: 25 pages, 11 figures, Corfu Summer Institute 2016 "School and Workshops on Elementary Particle Physics and Gravity", 31 August - 23 September, 2016, Corfu, Greece

arXiv:1704.02708 [pdf, other]

Evolving a Vector Space with any Generating Set

Authors: Richard Nock, Frank Nielsen

Abstract: In Valiant's model of evolution, a class of representations is evolvable iff a polynomial-time process of random mutations guided by selection converges with high probability to a representation as $ε$-close as desired from the optimal one, for any required $ε>0$. Several previous positive results exist that can be related to evolving a vector space, but each former result imposes disproportionate representations or restrictions on (re)initialisations, distributions, performance functions and/or the mutator. In this paper, we show that all it takes to evolve a normed vector space is merely a set that generates the space. Furthermore, it takes only $\tilde{O}(1/ε^2)$ steps and it is essentially stable, agnostic and handles target drifts that rival some proven in fairly restricted settings. Our algorithm can be viewed as a close relative to a popular fifty-year-old gradient-free optimization method for which little is still known from the convergence standpoint: the Nelder-Mead simplex method.

Submitted 31 December, 2017; v1 submitted 10 April, 2017; originally announced April 2017.

ACM Class: I.2.6; G.1.6

arXiv:1704.00454 [pdf, other]

doi 10.1007/978-3-030-02520-5_11

Clustering in Hilbert simplex geometry

Authors: Frank Nielsen, Ke Sun

Abstract: Clustering categorical distributions in the finite-dimensional probability simplex is a fundamental task met in many applications dealing with normalized histograms. Traditionally, the differential-geometric structures of the probability simplex have been used either by (i) setting the Riemannian metric tensor to the Fisher information matrix of the categorical distributions, or (ii) defining the dualistic information-geometric structure induced by a smooth dissimilarity measure, the Kullback-Leibler divergence. In this work, we introduce for clustering tasks a novel computationally-friendly framework for modeling geometrically the probability simplex: The Hilbert simplex geometry. In the Hilbert simplex geometry, the distance is the non-separable Hilbert's metric distance which satisfies the property of information monotonicity with distance level set functions described by polytope boundaries. We show that both the Aitchison and Hilbert simplex distances are norm distances on normalized logarithmic representations with respect to the $\ell_2$ and variation norms, respectively. We discuss the pros and cons of those different statistical modelings, and benchmark experimentally these different kinds of geometries for center-based $k$-means and $k$-center clustering. Furthermore, since a canonical Hilbert distance can be defined on any bounded convex subset of the Euclidean space, we also consider Hilbert's geometry of the elliptope of correlation matrices and study its clustering performances compared to Frobenius and log-det divergences.

Submitted 19 November, 2021; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: 48 pages

Journal ref: Geometric Structures of Information, Springer, 2019 (pp. 297-331)
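
The abstract above describes the Hilbert simplex distance as a variation-norm distance on logarithmic representations. A minimal sketch of that reading follows, assuming strictly positive categorical distributions and taking the variation norm of a vector to be its maximum entry minus its minimum entry; under that assumption the distance reduces to the logarithm of the largest coordinate ratio over the smallest. The function name and the sample histograms are hypothetical.

```python
import math

def hilbert_simplex_distance(p, q):
    """Hilbert distance between two strictly positive points of the simplex.

    Computed as log(max_i p_i/q_i) - log(min_i p_i/q_i), i.e. the spread
    (max minus min) of the vector log p - log q.
    """
    ratios = [a / b for a, b in zip(p, q)]
    return math.log(max(ratios) / min(ratios))

# Hypothetical normalized histograms (illustrative values only).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
r = [0.4, 0.4, 0.2]
print(hilbert_simplex_distance(p, q))
```

Because only coordinate ratios enter, the value is unchanged when either argument is rescaled by a positive constant, so the sketch does not need to renormalize its inputs.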

arXiv:1703.09699

Preface to the 19th workshop "What comes beyond the standard models", Bled July 11--19, 2016, and links to the talks in the proceedings

Authors: N. S. Mankoc Borstnik, H. F. B. Nielsen, M. Y. Khlopov, D. Lukman

Abstract: The contribution contains the preface to the Proceedings to the 19th Workshop "What Comes Beyond the Standard Models", Bled, July 11 - 19, 2016, published in Bled workshops in physics, Vol.17, No. 2, DMFA-Zaloznistvo, Ljubljana, Dec. 2016, links to (most of) the published contributions and section (by M.Yu. Khlopov) on VIA at Bled 2016.

Submitted 5 April, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

Comments: "Bled workshops in physics", Vol.17, No. 2, DMFA-Zaloznistvo, Ljubljana, Dec. 2016, xvii+231 pages, corrected arxiv link

arXiv:1703.04222 [pdf, other]

Scholia and scientometrics with Wikidata

Authors: Finn Årup Nielsen, Daniel Mietchen, Egon Willighagen

Abstract: Scholia is a tool to handle scientific bibliographic information in Wikidata. The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. To collect the data, it queries the SPARQL-based Wikidata Query Service. Among several display formats available in Scholia are lists of publications for individual researchers and organizations, publications per year, employment timelines, as well as co-author networks and citation graphs. The Python package implementing the Web service is also able to format Wikidata bibliographic entries for use in LaTeX/BIBTeX.

Submitted 13 April, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

Comments: 16 pages, 5 figures, Scientometrics 2017

Journal ref: Joint Proceedings of the 1st International Workshop on Scientometrics and 1st International Workshop on Enabling Decentralised Scholarly Communication (2017)

arXiv:1703.00485 [pdf, ps, other]

doi 10.1007/978-3-030-65459-7

A review of two decades of correlations, hierarchies, networks and clustering in financial markets

Authors: Gautier Marti, Frank Nielsen, Mikołaj Bińkowski, Philippe Donnat

Abstract: We review the state of the art of clustering financial time series and the study of their correlations alongside other interaction networks. The aim of this review is to gather in one place the relevant material from different fields, e.g. machine learning, information geometry, econophysics, statistical physics, econometrics, behavioral finance. We hope it will help researchers to use this alternative modeling of the financial time series more effectively. Decision makers and quantitative researchers may also be able to leverage its insights. Finally, we also hope that this review will form the basis of an open toolbox to study correlations, hierarchies, networks and clustering in financial markets.

Submitted 3 November, 2020; v1 submitted 1 March, 2017; originally announced March 2017.

Journal ref: Chapter in Progress in Information Geometry: Theory and Applications, 245-274, 2021

arXiv:1703.00339 [pdf, other]

Regularization of ill-posed point neuron models

Authors: Bjørn Fredrik Nielsen

Abstract: Point neuron models with a Heaviside firing rate function can be ill-posed. That is, the initial-condition-to-solution map might become discontinuous in finite time. If a Lipschitz continuous, but steep, firing rate function is employed, then standard ODE theory implies that such models are well-posed and can thus, approximately, be solved with finite precision arithmetic. We investigate whether the solution of this well-posed model converges to a solution of the ill-posed limit problem as the steepness parameter, of the firing rate function, tends to infinity. Our argument employs the Arzelà-Ascoli theorem and also yields the existence of a solution of the limit problem. However, we only obtain convergence of a subsequence of the regularized solutions. This is consistent with the fact that we show that models with a Heaviside firing rate function can have several solutions. Our analysis assumes that the Lebesgue measure of the time the limit function, provided by the Arzelà-Ascoli theorem, equals the threshold value for firing, is zero. If this assumption does not hold, we argue that the regularized solutions may not converge to a solution of the limit problem with a Heaviside firing function.

Submitted 1 March, 2017; originally announced March 2017.

arXiv:1702.04877 [pdf, ps, other]

Generalizing Jensen and Bregman divergences with comparative convexity and the statistical Bhattacharyya distances with comparable means

Authors: Frank Nielsen, Richard Nock

Abstract: Comparative convexity is a generalization of convexity relying on abstract notions of means. We define the Jensen divergence and the Jensen diversity from the viewpoint of comparative convexity, and show how to obtain the generalized Bregman divergences as limit cases of skewed Jensen divergences. In particular, we report explicit formulas of these generalized Bregman divergences when considering quasi-arithmetic means. Finally, we introduce a generalization of the Bhattacharyya statistical distances based on comparative means using relative convexity.

Submitted 3 May, 2017; v1 submitted 16 February, 2017; originally announced February 2017.

Comments: 24 pages
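
The limit relation mentioned in the abstract above can be illustrated in the ordinary (arithmetic-mean) convexity setting, which is only the classical special case and not the comparative-convexity generalization the paper develops. The sketch below assumes the scalar generator F(x) = x log x and the skew Jensen divergence J_α(p:q) = (1-α)F(p) + αF(q) - F((1-α)p + αq); dividing by α and letting α tend to 0 recovers the Bregman divergence B_F(q:p). The names and sample values are hypothetical.

```python
import math

def F(x):
    """Strictly convex generator on the positive reals."""
    return x * math.log(x)

def dF(x):
    """Derivative of the generator."""
    return math.log(x) + 1.0

def skew_jensen(p, q, alpha):
    """J_alpha(p:q) = (1-alpha) F(p) + alpha F(q) - F((1-alpha) p + alpha q)."""
    return (1.0 - alpha) * F(p) + alpha * F(q) - F((1.0 - alpha) * p + alpha * q)

def bregman(a, b):
    """B_F(a:b) = F(a) - F(b) - (a - b) F'(b)."""
    return F(a) - F(b) - (a - b) * dF(b)

# Illustrative scalar inputs; alpha is small but not so small that
# floating-point cancellation dominates.
p, q, alpha = 1.0, 2.0, 1e-6
scaled = skew_jensen(p, q, alpha) / alpha
print(scaled, bregman(q, p))
```

The two printed numbers should agree to several digits; the remaining gap shrinks roughly linearly in α until rounding error takes over.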

arXiv:1701.03916 [pdf, other]

doi 10.3390/e19030122

On Hölder projective divergences

Authors: Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet

Abstract: We describe a framework to build distances by measuring the tightness of inequalities, and introduce the notion of proper statistical divergences and improper pseudo-divergences. We then consider the Hölder ordinary and reverse inequalities, and present two novel classes of Hölder divergences and pseudo-divergences that both encapsulate the special case of the Cauchy-Schwarz divergence. We report closed-form formulas for those statistical dissimilarities when considering distributions belonging to the same exponential family provided that the natural parameter space is a cone (e.g., multivariate Gaussians), or affine (e.g., categorical distributions). Those new classes of Hölder distances are invariant to rescaling, and thus do not require distributions to be normalized. Finally, we show how to compute statistical Hölder centroids with respect to those divergences, and carry out center-based clustering toy experiments on a set of Gaussian distributions that demonstrate empirically that symmetrized Hölder divergences outperform the symmetric Cauchy-Schwarz divergence.

Submitted 14 January, 2017; originally announced January 2017.

Comments: 25 pages

arXiv:1612.04555 [pdf, ps, other]

Scalable Group Level Probabilistic Sparse Factor Analysis

Authors: Jesper L. Hinrich, Søren F. V. Nielsen, Nicolai A. B. Riis, Casper T. Eriksen, Jacob Frøsig, Marco D. F. Kristensen, Mikkel N. Schmidt, Kristoffer H. Madsen, Morten Mørup

Abstract: Many data-driven approaches exist to extract neural representations of functional magnetic resonance imaging (fMRI) data, but most of them lack a proper probabilistic formulation. We propose a group level scalable probabilistic sparse factor analysis (psFA) allowing spatially sparse maps, component pruning using automatic relevance determination (ARD) and subject specific heteroscedastic spatial noise modeling. For task-based and resting state fMRI, we show that the sparsity constraint gives rise to components similar to those obtained by group independent component analysis. The noise modeling shows that noise is reduced in areas typically associated with activation by the experimental design. The psFA model identifies sparse components and the probabilistic setting provides a natural way to handle parameter uncertainties. The variational Bayesian framework easily extends to more complex noise models than those presently considered.

Submitted 14 December, 2016; originally announced December 2016.

Comments: 10 pages plus 5 pages appendix, Submitted to ICASSP 17

arXiv:1612.02954 [pdf, other]

A series of maximum entropy upper bounds of the differential entropy

Authors: Frank Nielsen, Richard Nock

Abstract: We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series. We then show how to use those generic bounds for upper bounding the differential entropy of Gaussian mixture models. This requires to calculate the raw moments and raw absolute moments of Gaussian mixtures in closed-form that may also be handy in statistical machine learning and information theory. We report on our experiments and discuss on the tightness of those bounds.

Submitted 9 December, 2016; originally announced December 2016.

Comments: 18 pages

arXiv:1612.01010 [pdf, other]

DeepBach: a Steerable Model for Bach Chorales Generation

Authors: Gaëtan Hadjeres, François Pachet, Frank Nielsen

Abstract: This paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces. We claim that, after being trained on the chorale harmonizations by Johann Sebastian Bach, our model is capable of generating highly convincing chorales in the style of Bach. DeepBach's strength comes from the use of pseudo-Gibbs sampling coupled with an adapted representation of musical data. This is in contrast with many automatic music composition approaches which tend to compose music sequentially. Our model is also steerable in the sense that a user can constrain the generation by imposing positional constraints such as notes, rhythms or cadences in the generated score. We also provide a plugin on top of the MuseScore music editor making the interaction with DeepBach easy to use.

Submitted 17 June, 2017; v1 submitted 3 December, 2016; originally announced December 2016.

Comments: 10 pages, ICML2017 version

Journal ref: Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1362-1371, 2017

arXiv:1610.09659 [pdf, other]

Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering

Authors: Gautier Marti, Sebastien Andler, Frank Nielsen, Philippe Donnat

Abstract: We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset. The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the clusters centers can be used to parameterize a novel dependence coefficient which can target or forget specific dependence patterns. Finally, we illustrate and benchmark the methodology on several datasets. Code and numerical experiments are available online for reproducible research.

Submitted 30 October, 2016; originally announced October 2016.

arXiv:1610.00364 [pdf, ps, other]

doi 10.1142/S0217751X16501864

F(750), We Miss You, as Bound State of 6 Top and 6 Anti top

Authors: Holger Frits Bech Nielsen

Abstract: We collect and estimate support for our long speculated "multiple point principle" saying that there should be several vacua all having (compared to the scales of high energy physics) very low energy densities. In pure Standard Model we suggest there being three by "multiple point principle" low energy density vacua, "present", "condensate" and "high field" vacuum. We fit the mass of the in our picture since long speculated bound state of six top and six anti top quarks in three quite independent ways and get remarkably within our crude accuracy the same mass in all three fits! The new point of the present article is to estimate the bound state mass in what we could call a bag model estimation. The two other fits, which we review, obtain the mass of the bound state by fitting to the multiple point principle prediction of degenerate vacua. Our remarkable agreement of our three mass-fits can be interpreted to mean, that we have calculated at the end the energy densities of the two extra speculated vacua and found that they are indeed very small! Unfortunately the recently much discussed statistical fluctuation peak F(750) has now been revealed to be just a fluctuation, very accidentally matches our fitted mass of the bound state remarkably well with the mass of this fluctuation 750 GeV.

Submitted 5 October, 2016; v1 submitted 2 October, 2016; originally announced October 2016.

Comments: minor corrections in calculation and commas and a few references added. arXiv admin note: text overlap with arXiv:1607.07907, adding few citations

arXiv:1609.07082 [pdf, other]

Large Margin Nearest Neighbor Classification using Curved Mahalanobis Distances

Authors: Frank Nielsen, Boris Muzellec, Richard Nock

Abstract: We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework. We report on our experimental results, and further consider the case of learning a mixed curved Mahalanobis distance. Besides, we show that the Cayley-Klein Voronoi diagrams are affine, and can be built from an equivalent (clipped) power diagrams, and that Cayley-Klein balls have Mahalanobis shapes with displaced centers.

Submitted 26 September, 2016; v1 submitted 22 September, 2016; originally announced September 2016.

Comments: 21 pages, 8 figures, 5 tables, extend ICIP 2016 paper entitled "classification With Mixtures of Curved Mahalanobis Metrics"

arXiv:1609.04495 [pdf, other]

Tsallis Regularized Optimal Transport and Ecological Inference

Authors: Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen

Abstract: Optimal transport is a powerful framework for computing distances between probability distributions. We unify the two main approaches to optimal transport, namely Monge-Kantorovitch and Sinkhorn-Cuturi, into what we define as Tsallis regularized optimal transport (TROT). TROT interpolates a rich family of distortions from Wasserstein to Kullback-Leibler, encompassing as well Pearson, Neyman and Hellinger divergences, to name a few. We show that metric properties known for Sinkhorn-Cuturi generalize to TROT, and provide efficient algorithms for finding the optimal transportation plan with formal convergence proofs. We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences. TROT provides a convenient framework for ecological inference by allowing to compute the joint distribution --- that is, the optimal transportation plan itself --- when side information is available, which is e.g. typically what census represents in political science. Experiments on data from the 2012 US presidential elections display the potential of TROT in delivering a faithful reconstruction of the joint distribution of ethnic groups and voter preferences.

Submitted 14 September, 2016; originally announced September 2016.

ACM Class: G.1.6

arXiv:1606.06069 [pdf, other]

Relative Natural Gradient for Learning Large Complex Models

Authors: Ke Sun, Frank Nielsen

Abstract: Fisher information and natural gradient provided deep insights and powerful tools to artificial neural networks. However related analysis becomes more and more difficult as the learner's structure turns large and complex. This paper makes a preliminary step towards a new direction. We extract a local component of a large neuron system, and define its relative Fisher information metric that describes accurately this small component, and is invariant to the other parts of the system. This concept is important because the geometry structure is much simplified and it can be easily applied to guide the learning of neural networks. We provide an analysis on a list of commonly used components, and demonstrate how to use this concept to further improve optimization.

Submitted 20 June, 2016; originally announced June 2016.

Comments: 24 pages, 5 figures

arXiv:1606.05850 [pdf, other]

doi 10.3390/e18120442

Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities

Authors: Frank Nielsen, Ke Sun

Abstract: Information-theoretic measures such as the entropy, cross-entropy and the Kullback-Leibler divergence between two mixture models are core primitives in many signal processing tasks. Since the Kullback-Leibler divergence of mixtures provably does not admit a closed-form formula, it is in practice either estimated using costly Monte-Carlo stochastic integration, approximated, or bounded using various techniques. We present a fast and generic method that builds algorithmically closed-form lower and upper bounds on the entropy, the cross-entropy and the Kullback-Leibler divergence of mixtures. We illustrate the versatile method by reporting on our experiments for approximating the Kullback-Leibler divergence between univariate exponential mixtures, Gaussian mixtures, Rayleigh mixtures, and Gamma mixtures.

Submitted 16 August, 2016; v1 submitted 19 June, 2016; originally announced June 2016.

Comments: 20 pages, 3 figures
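
Illustrative note (not part of the arXiv record): the title above refers to log-sum-exp inequalities. The sketch below checks only the elementary bound max(x) <= log(sum(exp(x_i))) <= max(x) + log(n), which is a standard fact; it is not the paper's piecewise construction for mixtures. The function names and sample values are hypothetical.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lse_bounds(xs):
    """Elementary lower and upper bounds on log_sum_exp(xs)."""
    m = max(xs)
    return m, m + math.log(len(xs))

# Hypothetical sample values, e.g. log-weights of mixture components at a point.
xs = [-3.0, 0.5, 2.0]
lower, upper = lse_bounds(xs)
value = log_sum_exp(xs)
```

The lower bound is tight when one term dominates, and the upper bound is tight when all terms are equal.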

arXiv:1605.03406

Preface to the 18th workshop "What comes beyond the standard models", Bled July 11--19, 2015, and links to the talks in the proceedings

Authors: N. S. Mankoc Borstnik, H. F. B. Nielsen, M. Y. Khlopov, D. Lukman

Abstract: The contribution contains the preface to the Proceedings to the 18th Workshop "What Comes Beyond the Standard Models", Bled, July 11 - 19, 2015, published in Bled workshops in physics, Vol.16, No. 2, DMFA-Zaloznistvo, Ljubljana, Dec. 2015, links to (most of) the published contributions and section (by M.Yu. Khlopov) on VIA at Bled 2015.

Submitted 29 November, 2016; v1 submitted 10 May, 2016; originally announced May 2016.

Comments: (link added; minor corrections) "Bled workshops in physics", Vol.16, No. 2, DMFA-Zaloznistvo, Ljubljana, Dec. 2015, xv+187 pages

arXiv:1604.08634 [pdf, other]

doi 10.1109/SSP.2016.7551770

Optimal Transport vs. Fisher-Rao distance between Copulas for Clustering Multivariate Time Series

Authors: Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat

Abstract: We present a methodology for clustering N objects which are described by multivariate time series, i.e. several sequences of real-valued random variables. This clustering methodology leverages copulas which are distributions encoding the dependence structure between several random variables. To take fully into account the dependence information while clustering, we need a distance between copulas. In this work, we compare renowned distances between distributions: the Fisher-Rao geodesic distance, related divergences and optimal transport, and discuss their advantages and disadvantages. Applications of such methodology can be found in the clustering of financial assets. A tutorial, experiments and implementation for reproducible research can be found at www.datagrapple.com/Tech.

Submitted 14 November, 2016; v1 submitted 28 April, 2016; originally announced April 2016.

Comments: Accepted at IEEE Workshop on Statistical Signal Processing (SSP 2016)

arXiv:1604.01592 [pdf, other]

Fast $(1+ε)$-approximation of the Löwner extremal matrices of high-dimensional symmetric matrices

Authors: Frank Nielsen, Richard Nock

Abstract: Matrix data sets are common nowadays like in biomedical imaging where the Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) modality produces data sets of 3D symmetric positive definite matrices anchored at voxel positions capturing the anisotropic diffusion properties of water molecules in biological tissues. The space of symmetric matrices can be partially ordered using the Löwner ordering, and computing extremal matrices dominating a given set of matrices is a basic primitive used in matrix-valued signal processing. In this letter, we design a fast and easy-to-implement iterative algorithm to approximate arbitrarily finely these extremal matrices. Finally, we discuss on extensions to matrix clustering.

Submitted 6 April, 2016; originally announced April 2016.

Comments: 10 pages

arXiv:1603.08558 [pdf, ps, other]

Degrees of freedom of massless boson and fermion fields in any even dimension

Authors: N. S. Mankoc Borstnik, H. B. F. Nielsen

Abstract: This is a discussion on degrees of freedom of massless fermion and boson fields, if they are free or weakly interacting. We generalize the gauge fields of $S^{ab}$ - $ω_{abc}$ - and of $\tilde{S}^{ab}$ - $\tilde{ω}_{abc}$ - of the spin-charge-family to the gauge fields of all possible products of $γ^a$'s and of all possible products of $\tilde{γ}^a$'s, the first taking care in the spin-charge-family theory of the spins and charges quantum numbers ($τ^{Ai}=\sum_{a,b} c^{Ai}{}_{ab} \,S^{ab}$) of fermions, the second ($\tilde{τ}^{Ai}= \sum_{a,b} \tilde{c}^{Ai}{}_{ab}\, \tilde{S}^{ab}$) taking care of the families quantum numbers.

Submitted 9 February, 2016; originally announced March 2016.

Comments: 11 pages, revtex, This is the talk published in the Proceedings to the $18^{th}$ Workshop "What Comes Beyond the Standard Models", Bled, 11-19 of July, 2015

arXiv:1603.07822 [pdf, other]

On clustering financial time series: a need for distances between dependent random variables

Authors: Gautier Marti, Frank Nielsen, Philippe Donnat, Sébastien Andler

Abstract: The following working document summarizes our work on the clustering of financial time series. It was written for a workshop on information geometry and its application for image and signal processing. This workshop brought several experts in pure and applied mathematics together with applied researchers from medical imaging, radar signal processing and finance. The authors belong to the latter group. This document was written as a long introduction to further development of geometric tools in financial applications such as risk or portfolio analysis. Indeed, risk and portfolio analysis essentially rely on covariance matrices. Besides that the Gaussian assumption is known to be inaccurate, covariance matrices are difficult to estimate from empirical data. To filter noise from the empirical estimate, Mantegna proposed using hierarchical clustering. In this work, we first show that this procedure is statistically consistent. Then, we propose to use clustering with a much broader application than the filtering of empirical covariance matrices from the estimate correlation coefficients. To be able to do that, we need to obtain distances between the financial time series that incorporate all the available information in these cross-dependent random processes.

Submitted 25 March, 2016; originally announced March 2016.

Comments: Work presented during a workshop on Information Geometry at the International Centre for Mathematical Sciences, Edinburgh, UK

arXiv:1603.04139 [pdf, other]

doi 10.1109/ICIP.2016.7533102

SSSC-AM: A Unified Framework for Video Co-Segmentation by Structured Sparse Subspace Clustering with Appearance and Motion Features

Authors: Junlin Yao, Frank Nielsen

Abstract: Video co-segmentation refers to the task of jointly segmenting common objects appearing in a given group of videos. In practice, high-dimensional data such as videos can be conceptually thought as being drawn from a union of subspaces corresponding to categories rather than from a smooth manifold. Therefore, segmenting data into respective subspaces --- subspace clustering --- finds widespread applications in computer vision, including co-segmentation. State-of-the-art methods via subspace clustering seek to solve the problem in two steps: First, an affinity matrix is built from data, with appearance features or motion patterns. Second, the data are segmented by applying spectral clustering to the affinity matrix. However, this process is insufficient to obtain an optimal solution since it does not take into account the interdependence of the affinity matrix with the segmentation. In this work, we present a novel unified video co-segmentation framework inspired by the recent Structured Sparse Subspace Clustering ($\mathrm{S^{3}C}$) based on the self-expressiveness model. Our method yields more consistent segmentation results. In order to improve the detectability of motion features with missing trajectories due to occlusion or tracked points moving out of frames, we add an extra-dimensional signature to the motion trajectories. Moreover, we reformulate the $\mathrm{S^{3}C}$ algorithm by adding the affine subspace constraint in order to make it more suitable to segment rigid motions lying in affine subspaces of dimension at most $3$. Our experiments on MOViCS dataset show that our framework achieves the highest overall performance among baseline algorithms and demonstrate its robustness to heavy noise.

Submitted 28 September, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

Comments: 19 pages, 6 figures, 5 tables, extend ICIP 2016

Journal ref: IEEE International Conference on Image Processing (ICIP), 2016

arXiv:1603.04017 [pdf, other]

Clustering Financial Time Series: How Long is Enough?

Authors: Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat

Abstract: Researchers have used from 30 days to several years of daily returns as source data for clustering financial time series based on their correlations. This paper sets up a statistical framework to study the validity of such practices. We first show that clustering correlated random variables from their observed values is statistically consistent. Then, we also give a first empirical answer to the much debated question: How long should the time series be? If too short, the clusters found can be spurious; if too long, dynamics can be smoothed out.

Submitted 14 April, 2016; v1 submitted 13 March, 2016; originally announced March 2016.

Comments: Accepted at IJCAI 2016

arXiv:1602.03175 [pdf, other]

Fermionization in an Arbitrary Number of Dimensions

Authors: N. S. Mankoc Borstnik, H. B. F. Nielsen

Abstract: One purpose of this proceedings-contribution is to show that at least for free massless particles it is possible to construct an explicit boson theory which is exactly equivalent in terms of momenta and energy to a fermion theory. The fermions come as $2^{d/2-1}$ families and the to this whole system of fermions corresponding bosons come as a whole series of the Kalb-Ramond fields, one set of components for each number of indexes on the tensor fields. Since Kalb-Ramond fields naturally (only) couple to the extended objects or branes, we suspect that inclusion of interaction into such for a bosonization prepared system - except for the lowest dimensions - without including branes or something like that is not likely to be possible. The need for the families is easily seen just by using the theorem long ago put forward by Aratyn and one of us (H.B.F.N.), which says that to have the statistical mechanics of the fermion system and the boson system to match one needs to have the number of the field components in the ratio $\frac{2^{d-1}-1}{2^{d-1}}= \frac{\# bosons}{\# fermions}$, enforcing that the number of fermion components must be a multiple of $2^{d-1}$, where $d$ is the space-time dimension. This "explanation" of the number of dimension is potentially useful for the explanation for the number of dimension put forward by one of us (S.N.M.B.) since long in the spin-charge-family theory, and leads like the latter to typically (a multiple of) $4$ families. And this is the second purpose for our work on the fermionization in an arbitrary number of dimensions - namely to learn how "natural" is the inclusion of the families in the way the spin-charge-family theory does.

Submitted 9 February, 2016; originally announced February 2016.

Comments: 22 pages, revtex, This is the talk published in the Proceedings to the $18^{th}$ Workshop "What Comes Beyond the Standard Models", Bled, 11-19 of July, 2015

arXiv:1602.02450 [pdf, ps, other]

Loss factorization, weakly supervised learning and label noise robustness

Authors: Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni

Abstract: We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator --the focal quantity of this work-- which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.

Submitted 9 February, 2016; v1 submitted 7 February, 2016; originally announced February 2016.

arXiv:1602.01228 [pdf, other]

Image and Information

Authors: Frank Nielsen

Abstract: A well-known old adage says that "A picture is worth a thousand words!" (attributed to the Chinese philosopher Confucius ca 500 years BC). But more precisely, what do we mean by information in images? And how can it be retrieved effectively by machines? We briefly highlight these puzzling questions in this column. But first of all, let us start by defining more precisely what is meant by an "Image."

Submitted 3 February, 2016; originally announced February 2016.

Comments: 9 pages, 7 figures. to be published in french by Belin publisher for a collaborative book project on "Image and Communication"

arXiv:1602.01198 [pdf, other]

k-variates++: more pluses in the k-means++

Authors: Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen

Abstract: k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalisation of the well known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of a bias+variance approximation bound of the global optimum. This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential --- actually approaching the statistical lower bound. We show that k-variates++ reduces to efficient (biased seeding) clustering algorithms tailored to specific frameworks; these include distributed, streaming and on-line clustering, with direct approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For either the specific frameworks considered here, or for the differential privacy setting, there is little to no prior results on the direct application of k-means++ and its approximation bounds --- state of the art contenders appear to be significantly more complex and / or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performances vs state of the art.

Submitted 12 February, 2016; v1 submitted 3 February, 2016; originally announced February 2016.

ACM Class: H.3.3; I.5.3
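
As a point of reference for the abstract above, the following is a minimal sketch of the classical k-means++ (Arthur-Vassilvitskii) D² seeding that the paper generalises. It is not k-variates++ itself. The function names `sqdist` and `kmeanspp_seeds` and the `rng` parameter are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sqdist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Classical k-means++ seeding (hypothetical helper, D^2 sampling)."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= len(points)")
    # The first center is drawn uniformly at random from the data.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] is the squared distance from points[i] to its closest center so far.
    d2 = [sqdist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = 0.0
        for w in d2:
            total += w
        idx = 0
        if total == 0.0:
            # All points coincide with a chosen center: fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Draw an index with probability proportional to d2[i].
            r = rng.random() * total
            acc = 0.0
            for i, w in enumerate(d2):
                if w > 0.0:
                    idx = i  # last positive-weight index, used if rounding leaves r >= acc
                acc += w
                if r < acc:
                    break
        centers.append(points[idx])
        d2 = [min(old, sqdist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
seeds = kmeanspp_seeds(pts, 3, random.Random(0))
```

Because an already chosen point has zero weight, asking for three seeds from three distinct points returns each point exactly once, whatever the random draws are.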

arXiv:1601.00496

Nonparametric Modeling of Dynamic Functional Connectivity in fMRI Data

Authors: Søren F. V. Nielsen, Kristoffer H. Madsen, Rasmus Røge, Mikkel N. Schmidt, Morten Mørup

Abstract: Dynamic functional connectivity (FC) has in recent years become a topic of interest in the neuroimaging community. Several models and methods exist for both functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), and the results point towards the conclusion that FC exhibits dynamic changes. The existing approaches to modeling dynamic connectivity have primarily been based on time-windowing the data and k-means clustering. We propose a non-parametric generative model for dynamic FC in fMRI that does not rely on specifying window lengths and the number of dynamic states. Rooted in Bayesian statistical modeling, we use the predictive likelihood to investigate whether the model can discriminate between a motor task and rest both within and across subjects. We further investigate what drives dynamic states by applying the model to the entire data collated across subjects and task/rest. We find that the number of states extracted is driven by subject variability and preprocessing differences, while the individual states are almost purely defined by either task or rest. This questions how we in general interpret dynamic FC and points to the need for more research on what drives dynamic FC.

Submitted 8 June, 2016; v1 submitted 4 January, 2016; originally announced January 2016.

Comments: 8 pages, 1 figure. Presented at the Machine Learning and Interpretation in Neuroimaging Workshop (MLINI-2015), 2015 (arXiv:1605.04435)

Report number: MLINI/2015/08

arXiv:1509.08144

Optimal Copula Transport for Clustering Multivariate Time Series

Authors: Gautier Marti, Frank Nielsen, Philippe Donnat

Abstract: This paper presents a new methodology for clustering multivariate time series leveraging optimal transport between copulas. Copulas are used to encode both (i) intra-dependence of a multivariate time series, and (ii) inter-dependence between two time series. Then, optimal copula transport allows us to define two distances between multivariate time series: (i) one for measuring intra-dependence dissimilarity, (ii) another one for measuring inter-dependence dissimilarity based on a new multivariate dependence coefficient which is robust to noise, deterministic, and which can target specified dependencies.

Submitted 11 January, 2016; v1 submitted 27 September, 2015; originally announced September 2015.

Comments: Accepted at ICASSP 2016

arXiv:1509.05475

A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series

Authors: Gautier Marti, Philippe Very, Philippe Donnat, Frank Nielsen

Abstract: We present in this paper an empirical framework motivated by the practitioner's point of view on stability. The goal is both to assess clustering validity and to yield market insights by providing, through the data perturbations we propose, a multi-view of the assets' clustering behaviour. The perturbation framework is illustrated on an extensive credit default swap time series database available online at www.datagrapple.com.

Submitted 17 September, 2015; originally announced September 2015.

Comments: Accepted at ICMLA 2015

arXiv:1507.08137

HCMapper: An interactive visualization tool to compare partition-based flat clustering extracted from pairs of dendrograms

Authors: Gautier Marti, Philippe Donnat, Frank Nielsen, Philippe Very

Abstract: We describe a new visualization tool, dubbed HCMapper, that visually helps to compare a pair of dendrograms computed on the same dataset by displaying multiscale partition-based layered structures. The dendrograms are obtained by hierarchical clustering techniques whose output reflects some hypothesis on the data, and HCMapper is specifically designed to grasp at first glance both whether the two compared hypotheses broadly agree and the data points on which they do not concur. Leveraging juxtaposition and explicit encodings, HCMapper focuses on two selected partitions while displaying coarser ones in context areas for understanding multiscale structure and eventually switching the selected partitions. HCMapper's utility is shown through the example of testing whether the prices of credit default swap financial time series only undergo correlation. This use case is detailed in the supplementary material, as well as experiments with code on toy datasets for reproducible research. HCMapper is currently released as a visualization tool on the DataGrapple time series and clustering analysis platform at www.datagrapple.com.

Submitted 22 February, 2016; v1 submitted 29 July, 2015; originally announced July 2015.

arXiv:1506.09163

Comment partitionner automatiquement des marches aléatoires ? Avec application à la finance quantitative

Authors: Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat

Abstract: We present in this paper a novel non-parametric approach useful for clustering Markov processes. We introduce a pre-processing step consisting in mapping multivariate independent and identically distributed samples from random variables to a generic non-parametric representation which factorizes dependency and marginal distribution apart without losing any information. An associated metric is defined where the balance between random variables' dependency and distribution information is controlled by a single parameter. This mixing parameter can be learned or tuned by a practitioner; such use is illustrated on the case of clustering financial time series. Experiments, implementation and results obtained on public financial time series are online on a web portal at http://www.datagrapple.com.

Submitted 30 June, 2015; originally announced June 2015.

Comments: in French

arXiv:1506.06494

Robust preconditioners for PDE-constrained optimization with limited observations

Authors: Kent-André Mardal, Bjørn Fredrik Nielsen, Magne Nordaas

Abstract: Regularization robust preconditioners for PDE-constrained optimization problems have been successfully developed. These methods, however, typically assume that observation data is available throughout the entire domain of the state equation. For many inverse problems, this is an unrealistic assumption. In this paper we propose and analyze preconditioners for PDE-constrained optimization problems with limited observation data, e.g. observations are only available at the boundary of the solution domain. Our methods are robust with respect to both the regularization parameter and the mesh size. That is, the condition number of the preconditioned optimality system is uniformly bounded, independently of the size of these two parameters. We first consider a prototypical elliptic control problem and thereafter more general PDE-constrained optimization problems. Our theoretical findings are illuminated by several numerical results.

Submitted 22 June, 2015; originally announced June 2015.

MSC Class: 65F08; 65N21; 65K10

arXiv:1502.06786

Preface to the 17th workshop "What comes beyond the standard models", Bled July 20--28, 2014, and links to the talks in the proceedings

Authors: N. S. Mankoc Borstnik, H. F. B. Nielsen, M. Y. Khlopov, D. Lukman

Abstract: The contribution contains the preface to the Proceedings to the 17th Workshop "What Comes Beyond the Standard Models", Bled, July 20 - 28, 2014, published in Bled workshops in physics, Vol.15, No. 2, DMFA-Zaloznistvo, Ljubljana, Dec. 2014, links to (most of) the published contributions and section (by M.Yu. Khlopov) on VIA at Bled 2014.

Submitted 24 February, 2015; originally announced February 2015.

Comments: "Bled workshops in physics", Vol.15, No. 2, DMFA-Zaloznistvo, Ljubljana, Dec. 2014, 293 pages

arXiv:1410.1036

Further results on the hyperbolic Voronoi diagrams

Authors: Frank Nielsen, Richard Nock

Abstract: In Euclidean geometry, it is well-known that the $k$-order Voronoi diagram in $\mathbb{R}^d$ can be computed from the vertical projection of the $k$-level of an arrangement of hyperplanes tangent to a convex potential function in $\mathbb{R}^{d+1}$: the paraboloid. Similarly, we report for the Klein ball model of hyperbolic geometry such a {\em concave} potential function: the northern hemisphere. Furthermore, we also show how to build the hyperbolic $k$-order diagrams as equivalent clipped power diagrams in $\mathbb{R}^d$. We investigate the hyperbolic Voronoi diagram in the hyperboloid model and show how it reduces to a Klein-type model using central projections.

Submitted 4 October, 2014; originally announced October 2014.

Comments: 6 pages, 2 figures (ISVD 2014)

arXiv:1406.6314

Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means

Authors: Frank Nielsen, Richard Nock

Abstract: Finding the optimal $k$-means clustering is NP-hard in general, and many heuristics have been designed for monotonically minimizing the $k$-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. Those events tend to occur increasingly often when $k$ or $d$ increases, or when performing several restarts. First, we show that those special events are a blessing because they allow partially re-seeding some cluster centers while further minimizing the $k$-means objective function. Second, we describe a novel heuristic, merge-and-split $k$-means, that consists in merging two clusters and splitting this merged cluster again with two new centers, provided it improves the $k$-means objective. This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum. We show empirically that this merge-and-split $k$-means improves over Hartigan's heuristic, which is the {\em de facto} method of choice. Finally, we propose the $(k,l)$-means objective that generalizes the $k$-means objective by associating the data points to their $l$ closest cluster centers, and show how to either directly convert or iteratively relax the $(k,l)$-means into a $k$-means in order to reach better local minima.

Submitted 22 June, 2014; originally announced June 2014.

Comments: 14 pages

arXiv:1403.4441

Preface to the 16th workshop "What comes beyond the standard models", Bled July 14--21, 2013, and links to the talks in the proceedings

Authors: N. S. Mankoč Borštnik, H. F. B. Nielsen, M. Y. Khlopov, D. Lukman

Abstract: The contribution contains the preface to the Proceedings to the 16th Workshop What Comes Beyond the Standard Models, Bled, July 14 - 21, 2013, published in Bled workshops in physics, Vol.14, No. 2, DMFA-Založnistvo, Ljubljana, Dec. 2013, links to (most of) the published contributions and section (by M.Yu. Khlopov) on VIA at Bled 2013.

Submitted 18 March, 2014; originally announced March 2014.

Comments: "Bled workshops in physics", Vol.14, No. 2, DMFA-Založnistvo, Ljubljana, Dec. 2013

arXiv:1403.2485

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

Authors: Frank Nielsen, Richard Nock

Abstract: We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals. This case includes 1D Euclidean $k$-means, $k$-medoids, $k$-medians, $k$-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate $k$ by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.

Submitted 25 May, 2014; v1 submitted 11 March, 2014; originally announced March 2014.

Comments: 10 pages, 3 figures
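
The dynamic programming idea in the abstract above can be illustrated for the 1D Euclidean $k$-means case. This is a minimal sketch under stated assumptions: it is a plain $O(n^2 k)$ recurrence with prefix sums, not necessarily the authors' implementation, and the name `optimal_interval_kmeans` is hypothetical.

```python
from typing import List, Tuple


def optimal_interval_kmeans(xs: List[float], k: int) -> Tuple[float, List[List[float]]]:
    """Optimal 1D k-means into k intervals via dynamic programming (sketch)."""
    pts = sorted(xs)
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Prefix sums of x and x^2 give any interval's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(pts):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations of pts[i:j] (j exclusive) from its mean."""
        s = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - s * s / (j - i))

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j sorted points into c intervals.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[c][j]: start index of the last interval in that best split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    cut[c][j] = i
    # Backtrack from the full prefix to recover the intervals, right to left.
    clusters: List[List[float]] = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        clusters.append(pts[i:j])
        j = i
    clusters.reverse()
    return cost[k][n], clusters


total_cost, intervals = optimal_interval_kmeans([20.0, 1.0, 11.0, 2.0, 10.0], 3)
```

Swapping `sse` for another interval cost (for example absolute deviation from the median) gives the other 1D objectives listed in the abstract, at the price of a slower per-interval cost evaluation unless it is also precomputed.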

arXiv:1403.1407

Small Representation Principle

Authors: H. B. F. Nielsen

Abstract: In a previous article Don Bennett and I looked for, found and proposed a game in which the Standard Model Gauge Group $S(U(2) \times U(3))$ gets singled out as the "winner". This "game" means that the gauge group chosen by Nature should be just the one that has the maximal value for a quantity which is a modification of the ratio of the quadratic Casimir for the adjoint representation and that for a "smallest" faithful representation. In a recent article I proposed to extend this "game" to construct a corresponding game between different potential dimensions for space-time. The idea is to formulate how the same competition as the one between the potential gauge groups would turn out if restricted to the potential Lorentz or Poincare groups achievable for different dimensions of space-time $d$. The remarkable point is that it is the experimental space-time dimension 4 which wins. It follows that the whole Standard Model is specified by requiring SMALLEST REPRESENTATIONS! Speculatively, we even argue that the principle we found suggests the group of gauge transformations and some manifold (suggestive of, say, general relativity).

Submitted 6 March, 2014; originally announced March 2014.

Comments: Appeared in Proceedings to the 16th Workshop "What Comes Beyond the Standard Models?", Bled 14-21 of July 2013, Vol. 14, No. 2, DMFA Zaloznistvo, Ljubljana, Dec. 2013

Showing 101–150 of 188 results for author: Nielsen, F