-
Learning Sparse Codes with Entropy-Based ELBOs
Authors:
Dmytro Velychko,
Simon Damm,
Asja Fischer,
Jörg Lücke
Abstract:
Standard probabilistic sparse coding assumes a Laplace prior, a linear mapping from latents to observables, and Gaussian observable distributions. We here derive a solely entropy-based learning objective for the parameters of standard sparse coding. The novel variational objective has the following features: (A) unlike MAP approximations, it uses non-trivial posterior approximations for probabilistic inference; (B) unlike for previous non-trivial approximations, the novel objective is fully analytical; and (C) the objective allows for a novel principled form of annealing. The objective is derived by first showing that the standard ELBO objective converges to a sum of entropies, which matches similar recent results for generative models with Gaussian priors. The conditions under which the ELBO becomes equal to entropies are then shown to have analytical solutions, which leads to the fully analytical objective. Numerical experiments are used to demonstrate the feasibility of learning with such entropy-based ELBOs. We investigate different posterior approximations including Gaussians with correlated latents and deep amortized approximations. Furthermore, we numerically investigate entropy-based annealing which results in improved learning. Our main contributions are theoretical, however, and they are twofold: (1) for non-trivial posterior approximations, we provide the (to the knowledge of the authors) first analytical ELBO objective for standard probabilistic sparse coding; and (2) we provide the first demonstration on how a recently shown convergence of the ELBO to entropy sums can be used for learning.
Submitted 9 April, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
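The generative model named in the abstract above (Laplace prior, linear mapping, Gaussian observables) can be sketched as a sampler. This is only an illustration of the model assumptions, not the authors' learning objective; the unit-scale Laplace prior, the isotropic noise, and the names `sample_sparse_coding`, `W`, and `sigma` are assumptions introduced here.

```python
import random

def sample_sparse_coding(W, sigma, rng):
    """Draw one (z, x) pair from a standard sparse coding model.

    W     -- D x H dictionary as a list of rows (hypothetical example values)
    sigma -- standard deviation of the (assumed isotropic) Gaussian noise
    rng   -- a random.Random instance
    """
    D, H = len(W), len(W[0])
    # Laplace(0, 1) prior: the difference of two Exp(1) draws is Laplace(0, 1).
    z = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(H)]
    # Linear mapping from the latents to the mean of the observables.
    mean = [sum(W[d][h] * z[h] for h in range(H)) for d in range(D)]
    # Gaussian observable distribution centred on that mean.
    x = [rng.gauss(mean[d], sigma) for d in range(D)]
    return z, x

rng = random.Random(0)
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # 2 observables, 3 latents
z, x = sample_sparse_coding(W, sigma=0.1, rng=rng)
print(z, x)
```

Learning in the paper runs in the opposite direction: given many samples `x`, it fits the dictionary and noise parameters by optimizing the entropy-based ELBO.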
-
Strange and charm contributions to the HVP from C* boundary conditions
Authors:
Anian Altherr,
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Roman Gruber,
Javad Komijani,
Jens Lücke,
Marina Krstić Marinković,
Sofie Martins,
Agostino Patella,
Nazario Tantalo,
Paola Tavella
Abstract:
We present preliminary results for the determination of the leading strange and charm quark-connected contributions to the hadronic vacuum polarization contribution to the muon's g-2. Measurements are performed on the RC* collaboration's QCD ensembles, with 3+1 flavors of O(a) improved Wilson fermions and C* boundary conditions. The HVP is computed on a single value of the lattice spacing and two lattice volumes at unphysical pion mass. In addition, we compare the signal-to-noise ratio for different lattice discretizations of the vector current.
Submitted 11 January, 2023;
originally announced January 2023.
-
Hadronic vacuum polarization with C* boundary conditions
Authors:
Anian Altherr,
Roman Gruber,
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Javad Komijani,
Jens Lücke,
Marina Krstić Marinković,
Sofie Martins,
Agostino Patella,
Nazario Tantalo,
Paola Tavella
Abstract:
We present a progress report on the calculation of the connected hadronic contribution to the muon g-2 with C* boundary conditions. For that purpose we use a QCD gauge ensemble with 3+1 flavors and two QCD+QED gauge ensembles with 1+2+1 flavors of dynamical quarks generated by the RC* collaboration. We detail the calculation of the vector mass and elaborate on both statistical and systematic errors.
Submitted 22 December, 2022;
originally announced December 2022.
-
Tuning of QCD+QED simulations with C$^{\star}$ boundary conditions
Authors:
Anian Altherr,
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Roman Gruber,
Jens Lücke,
Marina Marinkovic,
Agostino Patella,
Nazario Tantalo,
Paola Tavella
Abstract:
We give an update on the ongoing effort of the RC$^\star$ collaboration to generate fully dynamical QCD+QED ensembles with C$^\star$ boundary conditions using the openQ$^\star$D code. The simulations were tuned to the U-symmetric point ($m_d = m_s$) with pions at $m_{π^{\pm}} \approx 400$ MeV. The splitting of the light mesons is used as one of three tuning observables and fixed to $m_{K^{0}} - m_{K^{\pm}} \approx 5$ MeV and $m_{K^{0}} - m_{K^{\pm}} \approx 25$ MeV on ensembles with renormalized electromagnetic coupling $α_{\text{R}} \approx α_{\text{phys}}$ and $α_R \approx 5.5α_{phys}$ respectively. The tuning of the three independent quark masses to the desired lines of constant physics is particularly challenging. We will define the chosen hadronic renormalization scheme, and we will present a tuning strategy based on a combination of mass reweighting and linear interpolation to explore the parameter space. We will comment on finite-volume effects comparing meson masses on two different volumes with $m_{π^{\pm}} L \approx 3.2$ and $m_{π^{\pm}} L \approx 5.1$. We will also provide some technical details on our updated strategy to calculate the sign of the fermionic Pfaffian, which arises in the presence of C$^\star$ boundary conditions in place of the standard fermionic determinant. More technical details on the generation of the configurations can be found in J. Lücke's proceedings.
Submitted 22 March, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.
-
$N_f = 1+2+1$ QCD+QED simulations with C$^\star$ boundary conditions
Authors:
Anian Altherr,
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Roman Gruber,
Jens Lücke,
Marina Krstić Marinković,
Agostino Patella,
Nazario Tantalo,
Paola Tavella
Abstract:
We give an update on the ongoing effort of the RC$^\star$ collaboration to generate fully dynamical QCD+QED configurations with C$^\star$ boundary conditions using the openQ$^\star$D code. The simulations are tuned to the U-symmetric point ($m_d=m_s$) with pions at $m_{π^\pm}\approx 400$ MeV. The splitting of the light mesons is used as one of three tuning observables and fixed to $m_{K^0} - m_{K^\pm} \approx 5$ MeV and $m_{K^0} - m_{K^\pm} \approx 25$ MeV on ensembles with renormalized electromagnetic coupling $α_\mathrm{R} \approx α_\mathrm{phys.}$ and $α_\mathrm{R}\approx 5.5 α_\mathrm{phys.}$ respectively. We will discuss some details concerning our tuning strategy and present the calculation of the meson and baryon masses. Finally, we will also present a cost analysis for our simulations. More technical details on finite-volume effects and the tuning can be found in A. Cotellucci's proceedings.
Submitted 21 March, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
First results on QCD+QED with C* boundary conditions
Authors:
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Jens Lücke,
Marina Krstić Marinković,
Agostino Patella,
Nazario Tantalo
Abstract:
Accounting for isospin-breaking corrections is critical for achieving subpercent precision in lattice computations of hadronic observables. A way to include QED and strong-isospin-breaking corrections in lattice QCD calculations is to impose C$^\star$ boundary conditions in space. Here, we demonstrate the computation of a selection of meson and baryon masses on two QCD and five QCD+QED gauge ensembles in this setup, which preserves locality, gauge and translational invariance all through the calculation. The generation of the gauge ensembles is performed for two volumes, and three different values of the renormalized fine-structure constant at the U-symmetric point, corresponding to the SU(3)-symmetric QCD in the two ensembles where the electromagnetic coupling is turned off. We also present our tuning strategy and, to the extent possible, a cost analysis of the simulations with C$^\star$ boundary conditions.
Submitted 27 September, 2022;
originally announced September 2022.
-
On the Convergence of the ELBO to Entropy Sums
Authors:
Jörg Lücke,
Jan Warnken
Abstract:
The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many established as well as many novel algorithms for unsupervised learning. During learning such algorithms change model parameters to increase the variational lower bound. Learning usually proceeds until parameters have converged to values close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is at all stationary points of learning equal to a sum of entropies. For standard machine learning models with one set of latents and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distribution. The obtained result applies under realistic conditions including: finite numbers of data points, at any stationary point (including saddle points) and for any family of (well behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many well-known generative models. As concrete examples we discuss Sigmoid Belief Networks, probabilistic PCA and (Gaussian and non-Gaussian) mixture models. The result also applies for standard (Gaussian) variational autoencoders, a special case that has been shown previously (Damm et al., 2023). The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family, and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
Submitted 29 April, 2024; v1 submitted 7 September, 2022;
originally announced September 2022.
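The three-entropy result described in the abstract above can be written compactly as follows. The notation is an assumption made here for illustration and may differ from the paper's: $N$ data points, variational distributions $q^{(n)}_{\Phi}(z)$, prior $p_{\Theta}(z)$, observable distribution $p_{\Theta}(x \mid z)$, and $\mathcal{H}[\cdot]$ for entropy. Per the abstract, the equality holds at stationary points of learning and under the stated conditions, not for arbitrary parameter values.

```latex
\mathcal{F}(\Phi, \Theta)
  \;=\; \underbrace{\frac{1}{N} \sum_{n=1}^{N} \mathcal{H}\big[ q^{(n)}_{\Phi}(z) \big]}_{\text{(A) average variational entropy}}
  \;-\; \underbrace{\mathcal{H}\big[ p_{\Theta}(z) \big]}_{\text{(B) prior entropy}}
  \;-\; \underbrace{\frac{1}{N} \sum_{n=1}^{N} \mathbb{E}_{q^{(n)}_{\Phi}(z)} \Big\{ \mathcal{H}\big[ p_{\Theta}(x \mid z) \big] \Big\}}_{\text{(C) expected observable entropy}}
```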
-
Implementing noise reduction techniques into the OpenQ*D package
Authors:
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Jens Lücke,
Marina Krstić Marinković,
Agostino Patella,
Mike Peardon,
Nazario Tantalo
Abstract:
We present the results of testing a new technique for stochastic noise reduction in the calculation of propagators by implementing it in OpenQ*D for two ensembles with O(a) improved Wilson fermion action, with periodic boundary conditions and pion masses of 437 MeV and 331 MeV, for the connected vector and pseudoscalar correlators. We find that the technique yields no speedup compared to traditional methods, owing to the failure of its underlying assumption that the spectra of the spatial Laplacian and Dirac operators are sufficiently similar for the technique's purposes.
Submitted 24 January, 2022;
originally announced January 2022.
-
Baryon masses from full QCD+QED${}_\text{C}$ simulations
Authors:
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Jens Lücke,
Marina Krstić Marinković,
Agostino Patella,
Nazario Tantalo
Abstract:
In these proceedings we present preliminary results for the masses of the proton, neutron and $Ω^-$ baryons obtained from QCD+QED lattice simulations performed with four dynamical quarks using C$^*$ boundary conditions. These results are part of the ongoing effort of the RC${}^*$ collaboration discussed in the companion proceedings, and have been obtained on a single ensemble in which the renormalised electromagnetic coupling is $α_{\text{em}}\sim 0.04$, the physical volume is $L\sim 1.7$ fm and the masses of the four dynamical quarks have been tuned at the $U$--spin symmetric point $m_d=m_s$. We demonstrate on this unphysical ensemble that baryon masses can be calculated with satisfactory precision when including QED without the need for gauge--fixing and perturbation theory. This makes us confident in the effectiveness of the strategy presented here also in the case of simulations closer to the physical point.
Submitted 23 December, 2021;
originally announced December 2021.
-
An update on QCD+QED simulations with C* boundary conditions
Authors:
Lucius Bushnaq,
Isabel Campos,
Marco Catillo,
Alessandro Cotellucci,
Madeleine Dale,
Patrick Fritzsch,
Jens Lücke,
Marina Krstić Marinković,
Agostino Patella,
Nazario Tantalo
Abstract:
We present two novelties in our analysis of fully dynamical QCD+QED ensembles with C* boundary conditions. The first one is the explicit computation of the sign of the Pfaffian. We present an algorithm that provides a significant speedup compared to traditional methods. The second one is a reweighting of the mass in the context of the RHMC. We have tested the techniques on both pure QCD and QCD+QED ensembles with pions at $m_{π^\pm}\approx400$ MeV, a lattice spacing of $a\approx0.05$ fm, a fine-structure constant of $α_{\mathrm{R}}=0$ and $0.04$.
Submitted 26 August, 2021;
originally announced August 2021.
-
Fibered simple knots
Authors:
Joshua Evan Greene,
John Luecke
Abstract:
We prove that a simple knot in the lens space $L(p,q)$ fibers if and only if its order in homology does not divide any remainder occurring in the Euclidean algorithm applied to the pair $(p,q)$. One corollary is that if $p=m^2$ is a perfect square, then any simple knot of order $m$ fibers, answering a question of Cebanu. More generally, we compute the leading coefficient of the Alexander polynomial of a simple knot, and we describe how to construct a minimum complexity Seifert surface for one. The methods are direct, combinatorial, and geometric.
Submitted 15 June, 2021;
originally announced June 2021.
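The fibering criterion stated in the abstract above is arithmetic and can be checked mechanically. The sketch below assumes that "remainder" means the nonzero remainders produced by the division steps of the Euclidean algorithm on $(p,q)$, so neither $p$ itself nor the final zero is counted; that reading, and the function names, are assumptions made here rather than definitions taken from the paper.

```python
def euclidean_remainders(p, q):
    """Nonzero remainders produced by the Euclidean algorithm on (p, q)."""
    remainders = []
    a, b = p, q
    while b != 0:
        a, b = b, a % b  # one division step: new remainder is a % b
        if b != 0:
            remainders.append(b)
    return remainders

def simple_knot_fibers(p, q, order):
    """Criterion from the abstract: the knot fibers iff its homological
    order divides none of the (assumed nonzero) remainders."""
    return all(r % order != 0 for r in euclidean_remainders(p, q))

# Corollary check for p = m^2 with m = 4: the pair (16, 7) has remainders [2, 1].
print(euclidean_remainders(16, 7))   # [2, 1]
print(simple_knot_fibers(16, 7, 4))  # True
```

Under this reading, the perfect-square corollary corresponds to `simple_knot_fibers(m * m, q, m)` returning `True` for every `q` coprime to `m`.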
-
Evolutionary Variational Optimization of Generative Models
Authors:
Jakob Drefs,
Enrico Guiraud,
Jörg Lücke
Abstract:
We combine two popular optimization approaches to derive learning algorithms for generative models: variational optimization and evolutionary algorithms. The combination is realized for generative models with discrete latents by using truncated posteriors as the family of variational distributions. The variational parameters of truncated posteriors are sets of latent states. By interpreting these states as genomes of individuals and by using the variational lower bound to define a fitness, we can apply evolutionary algorithms to realize the variational loop. The used variational distributions are very flexible and we show that evolutionary algorithms can effectively and efficiently optimize the variational bound. Furthermore, the variational loop is generally applicable ("black box") with no analytical derivations required. To show general applicability, we apply the approach to three generative models (we use noisy-OR Bayes Nets, Binary Sparse Coding, and Spike-and-Slab Sparse Coding). To demonstrate effectiveness and efficiency of the novel variational approach, we use the standard competitive benchmarks of image denoising and inpainting. The benchmarks allow quantitative comparisons to a wide range of methods including probabilistic approaches, deep deterministic and generative networks, and non-local image processing methods. In the category of "zero-shot" learning (when only the corrupted image is used for training), we observed the evolutionary variational algorithm to significantly improve the state-of-the-art in many benchmark settings. For one well-known inpainting benchmark, we also observed state-of-the-art performance across all categories of algorithms although we only train on the corrupted image. In general, our investigations highlight the importance of research on optimization methods for generative models to achieve performance improvements.
Submitted 16 April, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents
Authors:
Enrico Guiraud,
Jakob Drefs,
Jörg Lücke
Abstract:
Discrete latent variables are considered important for real world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, which has motivated different strategies to manipulate discrete distributions in order to train discrete VAEs similarly to conventional ones. Here we ask if it is also possible to keep the discrete nature of the latents fully intact by applying a direct discrete optimization for the encoding model. The approach consequently diverges strongly from standard VAE training by sidestepping sampling approximation, the reparameterization trick and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we show (A) how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we here find direct discrete optimization to be efficiently scalable to hundreds of latents. More importantly, we find the effectiveness of direct optimization to be highly competitive in `zero-shot' learning. In contrast to large supervised networks, the VAEs investigated here can, e.g., denoise a single image without previous training on clean data and/or training on large image datasets. More generally, the studied approach shows that training of VAEs is indeed possible without sampling-based approximation and reparameterization, which may be interesting for the analysis of VAE training in general. For `zero-shot' settings, a direct optimization furthermore makes VAEs competitive where they have previously been outperformed by non-generative approaches.
Submitted 24 March, 2023; v1 submitted 27 November, 2020;
originally announced November 2020.
-
The ELBO of Variational Autoencoders Converges to a Sum of Three Entropies
Authors:
Simon Damm,
Dennis Forster,
Dmytro Velychko,
Zhenwen Dai,
Asja Fischer,
Jörg Lücke
Abstract:
The central objective function of a variational autoencoder (VAE) is its variational lower bound (the ELBO). Here we show that for standard (i.e., Gaussian) VAEs the ELBO converges to a value given by the sum of three entropies: the (negative) entropy of the prior distribution, the expected (negative) entropy of the observable distribution, and the average entropy of the variational distributions (the latter is already part of the ELBO). Our derived analytical results are exact and apply for small as well as for intricate deep networks for encoder and decoder. Furthermore, they apply for finitely and infinitely many data points and at any stationary point (including local maxima and saddle points). The result implies that the ELBO can for standard VAEs often be computed in closed-form at stationary points while the original ELBO requires numerical approximations of integrals. As a main contribution, we provide the proof that the ELBO for VAEs is at stationary points equal to entropy sums. Numerical experiments then show that the obtained analytical results are sufficiently precise also in those vicinities of stationary points that are reached in practice. Furthermore, we discuss how the novel entropy form of the ELBO can be used to analyze and understand learning behavior. More generally, we believe that our contributions can be useful for future theoretical and practical studies on VAE learning as they provide novel information on those points in parameters space that optimization of VAEs converges to.
Submitted 20 April, 2023; v1 submitted 28 October, 2020;
originally announced October 2020.
-
Phase transition for parameter learning of Hidden Markov Models
Authors:
Nikita Rau,
Jörg Lücke,
Alexander K. Hartmann
Abstract:
We study a phase transition in parameter learning of Hidden Markov Models (HMMs). We do this by generating sequences of observed symbols from given discrete HMMs with uniformly distributed transition probabilities and a noise level encoded in the output probabilities. By using the Baum-Welch (BW) algorithm, an Expectation-Maximization algorithm from the field of Machine Learning, we then try to estimate the parameters of each investigated realization of an HMM. We study HMMs with n=4, 8 and 16 states. By changing the amount of accessible learning data and the noise level, we observe a phase-transition-like change in the performance of the learning algorithm. For bigger HMMs and more learning data, the learning behavior improves tremendously below a certain threshold in the noise strength. For a noise level above the threshold, learning is not possible. Furthermore, we use an overlap parameter applied to the results of a maximum-a-posteriori (Viterbi) algorithm to investigate the accuracy of the hidden state estimation around the phase transition.
Submitted 25 March, 2020;
originally announced March 2020.
-
Generic Unsupervised Optimization for a Latent Variable Model With Exponential Family Observables
Authors:
Hamid Mousavi,
Jakob Drefs,
Florian Hirschberger,
Jörg Lücke
Abstract:
Latent variable models (LVMs) represent observed variables by parameterized functions of latent variables. Prominent examples of LVMs for unsupervised learning are probabilistic PCA or probabilistic SC which both assume a weighted linear summation of the latents to determine the mean of a Gaussian distribution for the observables. In many cases, however, observables do not follow a Gaussian distribution. For unsupervised learning, LVMs which assume specific non-Gaussian observables have therefore been considered. Already for specific choices of distributions, parameter optimization is challenging and only a few previous contributions considered LVMs with more generally defined observable distributions. Here, we consider LVMs that are defined for a range of different distributions, i.e., observables can follow any (regular) distribution of the exponential family. The novel class of LVMs presented is defined for binary latents, and it uses maximization in place of summation to link the latents to observables. To derive an optimization procedure, we follow an EM approach for maximum likelihood parameter estimation. We show that a set of very concise parameter update equations can be derived which feature the same functional form for all exponential family distributions. The derived generic optimization can consequently be applied to different types of metric data as well as to different types of discrete data. Also, the derived optimization equations can be combined with a recently suggested variational acceleration which is likewise generically applicable to the LVMs considered here. So, the combination maintains generic and direct applicability of the derived optimization procedure, but, crucially, enables efficient scalability. We numerically verify our analytical results and discuss some potential applications such as learning of variance structure, noise type estimation and denoising.
Submitted 15 December, 2023; v1 submitted 4 March, 2020;
originally announced March 2020.
-
ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions
Authors:
Georgios Exarchakis,
Jörg Bornschein,
Abdul-Saboor Sheikh,
Zhenwen Dai,
Marc Henniges,
Jakob Drefs,
Jörg Lücke
Abstract:
ProSper is a python library containing probabilistic algorithms to learn dictionaries. Given a set of data points, the implemented algorithms seek to learn the elementary components that have generated the data. The library widens the scope of dictionary learning approaches beyond implementations of standard approaches such as ICA, NMF or standard L1 sparse coding. The implemented algorithms are especially well-suited in cases when data consist of components that combine non-linearly and/or for data requiring flexible prior distributions. Furthermore, the implemented algorithms go beyond standard approaches by inferring prior and noise parameters of the data, and they provide rich a-posteriori approximations for inference. The library is designed to be extendable and it currently includes: Binary Sparse Coding (BSC), Ternary Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis (MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding (GSC, a recent spike-and-slab sparse coding approach). The algorithms are scalable due to a combination of variational approximations and parallelization. Implementations of all algorithms allow for parallel execution on multiple CPUs and multiple machines for medium to large-scale applications. Typical large-scale runs of the algorithms can use hundreds of CPUs to learn hundreds of dictionary elements from data with tens of millions of floating-point numbers such that models with several hundred thousand parameters can be optimized. The library is designed to have minimal dependencies and to be easy to use. It targets users of dictionary learning algorithms and Machine Learning researchers.
Submitted 1 August, 2019;
originally announced August 2019.
-
Large Scale Clustering with Variational EM for Gaussian Mixture Models
Authors:
Florian Hirschberger,
Dennis Forster,
Jörg Lücke
Abstract:
This paper represents a preliminary (pre-reviewing) version of a sublinear variational algorithm for isotropic Gaussian mixture models (GMMs). Further developments of the algorithm for GMMs with diagonal covariance matrices (instead of isotropic clusters) and their corresponding benchmarking results have been published by TPAMI (doi:10.1109/TPAMI.2021.3133763) in the paper "A Variational EM Acceleration for Efficient Clustering at Very Large Scales". We kindly refer the reader to the TPAMI paper instead of this much earlier arXiv version (the TPAMI paper is also open access). Publicly available source code accompanies the paper (see https://github.com/variational-sublinear-clustering). Please note that the TPAMI paper does not contain the benchmark on the 80 Million Tiny Images dataset anymore because we followed the call of the dataset creators to discontinue the use of that dataset.
The aim of the project (which resulted in this arXiv version and the later TPAMI paper) is the exploration of the current efficiency and large-scale limits in fitting a parametric model for clustering to data distributions. To reduce computational complexity, we used a clustering objective based on truncated variational EM (which reduces complexity for many clusters) in combination with coreset objectives (which reduce complexity for many data points). We used efficient coreset construction and efficient seeding to translate the theoretical sublinear complexity gains into an efficient algorithm. In applications to standard large-scale benchmarks for clustering, we then observed substantial wall-clock speedups compared to already highly efficient clustering approaches. To demonstrate that the observed efficiency enables applications previously considered unfeasible, we clustered the entire and unscaled 80 Million Tiny Images dataset into up to 32,000 clusters.
Submitted 21 June, 2022; v1 submitted 1 October, 2018;
originally announced October 2018.
-
Truncated Variational Sampling for "Black Box" Optimization of Generative Models
Authors:
Jörg Lücke,
Zhenwen Dai,
Georgios Exarchakis
Abstract:
We investigate the optimization of two probabilistic generative models with binary latent variables using a novel variational EM approach. The approach distinguishes itself from previous variational approaches by using latent states as variational parameters. Here we use efficient and general purpose sampling procedures to vary the latent states, and investigate the "black box" applicability of the resulting optimization procedure. For general purpose applicability, samples are drawn from approximate marginal distributions of the considered generative model as well as from the model's prior distribution. As such, variational sampling is defined in a generic form, and is directly executable for a given model. As a proof of concept, we then apply the novel procedure (A) to Binary Sparse Coding (a model with continuous observables), and (B) to basic Sigmoid Belief Networks (which are models with binary observables). Numerical experiments verify that the investigated approach efficiently as well as effectively increases a variational free energy objective without requiring any additional analytical steps.
Submitted 22 February, 2018; v1 submitted 21 December, 2017;
originally announced December 2017.
-
Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and $k$-means
Authors:
Dennis Forster,
Jörg Lücke
Abstract:
One iteration of standard $k$-means (i.e., Lloyd's algorithm) or standard EM for Gaussian mixture models (GMMs) scales linearly with the number of clusters $C$, data points $N$, and data dimensionality $D$. In this study, we explore whether one iteration of $k$-means or EM for GMMs can scale sublinearly with $C$ at run-time while remaining effective at improving the clustering objective. The tool we apply for complexity reduction is variational EM, which is typically used to make training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make tractable clustering algorithms more efficient. The basic idea is to use a partial variational E-step which reduces the linear complexity of $\mathcal{O}(NCD)$ required for a full E-step to a sublinear complexity. Our main observation is that the linear dependency on $C$ can be reduced to a dependency on a much smaller parameter $G$ which relates to cluster neighborhood relations. We focus on two versions of partial variational EM for clustering: variational GMM, scaling with $\mathcal{O}(NG^2D)$, and variational $k$-means, scaling with $\mathcal{O}(NGD)$ per iteration. Empirical results show that these algorithms still require comparable numbers of iterations to improve the clustering objective to the same values as $k$-means. For data with many clusters, we consequently observe reductions of net computational demands between two and three orders of magnitude. More generally, our results provide substantial empirical evidence that clustering can scale sublinearly with $C$.
Submitted 17 April, 2018; v1 submitted 9 November, 2017;
originally announced November 2017.
-
Asymmetric L-space knots
Authors:
Kenneth L. Baker,
John Luecke
Abstract:
We construct the first examples of asymmetric L-space knots in $S^3$. More specifically, we exhibit a construction of hyperbolic knots in $S^3$ with both (i) a surgery that may be realized as a surgery on a strongly invertible link such that the result of the surgery is the double branched cover of an alternating link and (ii) trivial isometry group. In particular, this produces L-space knots in $S^3$ which are not strongly invertible. The construction also immediately extends to produce asymmetric L-space knots in any lens space, including $S^1 \times S^2$.
Submitted 4 October, 2017;
originally announced October 2017.
-
$k$-means as a variational EM approximation of Gaussian mixture models
Authors:
Jörg Lücke,
Dennis Forster
Abstract:
We show that $k$-means (Lloyd's algorithm) is obtained as a special case when truncated variational EM approximations are applied to Gaussian Mixture Models (GMM) with isotropic Gaussians. In contrast to the standard way to relate $k$-means and GMMs, the provided derivation shows that it is not required to consider Gaussians with small variances or the limit case of zero variances. There are a number of consequences that directly follow from our approach: (A) $k$-means can be shown to increase a free energy associated with truncated distributions and this free energy can directly be reformulated in terms of the $k$-means objective; (B) $k$-means generalizations can directly be derived by considering the 2nd closest, 3rd closest etc. cluster in addition to just the closest one; and (C) the embedding of $k$-means into a free energy framework allows for theoretical interpretations of other $k$-means generalizations in the literature. In general, truncated variational EM provides a natural and rigorous quantitative link between $k$-means-like clustering and GMM clustering algorithms which may be very relevant for future theoretical and empirical studies.
Submitted 6 June, 2019; v1 submitted 16 April, 2017;
originally announced April 2017.
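The relation between truncated variational EM and $k$-means described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the shared variance `sigma2`, and the equal mixing proportions are assumptions made for illustration. Each data point keeps non-zero responsibility only for its `n_keep` closest clusters; with `n_keep=1` the E-step becomes the hard nearest-cluster assignment of Lloyd's algorithm, and larger values additionally take the 2nd closest, 3rd closest, etc. cluster into account.

```python
import numpy as np

def truncated_responsibilities(Y, means, sigma2=1.0, n_keep=1):
    """Truncated responsibilities for an isotropic, equal-weight GMM (assumed setup).

    Only the n_keep closest clusters of each data point keep non-zero
    probability; within that set the weights are renormalized posteriors.
    """
    # Squared Euclidean distances between points and cluster means, shape (N, C)
    d2 = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Indices of the n_keep closest clusters for every data point
    keep = np.argsort(d2, axis=1)[:, :n_keep]
    log_w = -d2 / (2.0 * sigma2)
    # Subtract the per-row maximum for numerical stability
    log_w = log_w - log_w.max(axis=1, keepdims=True)
    resp = np.zeros_like(d2)
    rows = np.arange(Y.shape[0])[:, None]
    resp[rows, keep] = np.exp(log_w[rows, keep])
    return resp / resp.sum(axis=1, keepdims=True)

def m_step_means(Y, resp):
    """Responsibility-weighted mean update (the standard GMM M-step for means)."""
    counts = resp.sum(axis=0)[:, None]
    # Guard against empty clusters; in this sketch they collapse to the origin
    return (resp.T @ Y) / np.maximum(counts, 1e-12)

Y = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
hard = truncated_responsibilities(Y, means, n_keep=1)   # one-hot rows, as in k-means
soft = truncated_responsibilities(Y, means, n_keep=2)   # both clusters contribute
new_means = m_step_means(Y, hard)                        # plain cluster averages
```

With `n_keep=1` the mean update reduces to averaging the points assigned to each cluster, which is the usual $k$-means update.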
-
Truncated Variational EM for Semi-Supervised Neural Simpletrons
Authors:
Dennis Forster,
Jörg Lücke
Abstract:
Inference and learning for probabilistic generative networks are often very challenging, which typically prevents scalability to networks as large as those used for deep discriminative approaches. To obtain efficiently trainable, large-scale and well performing generative networks for semi-supervised learning, we here combine two recent developments: a neural network reformulation of hierarchical Poisson mixtures (Neural Simpletrons), and a novel truncated variational EM approach (TV-EM). TV-EM provides theoretical guarantees for learning in generative networks, and its application to Neural Simpletrons results in particularly compact, yet approximately optimal, modifications of learning equations. If applied to standard benchmarks, we empirically find that learning converges in fewer EM iterations, that the complexity per EM iteration is reduced, and that final likelihood values are higher on average. For the task of classification on data sets with few labels, learning improvements result in consistently lower error rates if compared to applications without truncation. Experiments on the MNIST data set herein allow for comparison to standard and state-of-the-art models in the semi-supervised setting. Further experiments on the NIST SD19 data set show the scalability of the approach when a multitude of additional unlabeled data is available.
Submitted 7 February, 2017;
originally announced February 2017.
-
Truncated Variational Expectation Maximization
Authors:
Jörg Lücke
Abstract:
We derive a novel variational expectation maximization approach based on truncated posterior distributions. Truncated distributions are proportional to exact posteriors within subsets of a discrete state space and equal zero otherwise. The treatment of the distributions' subsets as variational parameters distinguishes the approach from previous variational approaches. The specific structure of truncated distributions allows for deriving novel and mathematically grounded results, which in turn can be used to formulate novel efficient algorithms to optimize the parameters of probabilistic generative models. Most centrally, we find the variational lower bounds that correspond to truncated distributions to be given by very concise and efficiently computable expressions, while update equations for model parameters remain in their standard form. Based on these findings, we show how efficient and easily applicable meta-algorithms can be formulated that guarantee a monotonic increase of the variational bound. Example applications of the here derived framework provide novel theoretical results and learning procedures for latent variable models as well as mixture models. Furthermore, we show that truncated variational EM naturally interpolates between standard EM with full posteriors and EM based on the maximum a-posteriori state (MAP). The approach can, therefore, be regarded as a generalization of the popular `hard EM' approach towards a similarly efficient method which can capture more of the true posterior structure.
Submitted 11 July, 2019; v1 submitted 10 October, 2016;
originally announced October 2016.
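The definition of a truncated distribution given in this abstract can be made concrete with a short sketch. It assumes a discrete state space small enough to enumerate, and the function name `truncated_posterior` and its arguments are hypothetical. The second return value is the log of the joint probability mass on the subset; a short calculation shows that the standard variational lower bound takes this value when the truncated distribution is inserted (this is a remark of the sketch, not a quotation from the abstract).

```python
import numpy as np

def truncated_posterior(log_joint, subset):
    """Return q(s) proportional to p(s | y) on `subset` and zero elsewhere.

    log_joint[i] holds log p(s_i, y) for the i-th state of an enumerable
    discrete state space; `subset` lists the indices of the states kept.
    """
    log_joint = np.asarray(log_joint, dtype=float)
    subset = np.asarray(subset)
    q = np.zeros_like(log_joint)
    sub = log_joint[subset]
    m = sub.max()                      # shift for numerical stability
    w = np.exp(sub - m)
    q[subset] = w / w.sum()            # renormalize inside the subset
    log_mass = m + np.log(w.sum())     # log of sum_{s in subset} p(s, y)
    return q, log_mass

log_joint = np.log([0.1, 0.2, 0.3, 0.4])
q_full, f_full = truncated_posterior(log_joint, [0, 1, 2, 3])  # exact posterior
q_two, f_two = truncated_posterior(log_joint, [2, 3])          # two best states
q_map, f_map = truncated_posterior(log_joint, [3])             # MAP state only
```

Keeping all states recovers the exact posterior (standard EM), while keeping only the most probable state gives a one-hot distribution (the `hard EM' case); intermediate subsets interpolate between the two.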
-
Boundary-reducing surgeries and bridge number
Authors:
Kenneth L. Baker,
R. Sean Bowman,
John Luecke
Abstract:
Let $M$ be a $3$--dimensional handlebody of genus $g$. This paper gives examples of hyperbolic knots in $M$ with arbitrarily large genus $g$ bridge number which admit Dehn surgeries which are boundary-reducible manifolds.
Submitted 31 December, 2015;
originally announced December 2015.
-
Neural Simpletrons - Minimalistic Directed Generative Networks for Learning with Few Labels
Authors:
Dennis Forster,
Abdul-Saboor Sheikh,
Jörg Lücke
Abstract:
Classifiers for the semi-supervised setting often combine strong supervised models with additional learning objectives to make use of unlabeled data. This results in powerful though very complex models that are hard to train and that demand additional labels for optimal parameter tuning, which are often not given when labeled data is very sparse. We here study a minimalistic multi-layer generative neural network for semi-supervised learning in a form and setting as similar to standard discriminative networks as possible. Based on normalized Poisson mixtures, we derive compact and local learning and neural activation rules. Learning and inference in the network can be scaled using standard deep learning tools for parallelized GPU implementation. With the single objective of likelihood optimization, both labeled and unlabeled data are naturally incorporated into learning. Empirical evaluations on standard benchmarks show that for datasets with few labels the derived minimalistic network improves on all classical deep learning approaches and is competitive with their recent variants without the need for additional labels for parameter tuning. Furthermore, we find that the studied network is the best performing monolithic (`non-hybrid') system for few labels, and that it can be applied in the limit of very few labels, where no other system has been reported to operate so far.
Submitted 18 November, 2016; v1 submitted 28 June, 2015;
originally announced June 2015.
-
GP-select: Accelerating EM using adaptive subspace preselection
Authors:
Jacquelyn A. Shelton,
Jan Gasthaus,
Zhenwen Dai,
Joerg Luecke,
Arthur Gretton
Abstract:
We propose a nonparametric procedure to achieve fast inference in generative graphical models when the number of latent states is very large. The approach is based on iterative latent variable preselection, where we alternate between learning a 'selection function' to reveal the relevant latent variables, and using this to obtain a compact approximation of the posterior distribution for EM; this can make inference possible where the number of possible latent states is e.g. exponential in the number of latent variables, whereas an exact approach would be computationally infeasible. We learn the selection function entirely from the observed data and current EM state via Gaussian process regression. This is in contrast to earlier approaches, where selection functions were manually designed for each problem setting. We show that our approach performs as well as these bespoke selection functions on a wide variety of inference problems: in particular, for the challenging case of a hierarchical model for object localization with occlusion, we achieve results that match a customized state-of-the-art selection method, at a far lower computational cost.
Submitted 17 July, 2016; v1 submitted 10 December, 2014;
originally announced December 2014.
-
Infinitely many knots admitting the same integer surgery and a 4-dimensional extension
Authors:
Tetsuya Abe,
In Dae Jong,
John Luecke,
John Osoinach
Abstract:
We prove that for any integer $n$ there exist infinitely many different knots in $S^3$ such that $n$-surgery on those knots yields the same 3-manifold. In particular, when $|n|=1$ homology spheres arise from these surgeries. This answers Problem 3.6(D) on the Kirby problem list. We construct two families of examples, the first by a method of twisting along an annulus and the second by a generalization of this procedure. The latter family also solves a stronger version of Problem 3.6(D), that for any integer $n$, there exist infinitely many mutually distinct knots such that 2-handle addition along each with framing $n$ yields the same 4-manifold.
Submitted 19 February, 2015; v1 submitted 16 September, 2014;
originally announced September 2014.
-
Infinitely many knots admitting the same integer surgery
Authors:
John Luecke,
John Osoinach
Abstract:
The construction of knots via annular twisting has been used to create families of knots yielding the same manifold via Dehn surgery. Prior examples have all involved Dehn surgery where the surgery slope is an integral multiple of 2. In this note we prove that for any integer $n$ there exist infinitely many different knots in $S^3$ such that $n$-surgery on those knots yields the same manifold. In particular, when $|n|=1$ homology spheres arise from these surgeries. In addition, when $n \neq 0$ the bridge numbers of the knots constructed tend to infinity as the number of twists along the annulus increases.
Submitted 6 July, 2014;
originally announced July 2014.
-
Bridge number and integral Dehn surgery
Authors:
Ken Baker,
Cameron Gordon,
John Luecke
Abstract:
In a 3-manifold M, let K be a knot and R be an annulus which meets K transversely. We define the notion of the pair (R,K) being caught by a surface Q in the exterior of the link given by K and the boundary curves of R. For a caught pair (R,K), we consider the knot K^n gotten by twisting K n times along R and give a lower bound on the bridge number of K^n with respect to Heegaard splittings of M -- as a function of n, the genus of the splitting, and the catching surface Q. As a result, the bridge number of K^n tends to infinity with n. In application, we look at a family of knots K^n found by Teragaito that live in a small Seifert fiber space M and where each K^n admits a Dehn surgery giving the 3-sphere. We show that the bridge number of K^n with respect to any genus 2 Heegaard splitting of M tends to infinity with n. This contrasts with other work of the authors as well as with the conjectured picture for knots in lens spaces that admit Dehn surgeries giving the 3-sphere.
Submitted 27 March, 2013;
originally announced March 2013.
-
A Truncated EM Approach for Spike-and-Slab Sparse Coding
Authors:
Abdul-Saboor Sheikh,
Jacquelyn A. Shelton,
Jörg Lücke
Abstract:
We study inference and learning based on a sparse coding model with `spike-and-slab' prior. As in standard sparse coding, the model used assumes independent latent sources that linearly combine to generate data points. However, instead of using a standard sparse prior such as a Laplace distribution, we study the application of a more flexible `spike-and-slab' distribution which models the absence or presence of a source's contribution independently of its strength if it contributes. We investigate two approaches to optimize the parameters of spike-and-slab sparse coding: a novel truncated EM approach and, for comparison, an approach based on standard factored variational distributions. The truncated approach can be regarded as a variational approach with truncated posteriors as variational distributions. In applications to source separation we find that both approaches improve the state-of-the-art in a number of standard benchmarks, which argues for the use of `spike-and-slab' priors for the corresponding data domains. Furthermore, we find that the truncated EM approach improves on the standard factored approach in source separation tasks, which hints at biases introduced by assuming posterior independence in the factored variational approach. Likewise, on a standard benchmark for image denoising, we find that the truncated EM approach improves on the factored variational approach. While the performance of the factored approach saturates with increasing numbers of hidden dimensions, the performance of the truncated approach improves the state-of-the-art for higher noise levels.
Submitted 3 September, 2014; v1 submitted 15 November, 2012;
originally announced November 2012.
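The generative process behind a spike-and-slab sparse coding model, as described in this abstract, can be sketched as follows. The sketch is only illustrative: the Gaussian slab, the isotropic Gaussian observation noise, and all names (`sample_spike_and_slab`, `pi`, `slab_std`, `noise_std`) are assumptions not taken from the abstract. A binary "spike" decides whether a source contributes, and a continuous "slab" value sets its strength independently of that decision.

```python
import numpy as np

def sample_spike_and_slab(W, pi, mu, slab_std, noise_std, n_samples, rng):
    """Draw data from a linear spike-and-slab model (assumed Gaussian slab and noise).

    W has shape (D, H): D observed dimensions, H latent sources.
    """
    D, H = W.shape
    # Spike: each source is active independently with probability pi
    b = rng.random((n_samples, H)) < pi
    # Slab: strength of each source, drawn independently of its activity
    z = rng.normal(mu, slab_std, size=(n_samples, H))
    # Inactive sources contribute exactly zero
    s = b * z
    # Linear combination of the sources plus observation noise
    y = s @ W.T + rng.normal(0.0, noise_std, size=(n_samples, D))
    return y, s, b

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
y, s, b = sample_spike_and_slab(W, pi=0.2, mu=0.0, slab_std=1.0,
                                noise_std=0.1, n_samples=200, rng=rng)
```

In contrast to a Laplace prior, this construction yields latent values that are exactly zero for inactive sources, so sparsity (`pi`) and strength (`mu`, `slab_std`) are controlled by separate parameters.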
-
Obtaining genus 2 Heegaard splittings from Dehn surgery
Authors:
Kenneth L Baker,
Cameron Gordon,
John Luecke
Abstract:
Let K' be a hyperbolic knot in S^3 and suppose that some Dehn surgery on K' with distance at least 3 from the meridian yields a 3-manifold M of Heegaard genus 2. We show that if M does not contain an embedded Dyck's surface (the closed non-orientable surface of Euler characteristic -1), then the knot dual to the surgery is either 0-bridge or 1-bridge with respect to a genus 2 Heegaard splitting of M. In the case M does contain an embedded Dyck's surface, we obtain similar results. As a corollary, if M does not contain an incompressible genus 2 surface, then the tunnel number of K' is at most 2.
Submitted 30 April, 2012;
originally announced May 2012.
-
Bridge number, Heegaard genus and non-integral Dehn surgery
Authors:
Kenneth L. Baker,
Cameron Gordon,
John Luecke
Abstract:
We show there exists a linear function w: N->N with the following property. Let K be a hyperbolic knot in a hyperbolic 3-manifold M admitting a non-longitudinal S^3 surgery. If K is put into thin position with respect to a strongly irreducible, genus g Heegaard splitting of M then K intersects a thick level at most 2w(g) times. Typically, this shows that the bridge number of K with respect to this Heegaard splitting is at most w(g), and the tunnel number of K is at most w(g) + g-1.
Submitted 18 November, 2013; v1 submitted 1 February, 2012;
originally announced February 2012.
-
Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach
Authors:
Zhenwen Dai,
Jörg Lücke
Abstract:
We study the task of cleaning scanned text documents that are strongly corrupted by dirt such as manual line strokes, spilled ink etc. We aim at autonomously removing dirt from a single letter-size page based only on the information the page contains. Our approach, therefore, has to learn character representations without supervision and requires a mechanism to distinguish learned representations from irregular patterns. To learn character representations, we use a probabilistic generative model parameterizing pattern features, feature variances, the features' planar arrangements, and pattern frequencies. The latent variables of the model describe pattern class, pattern position, and the presence or absence of individual pattern features. The model parameters are optimized using a novel variational EM approximation. After learning, the parameters represent, independently of their absolute position, planar feature arrangements and their variances. A quality measure defined based on the learned representation then allows for an autonomous discrimination between regular character patterns and the irregular patterns making up the dirt. The irregular patterns can thus be removed to clean the document. For a full Latin alphabet we found that a single page does not contain sufficiently many character examples. However, even if heavily corrupted by dirt, we show that a page containing a lower number of character types can efficiently and autonomously be cleaned solely based on the structural regularity of the characters it contains. In different examples using characters from different alphabets, we demonstrate generality of the approach and discuss its implications for future developments.
Submitted 2 July, 2012; v1 submitted 12 January, 2012;
originally announced January 2012.
-
Closed-form EM for Sparse Coding and its Application to Source Separation
Authors:
Jörg Lücke,
Abdul-Saboor Sheikh
Abstract:
We define and discuss the first sparse coding algorithm based on closed-form EM updates and continuous latent variables. The underlying generative model consists of a standard `spike-and-slab' prior and a Gaussian noise model. Closed-form solutions for E- and M-step equations are derived by generalizing probabilistic PCA. The resulting EM algorithm can take all modes of a potentially multi-modal posterior into account. The computational cost of the algorithm scales exponentially with the number of hidden dimensions. However, with current computational resources, it is still possible to efficiently learn model parameters for medium-scale problems. Thus the model can be applied to the typical range of source separation tasks. In numerical experiments on artificial data we verify likelihood maximization and show that the derived algorithm recovers the sparse directions of standard sparse coding distributions. On source separation benchmarks comprised of realistic data we show that the algorithm is competitive with other recent methods.
Submitted 2 March, 2012; v1 submitted 12 May, 2011;
originally announced May 2011.
-
Tangle analysis of difference topology experiments: applications to a Mu protein-DNA complex
Authors:
Isabel K. Darcy,
John Luecke,
Mariel Vazquez
Abstract:
We develop topological methods for analyzing difference topology experiments involving 3-string tangles. Difference topology is a novel technique used to unveil the structure of stable protein-DNA complexes involving two or more DNA segments. We analyze such experiments for the Mu protein-DNA complex. We characterize the solutions to the corresponding tangle equations by certain knotted graphs. By investigating planarity conditions on these graphs we show that there is a unique biologically relevant solution. That is, we show there is a unique rational tangle solution, which is also the unique solution with small crossing number.
Submitted 22 October, 2007;
originally announced October 2007.
-
Knots with unknotting number 1 and essential Conway spheres
Authors:
Cameron McA Gordon,
John Luecke
Abstract:
For a knot K in S^3, let T(K) be the characteristic toric sub-orbifold of the orbifold (S^3,K) as defined by Bonahon and Siebenmann. If K has unknotting number one, we show that an unknotting arc for K can always be found which is disjoint from T(K), unless either K is an EM-knot (of Eudave-Munoz) or (S^3,K) contains an EM-tangle after cutting along T(K). As a consequence, we describe exactly which large algebraic knots (i.e., algebraic in the sense of Conway and containing an essential Conway sphere) have unknotting number one and give a practical procedure for deciding this (as well as determining an unknotting crossing). Among the knots up to 11 crossings in Conway's table which are obviously large algebraic by virtue of their description in the Conway notation, we determine which have unknotting number one. Combined with the work of Ozsvath-Szabo, this determines the knots with 10 or fewer crossings that have unknotting number one. We show that an alternating, large algebraic knot with unknotting number one can always be unknotted in an alternating diagram.
As part of the above work, we determine the hyperbolic knots in a solid torus which admit a non-integral, toroidal Dehn surgery. Finally, we show that having unknotting number one is invariant under mutation.
Submitted 29 June, 2009; v1 submitted 11 January, 2006;
originally announced January 2006.
-
Strongly n-trivial Knots
Authors:
Hugh Howards,
John Luecke
Abstract:
A knot k is called "strongly (n-1)-trivial" if there exists a projection of k such that one can choose n crossings of the projection with the property that making the crossing changes corresponding to any of the $2^{n}-1$ nontrivial combinations of the selected crossings turns the original knot into the unknot. We prove that any non-trivial knot k of genus g fails to be strongly n-trivial for all $n \geq 3g-1$.
Submitted 28 April, 2000;
originally announced April 2000.
-
Toroidal and Boundary-Reducing Dehn Fillings
Authors:
C. McA. Gordon,
J. Luecke
Abstract:
Let M be a simple 3-manifold with a toral boundary component $\partial_0 M$. If Dehn filling M along $\partial_0 M$ one way produces a toroidal manifold and Dehn filling M along $\partial_0 M$ another way produces a boundary-reducible manifold, then we show that the absolute value of the intersection number on $\partial_0 M$ of the two filling slopes is at most two. In the special case that the boundary-reducing filling is actually a solid torus and the intersection number between the filling slopes is two, more is said to describe the toroidal filling.
Submitted 27 January, 1998;
originally announced January 1998.