-
Diffeomorphic interpolation for efficient persistence-based topological optimization
Authors:
Mathieu Carriere,
Marc Theveneau,
Théo Lacombe
Abstract:
Topological Data Analysis (TDA) provides a pipeline to extract quantitative topological descriptors from structured objects. This enables the definition of topological loss functions, which assess to what extent a given object exhibits some topological properties. These losses can then be used to perform topological optimization via gradient descent routines. While theoretically sound, topological optimization faces an important challenge: gradients tend to be extremely sparse, in the sense that the loss function typically depends on only very few coordinates of the input object, yielding dramatically slow optimization schemes in practice. Focusing on the central case of topological optimization for point clouds, we propose in this work to overcome this limitation using diffeomorphic interpolation, turning sparse gradients into smooth vector fields defined on the whole space, with quantifiable Lipschitz constants. In particular, we show that our approach combines efficiently with subsampling techniques routinely used in TDA, as the diffeomorphism derived from the gradient computed on a subsample can be used to update the coordinates of the full input object, allowing us to perform topological optimization on point clouds at an unprecedented scale. Finally, we also showcase the relevance of our approach for black-box autoencoder (AE) regularization, where we aim to enforce topological priors on the latent spaces associated with fixed, pre-trained, black-box AE models, and where we show that learning a diffeomorphic flow can be done once and then re-applied to new data in linear time (while vanilla topological optimization has to be re-run from scratch). Moreover, reverting the flow allows us to generate data by sampling the topologically-optimized latent space directly, yielding better interpretability of the model.
Submitted 29 May, 2024;
originally announced May 2024.
-
Topological Node2vec: Enhanced Graph Embedding via Persistent Homology
Authors:
Yasuaki Hiraoka,
Yusuke Imoto,
Killian Meehan,
Théo Lacombe,
Toshiaki Yachimura
Abstract:
Node2vec is a graph embedding method that learns a vector representation for each node of a weighted graph while seeking to preserve relative proximity and global structure. Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this we introduce a topological loss term to be added to the training loss of Node2vec which tries to align the persistence diagram (PD) of the resulting embedding as closely as possible to that of the input graph. Following results in computational optimal transport, we carefully adapt entropic regularization to PD metrics, allowing us to measure the discrepancy between PDs in a differentiable way. Our modified loss function can then be minimized through gradient descent to reconstruct both the geometry and the topology of the input graph. We showcase the benefits of this approach using demonstrative synthetic examples.
Submitted 15 September, 2023;
originally announced September 2023.
-
MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks
Authors:
Charles Arnal,
Felix Hensel,
Mathieu Carrière,
Théo Lacombe,
Hiroaki Kurihara,
Yuichi Ike,
Frédéric Chazal
Abstract:
Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and that on which they are deployed. In this article, we propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier and that allows for efficient covariate data shift detection without the need to train a new model dedicated to this task. These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution, and yield powerful data- and task-adapted statistics for the two-sample tests commonly used for data set shift detection. We demonstrate this empirically by measuring the statistical powers of two-sample Kolmogorov-Smirnov (KS) tests on several different data sets and shift types, and showing that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output.
Submitted 12 May, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds
Authors:
Thibault de Surrel,
Felix Hensel,
Mathieu Carrière,
Théo Lacombe,
Yuichi Ike,
Hiroaki Kurihara,
Marc Glisse,
Frédéric Chazal
Abstract:
The use of topological descriptors in modern machine learning applications, such as Persistence Diagrams (PDs) arising from Topological Data Analysis (TDA), has shown great potential in various domains. However, their practical use in applications is often hindered by two major limitations: the computational complexity required to compute such descriptors exactly, and their sensitivity to even low proportions of outliers. In this work, we propose to bypass these two burdens in a data-driven setting by entrusting the estimation of (vectorization of) PDs built on top of point clouds to a neural network architecture that we call RipsNet. Once trained on a given data set, RipsNet can estimate topological descriptors on test data very efficiently with generalization capacity. Furthermore, we prove that RipsNet is robust to input perturbations in terms of the 1-Wasserstein distance, a major improvement over the standard computation of PDs that only enjoys Hausdorff stability, which leads RipsNet to substantially outperform exactly-computed PDs in noisy settings. We showcase the use of RipsNet on both synthetic and real-world data. Our open-source implementation is publicly available at https://github.com/hensel-f/ripsnet and will be included in the Gudhi library.
Submitted 4 February, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
A Gradient Sampling Algorithm for Stratified Maps with Applications to Topological Data Analysis
Authors:
Jacob Leygonie,
Mathieu Carrière,
Théo Lacombe,
Steve Oudot
Abstract:
We introduce a novel gradient descent algorithm extending the well-known Gradient Sampling methodology to the class of stratifiably smooth objective functions, which are defined as locally Lipschitz functions that are smooth on some regular pieces, called the strata, of the ambient Euclidean space. For this class of functions, our algorithm achieves a sub-linear convergence rate. We then apply our method to objective functions based on the (extended) persistent homology map computed over lower-star filters, which is a central tool of Topological Data Analysis. For this, we propose an efficient exploration of the corresponding stratification by using the Cayley graph of the permutation group. Finally, we provide benchmark and novel topological optimization problems, in order to demonstrate the utility and applicability of our framework.
Submitted 3 September, 2021; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Topological Uncertainty: Monitoring trained neural networks through persistence of activation graphs
Authors:
Théo Lacombe,
Yuichi Ike,
Mathieu Carriere,
Frédéric Chazal,
Marc Glisse,
Yuhei Umeda
Abstract:
Although neural networks are capable of reaching astonishing performance in a wide variety of contexts, properly training networks on complicated tasks requires expertise and can be expensive from a computational perspective. In industrial applications, data coming from an open-world setting might widely differ from the benchmark datasets on which a network was trained. Being able to monitor the presence of such variations without retraining the network is of crucial importance. In this article, we develop a method to monitor trained neural networks based on the topological properties of their activation graphs. To each new observation, we assign a Topological Uncertainty, a score that aims to assess the reliability of the predictions by investigating the whole network instead of its final layer only, as typically done by practitioners. Our approach works entirely at a post-training level and does not require any assumption on the network architecture or optimization scheme, nor the use of data augmentation or auxiliary datasets; it can be faithfully applied to a large range of network architectures and data types. We showcase experimentally the potential of Topological Uncertainty in the context of trained network selection, Out-Of-Distribution detection, and shift-detection, both on synthetic and real datasets of images and graphs.
Submitted 7 May, 2021;
originally announced May 2021.
-
Modal features for image texture classification
Authors:
Thomas Lacombe,
Hugues Favreliere,
Maurice Pillet
Abstract:
Feature extraction is a key step in image processing for pattern recognition and machine learning processes. Its purpose lies in reducing the dimensionality of the input data by computing features which accurately describe the original information. In this article, a new feature extraction method based on Discrete Modal Decomposition (DMD) is introduced, to extend the group of space and frequency based features. These new features are called modal features. Initially aiming to decompose a signal into a modal basis built from a vibration mechanics problem, the DMD projection is applied to images in order to extract modal features with two approaches. The first one, called full scale DMD, consists in directly using the coordinates resulting from the decomposition as features. The second one, called filtering DMD, consists in using the DMD modes as filters to obtain features through a local transformation process. Experiments are performed on image texture classification tasks including several widely used databases, and compared to several classic feature extraction methods. We show that the DMD approach achieves good classification performance, comparable to state-of-the-art techniques, with a lower extraction time.
Submitted 4 May, 2020;
originally announced May 2020.
-
PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures
Authors:
Mathieu Carrière,
Frédéric Chazal,
Yuichi Ike,
Théo Lacombe,
Martin Royer,
Yuhei Umeda
Abstract:
Persistence diagrams, the most common descriptors of Topological Data Analysis, encode topological properties of data and have already proved pivotal in many different applications of data science. However, since the (metric) space of persistence diagrams is not Hilbert, they end up being difficult inputs for most Machine Learning techniques. To address this concern, several vectorization methods have been put forward that embed persistence diagrams into either a finite-dimensional Euclidean space or an (implicit) infinite-dimensional Hilbert space with kernels. In this work, we focus on persistence diagrams built on top of graphs. Relying on extended persistence theory and the so-called heat kernel signature, we show how graphs can be encoded by (extended) persistence diagrams in a provably stable way. We then propose a general and versatile framework for learning vectorizations of persistence diagrams, which encompasses most of the vectorization techniques used in the literature. We finally showcase the experimental strength of our setup by achieving competitive scores on classification tasks on real-life graph datasets.
Submitted 8 March, 2020; v1 submitted 19 April, 2019;
originally announced April 2019.
-
Generative Adversarial Networks for geometric surfaces prediction in injection molding
Authors:
Pierre Nagorny,
Thomas Lacombe,
Hugues Favreliere,
Maurice Pillet,
Eric Pairel,
Ronan Le Goff,
Marlene Wali,
Jerome Loureaux,
Patrice Kiener
Abstract:
Geometrical and appearance quality requirements set the limits of the current industrial performance in injection molding. To guarantee the product's quality, it is necessary to adjust the process settings in a closed loop. Those adjustments cannot rely on the final quality because a part takes days to be geometrically stable. Thus, the final part geometry must be predicted from measurements on hot parts. In this paper, we build on the recent success of Generative Adversarial Networks (GAN) with the pix2pix network architecture to predict the final part geometry, using only thermographic images of hot parts, measured right after production. Our dataset is very small, and the GAN learns to translate thermography to geometry. We first study prediction performance using different image similarity comparison algorithms. Moreover, we introduce the innovative use of Discrete Modal Decomposition (DMD) to analyze network predictions. The DMD is a geometrical parameterization technique using a modal space projection to geometrically describe surfaces. We study the performance of the GAN in retrieving the geometrical parameterization of surfaces.
Submitted 29 January, 2019;
originally announced January 2019.
-
Understanding the Topology and the Geometry of the Space of Persistence Diagrams via Optimal Partial Transport
Authors:
Vincent Divol,
Théo Lacombe
Abstract:
Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the space of persistence diagrams. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Fréchet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.
Submitted 28 May, 2024; v1 submitted 10 January, 2019;
originally announced January 2019.
-
Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport
Authors:
Théo Lacombe,
Marco Cuturi,
Steve Oudot
Abstract:
Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of complex data. Despite several appealing properties, incorporating PDs in learning pipelines can be challenging because their natural geometry is not Hilbertian. Indeed, this was recently exemplified in a string of papers which show that the simple task of averaging a few PDs can be computationally prohibitive. We propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering. This framework builds upon a reformulation of PD metrics as optimal transport (OT) problems. Doing so, we can exploit recent computational advances: the OT problem on a planar grid, when regularized with entropy, is convex and can be solved in linear time using the Sinkhorn algorithm and convolutions. This results in scalable computations that can stream on GPUs. We demonstrate the efficiency of our approach by carrying out clustering with diagram metrics on several thousands of PDs, a scale never seen before in the literature.
Submitted 13 November, 2018; v1 submitted 21 May, 2018;
originally announced May 2018.