-
Any Graph is a Mapper Graph
Authors:
Enrique G Alvarado,
Robin Belton,
Kang-Ju Lee,
Sourabh Palande,
Sarah Percival,
Emilie Purvine,
Sarah Tymochko
Abstract:
The Mapper algorithm is a popular tool for visualization and data exploration in topological data analysis. We investigate an inverse problem for the Mapper algorithm: Given a dataset $X$ and a graph $G$, does there exist a set of Mapper parameters such that the output Mapper graph of $X$ is isomorphic to $G$? We provide constructions that affirmatively answer this question. Our results demonstrate that it is possible to engineer Mapper parameters to generate a desired graph.
Submitted 20 August, 2024;
originally announced August 2024.
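For orientation, the forward direction of the problem in the abstract above (fixed parameters in, graph out) can be sketched as follows. This is a minimal illustration, not the construction from the paper: it assumes a one-dimensional filter, a uniform interval cover, and clustering by connected components of an ε-neighborhood graph. The function names `interval_cover`, `single_linkage`, and `mapper_graph` are hypothetical.

```python
import math
from itertools import combinations

def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; `overlap` is the shared fraction."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against floating-point drift at the top end
    return cover

def single_linkage(indices, points, eps):
    """Connected components of the eps-neighborhood graph on the given point indices."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            p = stack.pop()
            near = {q for q in unseen if math.dist(points[p], points[q]) <= eps}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper_graph(points, filter_fn, n_intervals, overlap, eps):
    """Nodes are clusters of cover-element preimages; edges join clusters sharing a point."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        inside = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(inside, points, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2) if nodes[i] & nodes[j]}
    return nodes, edges

# Assumed toy dataset: 40 points on the unit circle, filtered by x-coordinate.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
nodes, edges = mapper_graph(circle, lambda p: p[0], n_intervals=3, overlap=0.3, eps=0.2)
```

With these assumed parameters the circle yields four nodes joined in a single cycle. The inverse problem studied in the paper asks the opposite question: which parameter choices produce a prescribed graph.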
-
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Authors:
Elizabeth Fons,
Rachneet Kaur,
Soham Palande,
Zhen Zeng,
Svitlana Vyetrenko,
Tucker Balch
Abstract:
Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, which is a critical task across many domains, spanning healthcare, finance, climate, energy, and many more. In this paper, we propose a framework for rigorously evaluating the capabilities of LLMs on time series understanding, encompassing both univariate and multivariate forms. We introduce a comprehensive taxonomy of time series features, a critical framework that delineates various characteristics inherent in time series data. Leveraging this taxonomy, we have systematically designed and synthesized a diverse dataset of time series, embodying the different outlined features. This dataset acts as a solid foundation for assessing the proficiency of LLMs in comprehending time series. Our experiments shed light on the strengths and limitations of state-of-the-art LLMs in time series understanding, revealing which features these models comprehend readily and where they falter. In addition, we uncover the sensitivity of LLMs to factors including the formatting of the data, the position of points queried within a series, and the overall time series length.
Submitted 25 April, 2024;
originally announced April 2024.
-
$G$-Mapper: Learning a Cover in the Mapper Construction
Authors:
Enrique Alvarado,
Robin Belton,
Emily Fischer,
Kang-Ju Lee,
Sourabh Palande,
Sarah Percival,
Emilie Purvine
Abstract:
The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering, which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments on synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running quickly.
Submitted 4 March, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
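The normality check that the $G$-Mapper abstract above relies on can be illustrated with the textbook Anderson-Darling statistic for a normal distribution whose mean and variance are estimated from the sample. This is only a sketch of the test statistic, not the authors' implementation: the function name `anderson_darling_normal` and the toy samples are assumptions, no significance threshold is applied, and the Gaussian-mixture split of a cover element is not shown.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(sample):
    """Anderson-Darling statistic A^2 against a normal fit (mean and stdev estimated).

    Larger values indicate a worse fit to normality. Assumes no standardized value
    is so extreme that the normal CDF rounds to exactly 0 or 1.
    """
    n = len(sample)
    mu, sigma = mean(sample), stdev(sample)
    std_normal = NormalDist()
    cdf = [std_normal.cdf((x - mu) / sigma) for x in sorted(sample)]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n

# Assumed toy data: evenly spaced normal quantiles versus two well-separated clumps.
normal_like = [NormalDist().inv_cdf((k + 0.5) / 50) for k in range(50)]
bimodal = [c + 0.05 * k for c in (-5.0, 5.0) for k in range(-10, 11)]

a_normal = anderson_darling_normal(normal_like)
a_bimodal = anderson_darling_normal(bimodal)
```

In a splitting scheme of this kind, the function values inside a cover element would play the role of `sample`: a large statistic (as for `bimodal`) suggests the element is a candidate for splitting, while a small one (as for `normal_like`) suggests leaving it intact.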
-
Sketching Merge Trees for Scientific Data Visualization
Authors:
Mingzhe Li,
Sourabh Palande,
Lin Yan,
Bei Wang
Abstract:
Merge trees are a type of topological descriptor that records the connectivity among the sublevel sets of scalar fields. They are among the most widely used topological tools in visualization. In this paper, we are interested in sketching a set of merge trees. That is, given a large set T of merge trees, we would like to find a much smaller basis set S such that each tree in T can be approximately reconstructed from a linear combination of merge trees in S. A set of high-dimensional vectors can be sketched via matrix sketching techniques such as principal component analysis and column subset selection. However, up until now, topological descriptors such as merge trees have not been known to be sketchable. We develop a framework for sketching a set of merge trees that combines the Gromov-Wasserstein probabilistic matching with techniques from matrix sketching. We demonstrate the applications of our framework in sketching merge trees that arise from time-varying scientific simulations. Specifically, our framework obtains a much smaller representation of a large set of merge trees for downstream analysis and visualization. It is shown to be useful in identifying good representatives and outliers with respect to a chosen basis. Finally, our work shows a promising direction of utilizing randomized linear algebra within scientific visualization.
Submitted 30 May, 2021; v1 submitted 8 January, 2021;
originally announced January 2021.
-
TopoAct: Visually Exploring the Shape of Activations in Deep Learning
Authors:
Archit Rathore,
Nithin Chalapathi,
Sourabh Palande,
Bei Wang
Abstract:
Deep neural networks such as GoogLeNet, ResNet, and BERT have achieved impressive performance in tasks such as image and text classification. To understand how such performance is achieved, we probe a trained deep neural network by studying neuron activations, i.e., combinations of neuron firings, at various layers of the network in response to a particular input. With a large number of inputs, we aim to obtain a global view of what neurons detect by studying their activations. In particular, we develop visualizations that show the shape of the activation space, the organizational principle behind neuron activations, and the relationships of these activations within a layer. Applying tools from topological data analysis, we present TopoAct, a visual exploration system to study topological summaries of activation vectors. We present exploration scenarios using TopoAct that provide valuable insights into learned representations of neural networks. We expect TopoAct to give a topological perspective that enriches the current toolbox of neural network analysis, and to provide a basis for network architecture diagnosis and data anomaly detection.
Submitted 12 April, 2021; v1 submitted 13 December, 2019;
originally announced December 2019.
-
Spectral Sparsification of Simplicial Complexes for Clustering and Label Propagation
Authors:
Braxton Osting,
Sourabh Palande,
Bei Wang
Abstract:
As a generalization of the use of graphs to describe pairwise interactions, simplicial complexes can be used to model higher-order interactions between three or more objects in complex systems. There has been a recent surge in activity for the development of data analysis methods applicable to simplicial complexes, including techniques based on computational topology, higher-order random processes, generalized Cheeger inequalities, isoperimetric inequalities, and spectral methods. In particular, spectral learning methods (e.g. label propagation and clustering) that directly operate on simplicial complexes represent a new direction for analyzing such complex datasets.
To apply spectral learning methods to massive datasets modeled as simplicial complexes, we develop a method for sparsifying simplicial complexes that preserves the spectrum of the associated Laplacian matrices. We show that the theory of Spielman and Srivastava for the sparsification of graphs extends to simplicial complexes via the up Laplacian. In particular, we introduce a generalized effective resistance for simplices, provide an algorithm for sparsifying simplicial complexes at a fixed dimension, and give a specific version of the generalized Cheeger inequality for weighted simplicial complexes. Finally, we introduce higher-order generalizations of spectral clustering and label propagation for simplicial complexes and demonstrate via experiments the utility of the proposed spectral sparsification method for these applications.
Submitted 1 February, 2019; v1 submitted 28 August, 2017;
originally announced August 2017.