Search | arXiv e-print repository

The distance function to a finite set is a topological Morse function

Abstract: In this short note, we show that the distance function to any finite set $X\subset \mathbb{R}^n$ is a topological Morse function, regardless of whether $X$ is in general position. We also precisely characterize its topological critical points and their indices, and relate them to the differential critical points of the function. In this short note, we show that the distance function to any finite set $X\subset \mathbb{R}^n$ is a topological Morse function, regardless of whether $X$ is in general position. We also precisely characterize its topological critical points and their indices, and relate them to the differential critical points of the function. △ Less

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.10645 [pdf, other]

Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models

Authors: Louis Abraham, Charles Arnal, Antoine Marie

Abstract: Large Language Models have recently been applied to text annotation tasks from social sciences, equalling or surpassing the performance of human workers at a fraction of the cost. However, no inquiry has yet been made on the impact of prompt selection on labelling accuracy. In this study, we show that performance greatly varies between prompts, and we apply the method of automatic prompt optimizat… ▽ More Large Language Models have recently been applied to text annotation tasks from social sciences, equalling or surpassing the performance of human workers at a fraction of the cost. However, no inquiry has yet been made on the impact of prompt selection on labelling accuracy. In this study, we show that performance greatly varies between prompts, and we apply the method of automatic prompt optimization to systematically craft high quality prompts. We also provide the community with a simple, browser-based implementation of the method at https://prompt-ultra.github.io/ . △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2406.14919 [pdf, other]

Wasserstein convergence of Čech persistence diagrams for samplings of submanifolds

Authors: Charles Arnal, David Cohen-Steiner, Vincent Divol

Abstract: Čech Persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $OT_{p}$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifo… ▽ More Čech Persistence diagrams (PDs) are topological descriptors routinely used to capture the geometry of complex datasets. They are commonly compared using the Wasserstein distances $OT_{p}$; however, the extent to which PDs are stable with respect to these metrics remains poorly understood. We partially close this gap by focusing on the case where datasets are sampled on an $m$-dimensional submanifold of $\mathbb{R}^{d}$. Under this manifold hypothesis, we show that convergence with respect to the $OT_{p}$ metric happens exactly when $p\gt m$. We also provide improvements upon the bottleneck stability theorem in this case and prove new laws of large numbers for the total $α$-persistence of PDs. Finally, we show how these theoretical findings shed new light on the behavior of the feature maps on the space of PDs that are used in ML-oriented applications of Topological Data Analysis. △ Less

Submitted 12 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

MSC Class: 55N31; 62R40

arXiv:2406.02128 [pdf, other]

Iteration Head: A Mechanistic Study of Chain-of-Thought

Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe

Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particul… ▽ More Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined "iteration heads". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2402.13079 [pdf, ps, other]

Mode Estimation with Partial Feedback

Authors: Charles Arnal, Vivien Cabannes, Vianney Perchet

Abstract: The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments. These new learning pipelines call for new theoretical frameworks. In this paper, we formalize core aspects of weakly supervised and active learning with a simple problem: the estimation of the mode of a distribution using partial feedback. We show how entropy coding allows for… ▽ More The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments. These new learning pipelines call for new theoretical frameworks. In this paper, we formalize core aspects of weakly supervised and active learning with a simple problem: the estimation of the mode of a distribution using partial feedback. We show how entropy coding allows for optimal information acquisition from partial feedback, develop coarse sufficient statistics for mode identification, and adapt bandit algorithms to our new setting. Finally, we combine those contributions into a statistically and computationally efficient solution to our problem. △ Less

Submitted 20 February, 2024; originally announced February 2024.

MSC Class: 62L05; 62B86; 62D10; 62B10

arXiv:2311.13845 [pdf, other]

Touring sampling with pushforward maps

Authors: Vivien Cabannes, Charles Arnal

Abstract: The number of sampling methods could be daunting for a practitioner looking to cast powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the ``generative modeling'' setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it… ▽ More The number of sampling methods could be daunting for a practitioner looking to cast powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the ``generative modeling'' setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it might prove useful to overcome some of the current challenges in sampling with diffusion models, such as long inference time due to diffusion simulation, or the lack of diversity in generated samples. △ Less

Submitted 20 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: 5 pages

Journal ref: ICASSP, 2024

arXiv:2305.13271 [pdf, other]

MAGDiff: Covariate Data Set Shift Detection via Activation Graphs of Deep Neural Networks

Authors: Charles Arnal, Felix Hensel, Mathieu Carrière, Théo Lacombe, Hiroaki Kurihara, Yuichi Ike, Frédéric Chazal

Abstract: Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and that on which they are deployed. In this article, we propose a new family of representations, called MAGD… ▽ More Despite their successful application to a variety of tasks, neural networks remain limited, like other machine learning methods, by their sensitivity to shifts in the data: their performance can be severely impacted by differences in distribution between the data on which they were trained and that on which they are deployed. In this article, we propose a new family of representations, called MAGDiff, that we extract from any given neural network classifier and that allows for efficient covariate data shift detection without the need to train a new model dedicated to this task. These representations are computed by comparing the activation graphs of the neural network for samples belonging to the training distribution and to the target distribution, and yield powerful data- and task-adapted statistics for the two-sample tests commonly used for data set shift detection. We demonstrate this empirically by measuring the statistical powers of two-sample Kolmogorov-Smirnov (KS) tests on several different data sets and shift types, and showing that our novel representations induce significant improvements over a state-of-the-art baseline relying on the network output. △ Less

Submitted 12 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Showing 1–7 of 7 results for author: Arnal, C