Showing 1–22 of 22 results for author: Cabannes, V

  1. arXiv:2407.18134  [pdf, other]

    cs.CV cs.LG

    $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

    Authors: Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun

    Abstract: Learning good representations involves capturing the diverse ways in which data samples relate. Contrastive loss - an objective matching related samples - underlies methods from self-supervised to multimodal learning. Contrastive losses, however, can be viewed more broadly as modifying a similarity graph to indicate how samples should relate in the embedding space. This view reveals a shortcoming…

    Submitted 11 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2406.02128  [pdf, other]

    cs.LG cs.AI cs.CL

    Iteration Head: A Mechanistic Study of Chain-of-Thought

    Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe

    Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particul…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2402.18724  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Associative Memories with Gradient Descent

    Authors: Vivien Cabannes, Berfin Simsek, Alberto Bietti

    Abstract: This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic grow…

    Submitted 28 February, 2024; originally announced February 2024.

  4. arXiv:2402.13079  [pdf, ps, other]

    stat.ML cs.IR cs.IT cs.LG

    Mode Estimation with Partial Feedback

    Authors: Charles Arnal, Vivien Cabannes, Vianney Perchet

    Abstract: The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments. These new learning pipelines call for new theoretical frameworks. In this paper, we formalize core aspects of weakly supervised and active learning with a simple problem: the estimation of the mode of a distribution using partial feedback. We show how entropy coding allows for…

    Submitted 20 February, 2024; originally announced February 2024.

    MSC Class: 62L05; 62B86; 62D10; 62B10

  5. arXiv:2311.13845  [pdf, other]

    cs.LG cs.AI stat.ML

    Touring sampling with pushforward maps

    Authors: Vivien Cabannes, Charles Arnal

    Abstract: The number of sampling methods could be daunting for a practitioner looking to cast powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the "generative modeling" setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it…

    Submitted 20 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: 5 pages

    Journal ref: ICASSP, 2024

  6. arXiv:2310.02984  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG cs.NE

    Scaling Laws for Associative Memories

    Authors: Vivien Cabannes, Elvis Dohmatob, Alberto Bietti

    Abstract: Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the stati…

    Submitted 20 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    ACM Class: I.2.6; G.1.6

  7. arXiv:2306.11928  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Open Problem: Learning with Variational Objectives on Measures

    Authors: Vivien Cabannes, Carles Domingo-Enrich

    Abstract: The theory of statistical learning has focused on variational objectives expressed on functions. In this note, we discuss motivations to write similar objectives on measures, in particular to discuss out-of-distribution generalization and weakly-supervised learning. It raises a natural question: can one cast usual statistical learning results to objectives expressed on measures? Does the resulting…

    Submitted 16 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: IEEE Big Data, 2023

  8. arXiv:2306.00802  [pdf, other]

    stat.ML cs.CL cs.LG

    Birth of a Transformer: A Memory Viewpoint

    Authors: Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, Leon Bottou

    Abstract: Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We stu…

    Submitted 6 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  9. arXiv:2306.00742  [pdf, other]

    cs.LG cs.AI stat.ML

    The Galerkin method beats Graph-Based Approaches for Spectral Algorithms

    Authors: Vivien Cabannes, Francis Bach

    Abstract: Historically, the machine learning community has derived spectral decompositions from graph-based approaches. We break with this approach and prove the statistical and computational superiority of the Galerkin method, which consists in restricting the study to a small set of test functions. In particular, we introduce implementation tricks to deal with differential operators in large dimensions wi…

    Submitted 26 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Journal ref: AISTATS 2024

  10. arXiv:2305.16014  [pdf, other]

    stat.ML cs.AI cs.LG math.ST

    How many samples are needed to leverage smoothness?

    Authors: Vivien Cabannes, Stefano Vigogna

    Abstract: A core principle in statistical learning is that smoothness of target functions allows to break the curse of dimensionality. However, learning a smooth function seems to require enough samples close to one another to get meaningful estimate of high-order derivatives, which would be hard in machine learning problems where the ratio between number of data and input dimension is relatively small. By…

    Submitted 16 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 34 pages, 13 figures

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: NeurIPS 2023

  11. arXiv:2303.15256  [pdf, other]

    cs.LG cs.AI cs.HC

    Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need

    Authors: Vivien Cabannes, Leon Bottou, Yann LeCun, Randall Balestriero

    Abstract: Self-Supervised Learning (SSL) has emerged as the solution of choice to learn transferable representations from unlabeled data. However, SSL requires to build samples that are known to be semantically akin, i.e. positive views. Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies e.g. applying known data-augmentations to the same input. In this work, we…

    Submitted 29 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 8 main pages, 20 totals, 10 figures

    ACM Class: I.2.6

  12. arXiv:2302.02774  [pdf, other]

    stat.ML cs.AI cs.LG math.ST

    The SSL Interplay: Augmentations, Inductive Bias, and Generalization

    Authors: Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann LeCun, Alberto Bietti

    Abstract: Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architect…

    Submitted 1 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023

  13. On minimal variations for unsupervised representation learning

    Authors: Vivien Cabannes, Alberto Bietti, Randall Balestriero

    Abstract: Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks. It has been approached with many techniques, such as manifold learning, diffusion maps, or more recently self-supervised learning. Those techniques are arguably all based on the underlying assumption that target functions, associated with future downstream tasks, have low variations in d…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 5 pages, 1 figure; 1 table

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5

  14. arXiv:2209.11629  [pdf, other]

    cs.LG math.ST stat.ML

    From Weakly Supervised Learning to Active Learning

    Authors: Vivien Cabannes

    Abstract: Applied mathematics and machine computations have raised a lot of hope since the recent success of supervised learning. Many practitioners in industries have been trying to switch from their old paradigms to machine learning. Interestingly, those data scientists spend more time scrapping, annotating and cleaning data than fine-tuning models. This thesis is motivated by the following question: can…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: PhD Thesis, Ecole Normale Superieure, 2022

  15. arXiv:2205.13255  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Active Labeling: Streaming Stochastic Gradients

    Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

    Abstract: The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to consider iteratively input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning…

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 38 pages (9 main pages), 9 figures

    MSC Class: 68T37 ACM Class: G.3

  16. arXiv:2205.10055  [pdf, other]

    stat.ML cs.AI cs.LG

    A Case of Exponential Convergence Rates for SVM

    Authors: Vivien Cabannes, Stefano Vigogna

    Abstract: Classification is often the first problem described in introductory machine learning classes. Generalization guarantees of classification have historically been offered by Vapnik-Chervonenkis theory. Yet those guarantees are based on intractable algorithms, which has led to the theory of surrogate methods in classification. Guarantees offered by surrogate methods are based on calibration inequalit…

    Submitted 22 May, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: 16 pages, 6 figures

    MSC Class: 68T05 ACM Class: G.3

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, 2023, PMLR 206:359-374

  17. arXiv:2102.02789  [pdf, other]

    cs.LG cs.AI stat.ML

    Disambiguation of weak supervision with exponential convergence rates

    Authors: Vivien Cabannes, Francis Bach, Alessandro Rudi

    Abstract: Machine learning approached through supervised learning requires expensive annotation of data. This motivates weakly supervised learning, where data are annotated with incomplete yet discriminative information. In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets. We review a disambiguation principle to rec…

    Submitted 15 July, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: 22 pages; 6 figures

    MSC Class: 68Q32 ACM Class: I.2.6; G.3; F.2.2

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  18. arXiv:2102.00760  [pdf, ps, other]

    stat.ML cs.AI cs.LG math.ST

    Fast rates in structured prediction

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression. Bounding the original error, between estimate and solution, by the surrogate error endows discrete problems with convergence rates already shown for continuous instances. Yet, current approaches do not leverage the fact that discrete problems are essentia…

    Submitted 15 July, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 14 main pages, 3 main figures, 43 pages, 4 figures (with appendix)

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: Conference on Learning Theory, PMLR 134, 2021

  19. arXiv:2010.13864  [pdf, other]

    cs.GR cs.AI cs.CV cs.LG

    Diptychs of human and machine perceptions

    Authors: Vivien Cabannes, Thomas Kerdreux, Louis Thiry

    Abstract: We propose visual creations that put differences in algorithms and humans perceptions into perspective. We exploit saliency maps of neural networks and visual focus of humans to create diptychs that are reinterpretations of an original image according to both machine and human attentions. Using those diptychs as a qualitative evaluation of perception, we discuss some crucial issues of curre…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: 7 pages, 36 images

    Journal ref: creativity workshop NeurIPS 2020

  20. arXiv:2009.04324  [pdf, other]

    stat.ML cs.LG

    Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

    Authors: Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

    Abstract: As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning. This is the aim of semi-supervised learning. To benefit from the access to unlabelled data, it is natural to diffuse smoothly knowledge of labelled data to unlabelled one. This induces to the use of Laplacian regularization. Yet, current i…

    Submitted 29 November, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: 38 pages, 6 figures

    Journal ref: NeurIPS 2021

  21. arXiv:2003.00920  [pdf, other]

    cs.LG cs.AI stat.ML

    Structured Prediction with Partial Labelling through the Infimum Loss

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Annotating datasets is one of the main costs in nowadays supervised learning. The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect, as partial labelling. This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one. The problem of supervised learning with partial lab…

    Submitted 9 September, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 8 pages for main paper, 27 with main paper, 13 figures, 3 tables

    MSC Class: 68Q32 ACM Class: I.2.6; G.3

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1230-1239, 2020

  22. arXiv:1910.04386  [pdf, other]

    cs.GR cs.AI cs.HC

    Dialog on a canvas with a machine

    Authors: Vivien Cabannes, Thomas Kerdreux, Louis Thiry, Tina Campana, Charly Ferrandes

    Abstract: We propose a new form of human-machine interaction. It is a pictorial game consisting of interactive rounds of creation between artists and a machine. They repetitively paint one after the other. At its rounds, the computer partially completes the drawing using machine learning algorithms, and projects its additions directly on the canvas, which the artists are free to insert or modify. Alongside…

    Submitted 13 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Accepted for poster at creativity workshop NeurIPS 2019

    Journal ref: creativity workshop NeurIPS 2019