Showing 1–17 of 17 results for author: Baratin, A

  1. arXiv:2407.09357 [pdf, other]

    cs.LG q-bio.BM

    Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees

    Authors: Alexia Jolicoeur-Martineau, Aristide Baratin, Kisoo Kwon, Boris Knyazev, Yan Zhang

    Abstract: Generating novel molecules is challenging, with most representations leading to generative models producing many invalid molecules. Spanning Tree-based Graph Generation (STGG) is a promising approach to ensure the generation of valid molecules, outperforming state-of-the-art SMILES and graph diffusion models for unconditional generation. In the real world, we want to be able to generate molecules…

    Submitted 15 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/SamsungSAILMontreal/AnyMolGenCritic

  2. arXiv:2405.18296 [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

    Authors: Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli

    Abstract: Machine learning systems often acquire biases by leveraging undesired features in the data, impacting accuracy variably across different sub-populations. Current understanding of bias formation mostly focuses on the initial and final stages of learning, leaving a gap in knowledge regarding the transient dynamics. To address this gap, this paper explores the evolution of bias in a teacher-student s…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2403.07688 [pdf, other]

    cs.LG cs.AI

    Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

    Authors: Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

    Abstract: When training deep neural networks, the phenomenon of dying neurons – units that become inactive or saturated, output zero during training – has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios. In this paper, we reassess this phenomenon, focusing on sparsity a…

    Submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2402.13368 [pdf, other]

    cs.LG cs.CV

    Unsupervised Concept Discovery Mitigates Spurious Correlations

    Authors: Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi

    Abstract: Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases. Addressing this challenge typically involves methods relying on prior knowledge and group annotation to remove spurious correlations, which may not be readily available in many applications. In this paper, we establish a novel connection between unsupervised object-centric lear…

    Submitted 16 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Journal ref: ICML 2024

  5. arXiv:2310.08513 [pdf, other]

    cs.NE cs.AI q-bio.NC

    How connectivity structure shapes rich and lazy learning in neural circuits

    Authors: Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie

    Abstract: In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biolo…

    Submitted 19 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  6. arXiv:2307.16704 [pdf, other]

    cs.LG cs.AI

    Lookbehind-SAM: k steps back, 1 step forward

    Authors: Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar

    Abstract: Sharpness-aware minimization (SAM) methods have gained increasing popularity by formulating the problem of minimizing both loss value and loss sharpness as a minimax objective. In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off. By taking inspiration from the Lookahead optimizer, which uses multiple de…

    Submitted 16 May, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: ICML 2024

  7. arXiv:2307.09638 [pdf, other]

    cs.LG cs.AI

    Promoting Exploration in Memory-Augmented Adam using Critical Momenta

    Authors: Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar

    Abstract: Adaptive gradient-based optimizers, notably Adam, have left their mark in training large-scale deep learning models, offering fast convergence and robustness to hyperparameter settings. However, they often struggle with generalization, attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages e…

    Submitted 17 June, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Published in Transactions on Machine Learning Research

  8. arXiv:2212.01674 [pdf, other]

    cs.CV cs.AI cs.LG

    CrossSplit: Mitigating Label Noise Memorization through Data Splitting

    Authors: Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

    Abstract: We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ing…

    Submitted 26 April, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted to ICML 2023

  9. arXiv:2209.09658 [pdf, other]

    cs.LG stat.ML

    Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

    Authors: Thomas George, Guillaume Lajoie, Aristide Baratin

    Abstract: Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficu…

    Submitted 21 November, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: 25 pages, 14 figures

    Journal ref: TMLR 2022 - Transactions on Machine Learning Research, 12/2022

  10. arXiv:2206.01251 [pdf, other]

    cs.LG cs.AI cs.CV

    Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

    Authors: Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

    Abstract: We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose to use the Intrinsic Dimension (ID) to assess expressivene…

    Submitted 14 November, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Journal ref: TMLR 2023 - Transactions on Machine Learning Research, 11/2023

  11. arXiv:2102.05628 [pdf, ps, other]

    stat.ML cs.LG

    On the Regularity of Attention

    Authors: James Vuckovic, Aristide Baratin, Remi Tachet des Combes

    Abstract: Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e. the amount of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual defini…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Conference version of arXiv:2007.02876

  12. arXiv:2008.00938 [pdf, other]

    cs.LG stat.ML

    Implicit Regularization via Neural Feature Alignment

    Authors: Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien

    Abstract: We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. This can be interpreted as a combined mechanism of feature selection and compression. By extrapolating a new analysis of Rad…

    Submitted 16 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: AISTATS 2021

  13. arXiv:2007.02876 [pdf, ps, other]

    stat.ML cs.LG

    A Mathematical Theory of Attention

    Authors: James Vuckovic, Aristide Baratin, Remi Tachet des Combes

    Abstract: Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-atte…

    Submitted 20 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

  14. arXiv:1810.08591 [pdf, other]

    cs.LG stat.ML

    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    Authors: Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tr…

    Submitted 18 December, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

    Journal ref: ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena

  15. arXiv:1806.08734 [pdf, other]

    stat.ML cs.LG

    On the Spectral Bias of Neural Networks

    Authors: Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

    Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuatio…

    Submitted 31 May, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: 23 pages

    Journal ref: ICML 2019

  16. arXiv:1801.04062 [pdf, other]

    cs.LG stat.ML

    MINE: Mutual Information Neural Estimation

    Authors: Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm

    Abstract: We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be…

    Submitted 14 August, 2021; v1 submitted 12 January, 2018; originally announced January 2018.

    Comments: 19 pages, 6 figures

    Journal ref: ICML 2018

  17. arXiv:1801.04055 [pdf, other]

    cs.LG stat.ML

    A3T: Adversarially Augmented Adversarial Training

    Authors: Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations of the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure to improve adversarial robustness of d…

    Submitted 11 January, 2018; originally announced January 2018.

    Comments: Accepted for an oral presentation at the Machine Deception Workshop, NIPS 2017