Showing 1–13 of 13 results for author: Suau, X

  1. arXiv:2407.12824

    cs.CL cs.AI

    Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models

    Authors: Xavier Suau, Pieter Delobelle, Katherine Metcalf, Armand Joulin, Nicholas Apostoloff, Luca Zappella, Pau Rodríguez

    Abstract: An important issue with Large Language Models (LLMs) is their undesired ability to generate toxic language. In this work, we show that the neurons responsible for toxicity can be determined by their power to discriminate toxic sentences, and that toxic language can be mitigated by reducing their activation levels proportionally to this power. We propose AUROC adaptation (AurA), an intervention tha…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ICML 2024, 8 pages + appendix

  2. arXiv:2309.16318

    cs.LG

    DeepPCR: Parallelizing Sequential Operations in Neural Networks

    Authors: Federico Danieli, Miguel Sarabia, Xavier Suau, Pau Rodríguez, Luca Zappella

    Abstract: Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a compu…

    Submitted 27 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

  3. arXiv:2307.13813

    stat.ML cs.AI cs.LG

    How to Scale Your EMA

    Authors: Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb

    Abstract: Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functio…

    Submitted 7 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Spotlight at NeurIPS 2023, 53 pages, 32 figures, 17 tables

  4. arXiv:2307.10907

    cs.LG

    The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

    Authors: Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella

    Abstract: The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term…

    Submitted 9 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 18 pages: 9 of main text, 2 of references, and 7 of supplementary material [Updated typo in page 6 (Section 3.2)]. Appears in the proceedings of ICML 2023

  5. arXiv:2306.16058

    cs.LG cs.AI

    DUET: 2D Structured and Approximately Equivariant Representations

    Authors: Xavier Suau, Federico Danieli, T. Anderson Keller, Arno Blaas, Chen Huang, Jason Ramapuram, Dan Busbridge, Luca Zappella

    Abstract: Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which…

    Submitted 17 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted at ICML 2023

  6. arXiv:2211.08282

    cs.LG cs.AI

    Homomorphic Self-Supervised Learning

    Authors: T. Anderson Keller, Xavier Suau, Luca Zappella

    Abstract: In this work, we observe that many existing self-supervised learning algorithms can be both unified and generalized when seen through the lens of equivariant representations. Specifically, we introduce a general framework we call Homomorphic Self-Supervised Learning, and theoretically show how it may subsume the use of input-augmentations provided an augmentation-homomorphic feature extractor. We…

    Submitted 15 November, 2022; originally announced November 2022.

  7. arXiv:2202.08946

    cs.HC cs.AI cs.LG

    Symphony: Composing Interactive Interfaces for Machine Learning

    Authors: Alex Bäuerle, Ángel Alexander Cabrera, Fred Hohman, Megan Maher, David Koski, Xavier Suau, Titus Barik, Dominik Moritz

    Abstract: Interfaces for machine learning (ML), information and visualizations about models or data, can help practitioners build robust and responsible ML systems. Despite their benefits, recent studies of ML teams and our interviews with practitioners (n=9) showed that ML interfaces have limited adoption in practice. While existing ML interfaces are effective for specific tasks, they are not designed to b…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems

    ACM Class: H.2.m; I.7.m

  8. arXiv:2202.03586

    cs.CV cs.AI cs.LG

    Fair SA: Sensitivity Analysis for Fairness in Face Recognition

    Authors: Aparna R. Joshi, Xavier Suau, Nivedha Sivakumar, Luca Zappella, Nicholas Apostoloff

    Abstract: As the use of deep learning in high impact domains becomes ubiquitous, it is increasingly important to assess the resilience of models. One such high impact domain is that of face recognition, with real world applications involving images affected by various degradations, such as motion blur or high exposure. Moreover, images captured across different attributes, such as gender and race, can also…

    Submitted 9 February, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 8 pages, 5 figures, to be published in NeurIPS 2021 Workshop, Algorithmic Fairness through the Lens of Causality and Robustness

  9. arXiv:2111.12427

    cs.LG cs.CV

    Challenges of Adversarial Image Augmentations

    Authors: Arno Blaas, Xavier Suau, Jason Ramapuram, Nicholas Apostoloff, Luca Zappella

    Abstract: Image augmentations applied during training are crucial for the generalization performance of image classifiers. Therefore, a large body of research has focused on finding the optimal augmentation policy for a given task. Yet, RandAugment [2], a simple random augmentation policy, has recently been shown to outperform existing sophisticated policies. Only Adversarial AutoAugment (AdvAA) [11], an ap…

    Submitted 3 December, 2021; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: To appear at the ICBINB 2021 NeurIPS Workshop

  10. arXiv:2110.02802

    cs.CL

    Self-conditioning pre-trained language models

    Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff

    Abstract: In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). Grounded on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such c…

    Submitted 14 June, 2023; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: 8 pages and supplementary material, accepted at ICML 2022

  11. arXiv:2110.00552

    cs.LG

    Stochastic Contrastive Learning

    Authors: Jason Ramapuram, Dan Busbridge, Xavier Suau, Russ Webb

    Abstract: While state-of-the-art contrastive Self-Supervised Learning (SSL) models produce results competitive with their supervised counterparts, they lack the ability to infer latent variables. In contrast, prescribed latent variable (LV) models enable attributing uncertainty, inducing task specific compression, and in general allow for more interpretable representations. In this work, we introduce LV app…

    Submitted 30 November, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted to 2nd Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2021), Sydney, Australia

  12. arXiv:2005.07647

    cs.AI cs.CL cs.LG

    Finding Experts in Transformer Models

    Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff

    Abstract: In this work we study the presence of expert units in pre-trained Transformer Models (TM), and how they impact a model's performance. We define expert units to be neurons that are able to classify a concept with a given average precision, where a concept is represented by a binary set of sentences containing the concept (or not). Leveraging the OneSec dataset (Scarlini et al., 2019), we compile a…

    Submitted 15 May, 2020; originally announced May 2020.

  13. arXiv:1807.10585

    cs.CV

    Filter Distillation for Network Compression

    Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff

    Abstract: In this paper we introduce Principal Filter Analysis (PFA), an easy to use and effective method for neural network compression. PFA exploits the correlation between filter responses within network layers to recommend a smaller network that maintains as much as possible the accuracy of the full model. We propose two algorithms: the first allows users to target compression to specific network propert…

    Submitted 11 December, 2019; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: 10 pages, 3 figures, Deep neural network compression, spectral analysis, machine learning

    Journal ref: WACV 2020