Showing 1–10 of 10 results for author: Katharopoulos, A

  1. arXiv:2402.01093  [pdf, other]

    cs.LG cs.CL

    Specialized Language Models with Cheap Inference from Limited Domain Data

    Authors: David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun

    Abstract: Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets. This work formalizes these constraints and distinguishes four important variables: the pretraining budget (for training before the target domain is known), the specialization budget (for training after the target domain is known), the infer…

    Submitted 1 February, 2024; originally announced February 2024.

  2. arXiv:2311.00613  [pdf, other]

    cs.SD cs.LG eess.AS

    Controllable Music Production with Diffusion Models and Guidance Gradients

    Authors: Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

    Abstract: We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic ch…

    Submitted 5 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  3. arXiv:2301.07836  [pdf, other]

    cs.CV cs.AI

    Masked Autoencoding Does Not Help Natural Language Supervision at Scale

    Authors: Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

    Abstract: Self supervision and natural language supervision have emerged as two exciting ways to train general purpose image encoders which excel at a variety of downstream tasks. Recent works such as M3AE and SLIP have suggested that these approaches can be effectively combined, but most notably their results use small pre-training datasets (<50M samples) and don't effectively reflect the large-scale regim…

    Submitted 15 May, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023

  4. arXiv:2103.10429  [pdf, other]

    cs.CV

    Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

    Authors: Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, Sanja Fidler

    Abstract: Impressive progress in 3D shape extraction led to representations that can capture object geometries with high fidelity. In parallel, primitive-based methods seek to represent objects as semantically consistent part arrangements. However, due to the simplicity of existing primitive representations, these methods fail to accurately reconstruct 3D shapes using a small number of primitives/parts. We…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: To appear in CVPR 2021

  5. arXiv:2007.04825  [pdf, other]

    cs.LG stat.ML

    Fast Transformers with Clustered Attention

    Authors: Apoorv Vyas, Angelos Katharopoulos, François Fleuret

    Abstract: Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitively expensive for large sequences. To address this, we propose clustered attention, which instead of computing the attention for every query, grou…

    Submitted 29 September, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

  6. arXiv:2006.16236  [pdf, other]

    cs.LG stat.ML

    Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

    Authors: Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret

    Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from…

    Submitted 31 August, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML 2020, project at https://linear-transformers.com/

  7. arXiv:1905.03711  [pdf, other]

    cs.CV cs.LG stat.ML

    Processing Megapixel Images with Deep Attention-Sampling Models

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Existing deep architectures cannot operate on very large signals such as megapixel images due to computational and memory constraints. To tackle this limitation, we propose a fully differentiable end-to-end trainable model that samples and processes only a fraction of the full resolution input image. The locations to process are sampled from an attention distribution computed from a low resolution…

    Submitted 17 July, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: Presented at ICML 2019. Code is available at https://github.com/idiap/attention-sampling

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3282-3291, 2019

  8. arXiv:1803.00942  [pdf, other]

    cs.LG

    Not All Samples Are Created Equal: Deep Learning with Importance Sampling

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to…

    Submitted 28 October, 2019; v1 submitted 2 March, 2018; originally announced March 2018.

    Comments: Accepted at ICML 2018 (short oral)

  9. arXiv:1706.08580  [pdf, other]

    cs.LG stat.ML

    Learning Local Feature Aggregation Functions with Backpropagation

    Authors: Angelos Katharopoulos, Despoina Paschalidou, Christos Diou, Anastasios Delopoulos

    Abstract: This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem). To achieve that, we compose the local feature aggregation function with the classifier cost function and we backpropagate the gradient of…

    Submitted 26 June, 2017; originally announced June 2017.

    Comments: In Proceedings of the 25th European Signal Processing Conference (EUSIPCO 2017)

  10. arXiv:1706.00043  [pdf, other]

    cs.LG

    Biased Importance Sampling for Deep Neural Network Training

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems. However, the lack of an efficient way to calculate the importance still hinders its application to Deep Learning. In this paper, we show that the loss value can be used as an alternative importance metric, and propose a way to efficiently approximate it for a deep model, using a small mo…

    Submitted 13 September, 2017; v1 submitted 31 May, 2017; originally announced June 2017.