Showing 1–24 of 24 results for author: Gat, I

  1. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2407.15595  [pdf, other]

    cs.LG cs.AI

    Discrete Flow Matching

    Authors: Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman

    Abstract: Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers severa…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2406.10970  [pdf, other]

    cs.SD eess.AS

    Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation

    Authors: Or Tal, Alon Ziv, Itai Gat, Felix Kreuk, Yossi Adi

    Abstract: We present JASCO, a temporally controlled text-to-music generation model utilizing both symbolic and audio-based conditions. JASCO can generate high-quality music samples conditioned on global text descriptions along with fine-grained local controls. JASCO is based on the Flow Matching modeling paradigm together with a novel conditioning method. This allows music generation controlled both locally…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2406.06508  [pdf, other]

    cs.CV cs.AI cs.GR

    Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

    Authors: Sigal Raab, Inbar Gat, Nathan Sala, Guy Tevet, Rotem Shalev-Arkushin, Ohad Fried, Amit H. Bermano, Daniel Cohen-Or

    Abstract: Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily center on handlin…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Video: https://www.youtube.com/watch?v=s5oo3sKV0YU, Project page: https://monkeyseedocg.github.io, Code: https://github.com/MonkeySeeDoCG/MoMo-code

  5. arXiv:2402.14017  [pdf, other]

    cs.LG

    D-Flow: Differentiating through Flows for Controlled Generation

    Authors: Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, Yaron Lipman

    Abstract: Taming the generation outcome of state of the art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce D-Flow, a simple framework for controlling the generation process by differentiating through the flow, optimizing for t…

    Submitted 21 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  6. arXiv:2402.05755  [pdf, other]

    cs.CL cs.SD eess.AS

    SpiRit-LM: Interleaved Spoken and Written Language Model

    Authors: Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-jussa, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux

    Abstract: We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated…

    Submitted 8 February, 2024; originally announced February 2024.

  7. arXiv:2401.04577  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Masked Audio Generation using a Single Non-Autoregressive Transformer

    Authors: Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

    Abstract: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a single-stage, non-autoregressive transformer. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct the output sequence using several decoding steps. T…

    Submitted 5 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  8. arXiv:2309.16429  [pdf, other]

    cs.LG cs.AI

    Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

    Authors: Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

    Abstract: We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresp…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 9 pages, 6 figures

  9. arXiv:2308.12950  [pdf, other]

    cs.CL

    Code Llama: Open Foundation Models for Code

    Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, et al. (1 additional author not shown)

    Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama…

    Submitted 31 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  10. arXiv:2308.05725  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

    Authors: Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

    Abstract: Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthes…

    Submitted 10 August, 2023; originally announced August 2023.

  11. arXiv:2306.05284  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Simple and Controllable Music Generation

    Authors: Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

    Abstract: We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchicall…

    Submitted 29 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Published at Neurips 2023

  12. arXiv:2305.13050  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

    Authors: Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

    Abstract: In recent years, image generation has shown a great leap in performance, where diffusion models play a central role. Although generating high-quality images, such models are mainly conditioned on textual descriptions. This begs the question: "how can we adopt such models to be conditioned on other modalities?". In this paper, we propose a novel method utilizing latent diffusion models trained for…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  13. arXiv:2305.13009  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Textually Pretrained Speech Language Models

    Authors: Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

    Abstract: Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language models. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model de…

    Submitted 30 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  14. arXiv:2305.12393  [pdf, other]

    cs.LG cs.NE

    Layer Collaboration in the Forward-Forward Algorithm

    Authors: Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan

    Abstract: Backpropagation, which uses the chain rule, is the de-facto standard algorithm for optimizing neural networks nowadays. Recently, Hinton (2022) proposed the forward-forward algorithm, a promising alternative that optimizes neural nets layer-by-layer, without propagating gradients throughout the network. Although such an approach has several advantages over back-propagation and shows promising resu…

    Submitted 21 May, 2023; originally announced May 2023.

  15. arXiv:2210.06143  [pdf, ps, other]

    cs.LG stat.ML

    On the Importance of Gradient Norm in PAC-Bayesian Bounds

    Authors: Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan

    Abstract: Generalization bounds which assess the difference between the true risk and the empirical risk, have been studied extensively. However, to obtain bounds, current techniques use strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper, we follow an alternative approach: we relax uniform bounds assumptions by using on-average bounded loss…

    Submitted 2 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 22. arXiv admin note: text overlap with arXiv:2002.09866

  16. arXiv:2209.15483  [pdf, other]

    cs.CL cs.LG eess.AS

    Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

    Authors: Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi

    Abstract: Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensi…

    Submitted 29 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

  17. arXiv:2206.05700  [pdf, other]

    cs.LG

    A Functional Information Perspective on Model Interpretation

    Authors: Itai Gat, Nitay Calderon, Roi Reichart, Tamir Hazan

    Abstract: Contemporary predictive models are hard to interpret as their deep nets exploit numerous complex relations between input elements. This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input. We rely on the log-Sobolev inequality that bounds the functional entropy by the…

    Submitted 14 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: Accepted to ICML 2022

  18. arXiv:2203.00613  [pdf]

    cs.CL cs.LG cs.SD eess.AS

    Towards a Common Speech Analysis Engine

    Authors: Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory

    Abstract: Recent innovations in self-supervised representation learning have led to remarkable advances in natural language processing. That said, in the speech processing domain, self-supervised representation learning-based systems are not yet considered state-of-the-art. We propose leveraging recent advances in self-supervised-based speech processing to create a common speech analysis engine. Such an eng…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: ICASSP 2022

  19. arXiv:2202.03896  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    Speech Emotion Recognition using Self-Supervised Features

    Authors: Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz

    Abstract: Self-supervised pre-trained features have consistently delivered state-of-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation. In this paper we introduce a modular End-to-End (E2E) SER system based on an Upstream + Downstream architecture paradigm, which allows easy use/integration o…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 5 pages, 4 figures, 2 tables, ICASSP 2022

  20. arXiv:2202.01252  [pdf, other]

    cs.LG

    Speaker Normalization for Self-supervised Speech Emotion Recognition

    Authors: Itai Gat, Hagai Aronowitz, Weizhong Zhu, Edmilson Morais, Ron Hoory

    Abstract: Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion r…

    Submitted 6 November, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: ICASSP 22

  21. arXiv:2112.04895  [pdf, other]

    cs.LG cs.CV

    Latent Space Explanation by Intervention

    Authors: Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan

    Abstract: The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output. While this property serves to fit the training data well, it also obscures the mechanism that drives prediction. This study aims to reveal hidden concepts by employing an intervention mechanism that shifts the predicted class based on discrete variational autoencoders.…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: Accepted to AAAI22

  22. arXiv:2110.14375  [pdf, other]

    cs.LG cs.CV cs.MM

    Perceptual Score: What Data Modalities Does Your Model Perceive?

    Authors: Itai Gat, Idan Schwartz, Alexander Schwing

    Abstract: Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard to annotate, and annotations may contain biases that we are often unaware of. Deep-net-based classifiers, in turn, are prone to exploit those biases and to find…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  23. arXiv:2106.04484  [pdf, other]

    cs.CV cs.CL cs.LG

    Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

    Authors: Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart

    Abstract: Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are being fed with. To understand and better measure the generalization capabilities of VQA systems, we look at their robustness to counterfactually augmented data. Our proposed augmentations are designed to make a fo…

    Submitted 17 September, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: ACL 2021. Our code and data are available at https://danrosenberg.github.io/rad-measure/

  24. arXiv:2010.10802  [pdf, other]

    cs.CV cs.LG

    Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

    Authors: Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan

    Abstract: Many recent datasets contain a variety of different data modalities, for instance, image, question, and answer data in visual question answering (VQA). When training deep net classifiers on those multi-modal datasets, the modalities get exploited at different scales, i.e., some modalities can more easily contribute to the classification results than others. This is suboptimal because the classifie…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020