
Showing 1–16 of 16 results for author: Launay, J

Searching in archive cs.
  1. arXiv:2311.16867

    cs.CL cs.AI

    The Falcon Series of Open Language Models

    Authors: Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, Guilherme Penedo

    Abstract: We introduce the Falcon series: 7B-, 40B-, and 180B-parameter causal decoder-only models trained on diverse, high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text, the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurr…

    Submitted 29 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  2. arXiv:2306.01116

    cs.CL cs.AI

    The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

    Authors: Guilherme Penedo, Quentin Malartic, Daniel Hesslow, Ruxandra Cojocaru, Alessandro Cappelli, Hamza Alobeidli, Baptiste Pannier, Ebtesam Almazrouei, Julien Launay

    Abstract: Large language models are commonly trained on a mixture of filtered web data and curated high-quality corpora, such as social media conversations, books, or technical papers. This curation process is believed to be necessary to produce performant models with broad zero-shot generalization abilities. However, as larger models requiring pretraining on trillions of tokens are considered, it is unclea…

    Submitted 1 June, 2023; originally announced June 2023.

  3. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  4. arXiv:2210.15424

    cs.CL cs.AI cs.LG

    What Language Model to Train if You Have One Million GPU Hours?

    Authors: Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M Saiful Bari, Stella Biderman, Hady Elsahar, Niklas Muennighoff, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy

    Abstract: The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameter models, large language models are increasingly expensive to accurately design and train. Notabl…

    Submitted 7 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  5. arXiv:2210.14593

    cs.LG cs.AI cs.CL cs.NE stat.ML

    Scaling Laws Beyond Backpropagation

    Authors: Matthew J. Filipovich, Alessandro Cappelli, Daniel Hesslow, Julien Launay

    Abstract: Alternatives to backpropagation have long been studied to better understand how biological brains may learn. Recently, they have also garnered interest as a way to train neural networks more efficiently. By relaxing constraints inherent to backpropagation (e.g., symmetric feedforward and feedback weights, sequential updates), these methods enable promising prospects, such as local learning. Howeve…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: I Can't Believe It's Not Better Workshop, NeurIPS 2022

  6. arXiv:2204.05832

    cs.CL cs.LG stat.ML

    What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

    Authors: Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel

    Abstract: Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives used across state-of-the-art models differ significantly, and there has been limited systematic comparison of these factors. In this work, we present a large-sc…

    Submitted 12 April, 2022; originally announced April 2022.

  7. arXiv:2110.08554

    cs.CL

    PAGnol: An Extra-Large French Generative Model

    Authors: Julien Launay, Elena Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli, Djamé Seddah

    Abstract: Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French langu…

    Submitted 16 October, 2021; originally announced October 2021.

  8. arXiv:2109.11928

    stat.ML cs.LG

    Is the Number of Trainable Parameters All That Actually Matters?

    Authors: Amélie Chatelain, Amine Djeghri, Daniel Hesslow, Julien Launay, Iacopo Poli

    Abstract: Recent work has identified simple empirical scaling laws for language models, linking compute budget, dataset size, model size, and autoregressive modeling loss. The validity of these simple power laws across orders of magnitude in model scale provides compelling evidence that larger models are also more capable models. However, scaling up models under the constraints of hardware and infrastructur…

    Submitted 24 September, 2021; originally announced September 2021.

  9. arXiv:2108.04217

    cs.CV cs.LG

    ROPUST: Improving Robustness through Fine-tuning with Photonic Processors and Synthetic Gradients

    Authors: Alessandro Cappelli, Julien Launay, Laurent Meunier, Ruben Ohana, Iacopo Poli

    Abstract: Robustness to adversarial attacks is typically obtained through expensive adversarial training with Projected Gradient Descent. Here we introduce ROPUST, a remarkably simple and efficient method to leverage robust pre-trained models and further increase their robustness, at no cost in natural accuracy. Our technique relies on the use of an Optical Processing Unit (OPU), a photonic co-processor, an…

    Submitted 6 July, 2021; originally announced August 2021.

    Comments: 12 pages, 7 figures

  10. arXiv:2107.11814

    cs.AR cs.ET

    LightOn Optical Processing Unit: Scaling-up AI and HPC with a Non von Neumann co-processor

    Authors: Charles Brossollet, Alessandro Cappelli, Igor Carron, Charidimos Chaintoutis, Amélie Chatelain, Laurent Daudet, Sylvain Gigan, Daniel Hesslow, Florent Krzakala, Julien Launay, Safa Mokaadi, Fabien Moreau, Kilian Müller, Ruben Ohana, Gustave Pariente, Iacopo Poli, Elena Tommasone

    Abstract: We introduce LightOn's Optical Processing Unit (OPU), the first photonic AI accelerator chip available on the market for at-scale Non von Neumann computations, reaching 1500 TeraOPS. It relies on a combination of free-space optics with off-the-shelf components, together with a software API allowing a seamless integration within Python-based processing pipelines. We discuss a variety of use cases…

    Submitted 25 July, 2021; originally announced July 2021.

    Comments: Proceedings IEEE Hot Chips 33, 2021

  11. arXiv:2106.03645

    cs.LG cs.CR

    Photonic Differential Privacy with Direct Feedback Alignment

    Authors: Ruben Ohana, Hamlet J. Medina Ruiz, Julien Launay, Alessandro Cappelli, Iacopo Poli, Liva Ralaivola, Alain Rakotomamonjy

    Abstract: Optical Processing Units (OPUs), low-power photonic chips dedicated to large-scale random projections, have been used in previous work to train deep neural networks using Direct Feedback Alignment (DFA), an effective alternative to backpropagation. Here, we demonstrate how to leverage the intrinsic noise of optical random projections to build a differentially private DFA mechanism, making OPUs…

    Submitted 25 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Journal ref: NeurIPS 2021

  12. Adversarial Robustness by Design through Analog Computing and Synthetic Gradients

    Authors: Alessandro Cappelli, Ruben Ohana, Julien Launay, Laurent Meunier, Iacopo Poli, Florent Krzakala

    Abstract: We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation, where the parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the…

    Submitted 6 January, 2021; originally announced January 2021.

    Journal ref: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing

  13. arXiv:2012.06373

    cs.LG cs.AI cs.AR cs.NE stat.ML

    Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment

    Authors: Julien Launay, Iacopo Poli, Kilian Müller, Gustave Pariente, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan

    Abstract: The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute n…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 6 pages, 2 figures, 1 table. Oral at the Beyond Backpropagation Workshop, NeurIPS 2020

  14. arXiv:2006.12878

    stat.ML cs.LG cs.NE

    Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

    Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala

    Abstract: Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and arc…

    Submitted 11 December, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 23 pages, 6 figures, 10 tables. For associated code, see https://github.com/lightonai/dfa-scales-to-modern-deep-learning. Poster at NeurIPS 2020

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 9346–9360, 2020

  15. arXiv:2006.01475

    cs.LG cs.ET eess.IV stat.ML

    Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

    Authors: Julien Launay, Iacopo Poli, Kilian Müller, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan

    Abstract: As neural networks grow larger, more complex, and more data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks. We rely on direct feedback…

    Submitted 3 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: 2 pages, 1 figure

  16. arXiv:1906.04554

    stat.ML cs.LG cs.NE

    Principled Training of Neural Networks with Direct Feedback Alignment

    Authors: Julien Launay, Iacopo Poli, Florent Krzakala

    Abstract: The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradient methods, where the error gradient is only roughly approximated, have garnered interest. These methods not only better portray how biological brains are learning, but al…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: 10 pages, 4 figures, 4 tables, github repo at: https://github.com/lightonai/principled-dfa-training
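Entry 8 refers to simple empirical scaling laws, i.e. power laws linking quantities such as model size and loss. As a hedged illustration of what fitting such a law involves, the sketch below assumes the single-variable form L(N) = a · N^(-alpha) and recovers a and alpha by ordinary least squares in log-log space. The helper name `fit_power_law` and the data points are hypothetical: the points are generated from a known law purely to show the fit, not taken from any of the listed papers.

```python
import math


def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-alpha) by least squares on (log N, log L).

    Hypothetical helper: assumes a pure power law with no irreducible-loss term.
    Returns the tuple (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the least-squares line through the log-log points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log L = log a - alpha * log N, so alpha is the negated slope.
    return math.exp(intercept), -slope


# Synthetic points from a known law (a = 10, alpha = 0.1), for illustration only.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10.0 * n ** -0.1 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

The log-log linearization only works for a pure power law; a form with an additive irreducible-loss offset, L(N) = L_inf + a · N^(-alpha), would need a nonlinear least-squares fit instead.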
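Entries 11 and 12 describe the optical co-processor as performing large-scale random projections, specifically a nonlinear, fixed random transformation whose parameters cannot be read back. The sketch below simulates such a transform in software, assuming the form y = |Rx|², where R is a fixed complex Gaussian matrix and the squared modulus models intensity-only detection. The function name `make_random_projection` and the Gaussian model of the optical medium are assumptions for illustration; this is not LightOn's software API.

```python
import math
import random


def make_random_projection(n_in, n_out, seed=0):
    """Return a function x -> |R x|**2 for a fixed, hidden complex Gaussian R.

    Hypothetical software stand-in for an optical random projection: R is drawn
    once inside the closure and is never exposed to the caller.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(2.0)  # real and imaginary parts each get half the variance
    r = [
        [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n_in)]
        for _ in range(n_out)
    ]

    def project(x):
        # Linear complex projection, followed by intensity (squared modulus).
        return [abs(sum(rij * xj for rij, xj in zip(row, x))) ** 2 for row in r]

    return project


opu = make_random_projection(n_in=8, n_out=4, seed=42)
x = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
y = opu(x)
y_doubled = opu([2.0 * v for v in x])
print(y)
```

Doubling the input quadruples every output, which is one way to see that the overall map is nonlinear even though the underlying projection R is linear.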
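Entries 11 and 13 to 16 build on Direct Feedback Alignment (DFA), described in entry 11 as an alternative to backpropagation. In DFA, a hidden layer receives the output error through a fixed random feedback matrix rather than through the transposed forward weights, so its update can be computed from the output error directly, without a sequential backward pass. The sketch below shows DFA training for a toy two-layer tanh network in plain Python. The class name `TinyDFANet`, the layer sizes, the initialization scales, the squared-error loss, and the toy data are all assumptions for illustration, not the code from the repositories linked in entries 14 and 16.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gaussian_matrix(rows, cols, rng, std):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class TinyDFANet:
    """Hypothetical toy network x -> tanh(W1 x) -> W2 h, trained with DFA."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = gaussian_matrix(n_hidden, n_in, rng, 1.0 / math.sqrt(n_in))
        self.w2 = gaussian_matrix(n_out, n_hidden, rng, 1.0 / math.sqrt(n_hidden))
        # Fixed random feedback matrix B1 (n_hidden x n_out): drawn once, never trained.
        self.b1 = gaussian_matrix(n_hidden, n_out, rng, 1.0)

    def forward(self, x):
        h1 = [math.tanh(a) for a in matvec(self.w1, x)]  # hidden activations
        return h1, matvec(self.w2, h1)                   # linear output layer

    def loss(self, x, target):
        _, y = self.forward(x)
        return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

    def train_step(self, x, target, lr):
        h1, y = self.forward(x)
        e = [yi - ti for yi, ti in zip(y, target)]  # dLoss/dy for 0.5 * ||y - t||^2
        # DFA hidden signal: (B1 e) * tanh'(a1), with tanh' = 1 - h^2.
        # Backpropagation would use W2^T e here instead of B1 e.
        d1 = [g * (1.0 - h * h) for g, h in zip(matvec(self.b1, e), h1)]
        # Output layer: its ordinary local gradient e h1^T.
        for i, ei in enumerate(e):
            for j, hj in enumerate(h1):
                self.w2[i][j] -= lr * ei * hj
        # Hidden layer: d1 x^T, computed without any backward pass through W2.
        for i, di in enumerate(d1):
            for j, xj in enumerate(x):
                self.w1[i][j] -= lr * di * xj


# Hypothetical toy regression task: target = x0 - x1.
data = [([0.5, -0.2], [0.7]), ([-0.3, 0.8], [-1.1]), ([0.1, 0.1], [0.0]), ([0.9, 0.4], [0.5])]
net = TinyDFANet(n_in=2, n_hidden=16, n_out=1, seed=0)
feedback_before = [row[:] for row in net.b1]
loss_before = sum(net.loss(x, t) for x, t in data)
for _ in range(200):
    for x, t in data:
        net.train_step(x, t, lr=0.05)
loss_after = sum(net.loss(x, t) for x, t in data)
print(f"loss: {loss_before:.4f} -> {loss_after:.4f}")
```

The only change from a backpropagation step is the line computing `d1`: the feedback path uses `B1`, which stays fixed throughout training, so the hidden-layer signal no longer depends on the current output weights.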