
Showing 1–20 of 20 results for author: Paganini, M

Searching in archive cs.
  1. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2304.13164

    cs.LG cs.AI

    Towards Compute-Optimal Transfer Learning

    Authors: Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu

    Abstract: The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet…

    Submitted 25 April, 2023; originally announced April 2023.

  4. arXiv:2302.10258

    cs.LG cs.AI stat.ME

    Neural Algorithmic Reasoning with Causal Regularisation

    Authors: Beatrice Bevilacqua, Kyriacos Nikiforou, Borja Ibarz, Ioana Bica, Michela Paganini, Charles Blundell, Jovana Mitrovic, Petar Veličković

    Abstract: Recent work on neural algorithmic reasoning has investigated the reasoning capabilities of neural networks, effectively demonstrating they can learn to execute classical algorithms on unseen data coming from the train distribution. However, the performance of existing neural reasoners significantly degrades on out-of-distribution (OOD) test data, where inputs have larger sizes. In this work, we ma…

    Submitted 3 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: ICML 2023, Camera Ready; 17 pages, 7 figures

  5. arXiv:2202.01169

    cs.CL cs.LG

    Unified Scaling Laws for Routed Language Models

    Authors: Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu, et al. (1 additional author not shown)

    Abstract: The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better…

    Submitted 9 February, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Fixing typos and affiliation clarity

  6. arXiv:2112.11446

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  7. arXiv:2112.04426

    cs.CL cs.LG

    Improving language models by retrieving from trillions of tokens

    Authors: Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, et al. (3 additional authors not shown)

    Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to d…

    Submitted 7 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Fix incorrect reported numbers in Table 14

  8. arXiv:2009.09936

    cs.CV cs.CY cs.LG

    Prune Responsibly

    Authors: Michela Paganini

    Abstract: Irrespective of the specific definition of fairness in a machine learning application, pruning the underlying model affects it. We investigate and document the emergence and exacerbation of undesirable per-class performance imbalances, across tasks and architectures, for almost one million categories considered across over 100K image classification models that undergo a pruning process. We demonstr…

    Submitted 10 September, 2020; originally announced September 2020.

  9. arXiv:2007.04091

    cs.LG stat.ML

    Bespoke vs. Prêt-à-Porter Lottery Tickets: Exploiting Mask Similarity for Trainable Sub-Network Finding

    Authors: Michela Paganini, Jessica Zosa Forde

    Abstract: The observation of sparse trainable sub-networks within over-parametrized networks - also known as Lottery Tickets (LTs) - has prompted inquiries around their trainability, scaling, uniqueness, and generalization properties. Across 28 combinations of image classification tasks and architectures, we discover differences in the connectivity structure of LTs found through different iterative pruning…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: text overlap with arXiv:2001.05050

  10. arXiv:2006.07484

    cs.SE cs.LG

    dagger: A Python Framework for Reproducible Machine Learning Experiment Orchestration

    Authors: Michela Paganini, Jessica Zosa Forde

    Abstract: Many research directions in machine learning, particularly in deep learning, involve complex, multi-stage experiments, commonly involving state-mutating operations acting on models along multiple paths of execution. Although machine learning frameworks provide clean interfaces for defining model architectures and unbranched flows, burden is often placed on the researcher to track experimental prov…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 4 pages, 3 code listings, 1 figure

  11. arXiv:2004.13770

    cs.LG stat.ML

    Streamlining Tensor and Network Pruning in PyTorch

    Authors: Michela Paganini, Jessica Forde

    Abstract: In order to counter the explosion in size of state-of-the-art machine learning models that can be attributed to the empirical advantages of over-parametrization, and due to the necessity of deploying fast, sustainable, and private on-device models on resource-constrained devices, the community has focused on techniques such as pruning, quantization, and distillation as central strategies for mode…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 5 pages, 1 figure, 5 code listings. Published as a workshop paper at ICLR 2020

  12. arXiv:2001.05050

    cs.LG stat.ML

    On Iterative Neural Network Pruning, Reinitialization, and the Similarity of Masks

    Authors: Michela Paganini, Jessica Forde

    Abstract: We examine how recently documented, fundamental phenomena in deep learning models subject to pruning are affected by changes in the pruning procedure. Specifically, we analyze differences in the connectivity structure and learning dynamics of pruned models found through a set of common iterative pruning techniques, to address questions of uniqueness of trainable, high-sparsity sub-networks, and th…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 8 pages, 8 figures, plus 5 appendices with additional figures and tables

  13. arXiv:1906.02773

    stat.ML cs.LG cs.NE

    One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

    Authors: Ari S. Morcos, Haonan Yu, Michela Paganini, Yuandong Tian

    Abstract: The success of lottery ticket initializations (Frankle and Carbin, 2019) suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these "winning ticket" initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generali…

    Submitted 27 October, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019

  14. arXiv:1904.10922

    cs.LG stat.ML

    The Scientific Method in the Science of Machine Learning

    Authors: Jessica Zosa Forde, Michela Paganini

    Abstract: In the quest to align deep learning with the sciences to address calls for rigor, safety, and interpretability in machine learning systems, this contribution identifies key missing pieces: the stages of hypothesis formulation and testing, as well as statistical and systematic uncertainty estimation -- core tenets of the scientific method. This position paper discusses the ways in which contemporar…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: 4 pages + 1 appendix. Presented at the ICLR 2019 Debugging Machine Learning Models workshop

  15. arXiv:1903.05082

    hep-ex cs.LG hep-ph

    Machine Learning Solutions for High Energy Physics: Applications to Electromagnetic Shower Generation, Flavor Tagging, and the Search for di-Higgs Production

    Authors: Michela Paganini

    Abstract: This thesis demonstrates the efficacy of designing and developing machine learning (ML) algorithms for selected use cases that encompass many of the outstanding challenges in the field of experimental high energy physics. Although simple implementations of neural networks and boosted decision trees have been used in high energy physics for a long time, the field of ML has quickly evolved by devising…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: 413 pages, 10 chapters

  16. arXiv:1807.02876

    physics.comp-ph cs.LG hep-ex stat.ML

    Machine Learning in High Energy Physics Community White Paper

    Authors: Kim Albertsson, Piero Altoe, Dustin Anderson, John Anderson, Michael Andrews, Juan Pedro Araque Espinosa, Adam Aurisano, Laurent Basara, Adrian Bevan, Wahid Bhimji, Daniele Bonacorsi, Bjorn Burkle, Paolo Calafiura, Mario Campanelli, Louis Capps, Federico Carminati, Stefano Carrazza, Yi-fan Chen, Taylor Childers, Yann Coadou, Elias Coniavitis, Kyle Cranmer, Claire David, Douglas Davis, Andrea De Simone, et al. (103 additional authors not shown)

    Abstract: Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We d…

    Submitted 16 May, 2019; v1 submitted 8 July, 2018; originally announced July 2018.

    Comments: Editors: Sergei Gleyzer, Paul Seyfert and Steven Schramm

  17. arXiv:1712.10321

    hep-ex cs.LG hep-ph stat.ML

    CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks

    Authors: Michela Paganini, Luke de Oliveira, Benjamin Nachman

    Abstract: The precise modeling of subatomic particle interactions and propagation through matter is paramount for the advancement of nuclear and particle physics searches and precision measurements. The most computationally expensive step in the simulation pipeline of a typical experiment at the Large Hadron Collider (LHC) is the detailed modeling of the full complexity of physics processes that govern the…

    Submitted 21 December, 2017; originally announced December 2017.

    Comments: 14 pages, 4 tables, 13 figures; version accepted by Physical Review D (PRD)

    Journal ref: Phys. Rev. D 97, 014021 (2018)

  18. arXiv:1711.08813

    hep-ex cs.LG physics.data-an

    Controlling Physical Attributes in GAN-Accelerated Simulation of Electromagnetic Calorimeters

    Authors: Luke de Oliveira, Michela Paganini, Benjamin Nachman

    Abstract: High-precision modeling of subatomic particle interactions is critical for many fields within the physical sciences, such as nuclear physics and high energy particle physics. Most simulation pipelines in the sciences are computationally intensive -- in a variety of scientific fields, Generative Adversarial Networks have been suggested as a solution to speed up the forward component of simulation,…

    Submitted 23 November, 2017; originally announced November 2017.

    Comments: 7 pages, 5 figures, in proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017)

  19. arXiv:1711.08811

    hep-ex cs.LG

    Machine Learning Algorithms for $b$-Jet Tagging at the ATLAS Experiment

    Authors: Michela Paganini

    Abstract: The separation of $b$-quark initiated jets from those coming from lighter quark flavors ($b$-tagging) is a fundamental tool for the ATLAS physics program at the CERN Large Hadron Collider. The most powerful $b$-tagging algorithms combine information from low-level taggers, exploiting reconstructed track and vertex information, into machine learning classifiers. The potential of modern deep learnin…

    Submitted 23 November, 2017; originally announced November 2017.

    Comments: 7 pages, 5 figures, in proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017)

    Report number: ATL-PHYS-PROC-2017-211

  20. arXiv:1711.03573

    hep-ex cs.DC cs.LG physics.data-an

    Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC

    Authors: Wahid Bhimji, Steven Andrew Farrell, Thorsten Kurth, Michela Paganini, Prabhat, Evan Racah

    Abstract: There has been considerable recent activity applying deep convolutional neural nets (CNNs) to data from particle physics experiments. Current approaches on ATLAS/CMS have largely focussed on a subset of the calorimeter, and for identifying objects or particular particle types. We explore approaches that use the entire calorimeter, combined with track information, for directly conducting physics an…

    Submitted 29 November, 2017; v1 submitted 9 November, 2017; originally announced November 2017.

    Comments: Presented at ACAT 2017 Conference, Submitted to J. Phys. Conf. Ser.