Showing 1–50 of 82 results for author: Roberts, A

Searching in archive cs.
  1. arXiv:2407.03483  [pdf, other]

    math.DS cs.MS math.AP

    Construct accurate multi-continuum micromorphic homogenisations in multi-D space-time with computer algebra

    Authors: A. J. Roberts

    Abstract: Homogenisation empowers the efficient macroscale system level prediction of physical problems with intricate microscale structures. Here we develop an innovative powerful, rigorous and flexible framework for asymptotic homogenisation of dynamics at the finite scale separation of real physics, with proven results underpinned by modern dynamical systems theory. The novel systematic approach removes…

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 35B27; 74H10; 37L10; 35B40

  2. arXiv:2404.03626  [pdf, other]

    cs.CL cs.LG

    Training LLMs over Neurally Compressed Text

    Authors: Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant

    Abstract: In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it were possible to train LLMs directly over neurally compressed text, this would confer advantages in training and serving efficiency, as well as easier h…

    Submitted 4 April, 2024; originally announced April 2024.

  3. arXiv:2404.01413  [pdf, other]

    cs.LG cs.AI cs.CL cs.ET stat.ML

    Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

    Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

    Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration…

    Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  4. arXiv:2403.17887  [pdf, other]

    cs.CL cs.LG stat.ML

    The Unreasonable Ineffectiveness of the Deeper Layers

    Authors: Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

    Abstract: We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed. To prune these models, we identify the optimal block of layers to prune by considering similarity across layers; then, to "heal" the damage…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 + 10 pages, 5 + 4 figures

    Report number: MIT-CTP/5694

  5. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  6. arXiv:2402.01700  [pdf]

    cs.CL cs.AI

    Question answering systems for health professionals at the point of care -- a systematic review

    Authors: Gregory Kell, Angus Roberts, Serge Umansky, Linglong Qian, Davide Ferrari, Frank Soboczenski, Byron Wallace, Nikhil Patel, Iain J Marshall

    Abstract: Objective: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement. Materials and method…

    Submitted 24 January, 2024; originally announced February 2024.

    Comments: Accepted to the Journal of the American Medical Informatics Association (JAMIA)

  7. arXiv:2401.04792  [pdf, other]

    cs.CR

    REACT: Autonomous Intrusion Response System for Intelligent Vehicles

    Authors: Mohammad Hamad, Andreas Finkenzeller, Michael Kühr, Andrew Roberts, Olaf Maennel, Vassilis Prevelakis, Sebastian Steinhorst

    Abstract: Autonomous and connected vehicles are rapidly evolving, integrating numerous technologies and software. This progress, however, has made them appealing targets for cybersecurity attacks. As the risk of cyber threats escalates with this advancement, the focus is shifting from solely preventing these attacks to also mitigating their impact. Current solutions rely on vehicle security operation center…

    Submitted 16 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: 20 pages

  8. arXiv:2311.10912  [pdf, ps, other]

    cs.HC

    Opportunities in Mental Health Support for Informal Dementia Caregivers Suffering from Verbal Agitation

    Authors: Taewook Kim, Hyeok Kim, Angela Roberts, Maia Jacobs, Matthew Kay

    Abstract: People with dementia (PwD) often present verbal agitation such as cursing, screaming, and persistently complaining. Verbal agitation can impose mental distress on informal caregivers (e.g., family, friends), which may cause severe mental illnesses, such as depression and anxiety disorders. To improve informal caregivers' mental health, we explore design opportunities by interviewing 11 informal ca…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 26 pages, 1 figure, 2 tables. Accepted to PACM HCI (CSCW 2024)

  9. arXiv:2311.06477  [pdf, other]

    cs.CY

    Report of the 1st Workshop on Generative AI and Law

    Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, et al. (10 additional authors not shown)

    Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report…

    Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

  10. arXiv:2310.07765  [pdf, other]

    cs.LG hep-ph hep-th stat.ML

    Feature Learning and Generalization in Deep Networks with Orthogonal Weights

    Authors: Hannah Day, Yonatan Kahn, Daniel A. Roberts

    Abstract: Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We s…

    Submitted 12 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: v2: numerical experiments updated with more data, plots updated to match, conclusions unchanged. 30+12 pages, 20 figures

    Report number: MIT-CTP/5625

  11. arXiv:2309.02237  [pdf]

    cs.LG

    Sample Size in Natural Language Processing within Healthcare Research

    Authors: Jaya Chaturvedi, Diana Shamsutdinova, Felix Zimmer, Sumithra Velupillai, Daniel Stahl, Robert Stewart, Angus Roberts

    Abstract: Sample size calculation is an essential step in most data-based disciplines. Large enough samples ensure representativeness of the population and determine the precision of estimates. This is true for most quantitative studies, including those that employ machine learning methods, such as natural language processing, where free-text is used to generate predictions and classify instances of text. W…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Submitted to Journal of Biomedical Informatics

  12. arXiv:2308.09226  [pdf, other]

    cs.CE

    Efficient computational homogenisation of 2D beams of heterogeneous elasticity using the patch scheme

    Authors: Thien Tran-Duc, J. E. Bunder, A. J. Roberts

    Abstract: Modern 'smart' materials have complex heterogeneous microscale structure, often with unknown macroscale closure but one we need to realise for large scale engineering and science. The multiscale Equation-Free Patch Scheme empowers us to non-intrusively, efficiently, and accurately predict the large scale, system level, solutions through computations on only small sparse patches of the given detail…

    Submitted 17 August, 2023; originally announced August 2023.

  13. arXiv:2308.08904  [pdf]

    cs.LG cs.AI

    Development of a Knowledge Graph Embeddings Model for Pain

    Authors: Jaya Chaturvedi, Tao Wang, Sumithra Velupillai, Robert Stewart, Angus Roberts

    Abstract: Pain is a complex concept that can interconnect with other concepts such as a disorder that might cause pain, a medication that might relieve pain, and so on. To fully understand the context of pain experienced by either an individual or across a population, we may need to examine all concepts related to pain and the relationships between them. This is especially useful when modeling pain that has…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted at AMIA 2023, New Orleans

  14. arXiv:2305.13169  [pdf, other]

    cs.CL cs.LG

    A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

    Authors: Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David Mimno, Daphne Ippolito

    Abstract: Pretraining is the preliminary and fundamental step in developing capable language models (LM). Despite this, pretraining data design is critically under-documented and often guided by empirically unsupported intuitions. To address this, we pretrain 28 1.5B parameter decoder-only models, training on data curated (1) at different times, (2) with varying toxicity and quality filters, and (3) with di…

    Submitted 13 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  15. arXiv:2304.09151  [pdf, other]

    cs.CL

    UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

    Authors: Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat

    Abstract: Pretrained multilingual large language models have typically used heuristic temperature-based sampling to balance between different languages. However previous work has not systematically evaluated the efficacy of different pretraining language distributions across model scales. In this paper, we propose a new sampling method, UniMax, that delivers more uniform coverage of head languages while mit…

    Submitted 18 April, 2023; originally announced April 2023.

  16. arXiv:2304.01240  [pdf]

    cs.CL cs.LG

    Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach

    Authors: Jaya Chaturvedi, Sumithra Velupillai, Robert Stewart, Angus Roberts

    Abstract: Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source to study this overlap. However, much information on pain is held in the free text of these records, where mentions of pain present a unique natural language processing problem due to its ambiguous…

    Submitted 5 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: 5 pages, 2 tables, submitted to MEDINFO 2023 conference

  17. arXiv:2301.13688  [pdf, other]

    cs.AI cs.CL cs.LG

    The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

    Authors: Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts

    Abstract: We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniqu…

    Submitted 14 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  18. arXiv:2301.12662  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    SingSong: Generating musical accompaniments from singing

    Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

    Abstract: We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus…

    Submitted 29 January, 2023; originally announced January 2023.

  19. arXiv:2301.11325  [pdf, other]

    cs.SD cs.LG eess.AS

    MusicLM: Generating Music From Text

    Authors: Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank

    Abstract: We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous s…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Supplementary material at https://google-research.github.io/seanet/musiclm/examples and https://kaggle.com/datasets/googleai/musiccaps

  20. arXiv:2212.10562  [pdf, other]

    cs.CL cs.CV

    Character-Aware Models Improve Visual Text Rendering

    Authors: Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

    Abstract: Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text e…

    Submitted 3 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  21. arXiv:2211.14218  [pdf, ps, other]

    math.CO cs.DM math.PR

    Shotgun assembly of random graphs

    Authors: Tom Johnston, Gal Kronenberg, Alexander Roberts, Alex Scott

    Abstract: In the graph shotgun assembly problem, we are given the balls of radius $r$ around each vertex of a graph and asked to reconstruct the graph. We study the shotgun assembly of the Erdős-Rényi random graph $\mathcal G(n,p)$ from a wide range of values of $r$. We determine the threshold for reconstructibility for each $r\geq 3$, extending and improving substantially on results of Mossel and Ross for…

    Submitted 3 June, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 36 pages, 3 figures

  22. arXiv:2211.09760  [pdf, other]

    cs.LG math.OC stat.ML

    VeLO: Training Versatile Learned Optimizers by Scaling Up

    Authors: Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein

    Abstract: While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. M…

    Submitted 17 November, 2022; originally announced November 2022.

  23. arXiv:2211.08411  [pdf, other]

    cs.CL cs.LG

    Large Language Models Struggle to Learn Long-Tail Knowledge

    Authors: Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel

    Abstract: The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models. However, while certain pieces of information are ubiquitous on the web, others appear extremely rarely. In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-trai…

    Submitted 27 July, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: ICML 2023 Camera Ready Version

  24. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  25. arXiv:2211.01786  [pdf, other]

    cs.CL cs.AI cs.LG

    Crosslingual Generalization through Multitask Finetuning

    Authors: Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin Raffel

    Abstract: Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused on English data and models. We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0. We find finetuning large multilingual language models on English tasks wi…

    Submitted 29 May, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 9 main pages (119 with appendix), 16 figures and 11 tables

  26. arXiv:2210.16859  [pdf, other]

    cs.LG hep-th stat.ML

    A Solvable Model of Neural Scaling Laws

    Authors: Alexander Maloney, Daniel A. Roberts, James Sully

    Abstract: Large language models with a huge number of parameters, when trained on near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource. To understand this better, we first identify the necessary properties allowing such sca…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 73 + 23 pages, 14 + 5 figures

    Report number: MIT-CTP/5463

  27. arXiv:2210.11416  [pdf, other]

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  28. arXiv:2206.14639  [pdf, other]

    eess.AS cs.LG cs.SD

    DDKtor: Automatic Diadochokinetic Speech Analysis

    Authors: Yael Segal, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet

    Abstract: Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These studies rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unanno…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022

  29. arXiv:2206.05408  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-instrument Music Synthesis with Spectrogram Diffusion

    Authors: Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

    Abstract: An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on any music but with minimal control and slow generat…

    Submitted 12 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

  30. arXiv:2204.05832  [pdf, other]

    cs.CL cs.LG stat.ML

    What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

    Authors: Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel

    Abstract: Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives used across state-of-the-art models differ significantly, and there has been limited systematic comparison of these factors. In this work, we present a large-sc…

    Submitted 12 April, 2022; originally announced April 2022.

  31. arXiv:2204.02311  [pdf, other]

    cs.CL

    PaLM: Scaling Language Modeling with Pathways

    Authors: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, et al. (42 additional authors not shown)

    Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Tran…

    Submitted 5 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  32. arXiv:2203.17189  [pdf, other]

    cs.LG cs.CL

    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen…

    Submitted 31 March, 2022; originally announced March 2022.

  33. arXiv:2201.08239  [pdf, other]

    cs.CL cs.AI

    LaMDA: Language Models for Dialog Applications

    Authors: Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, et al. (35 additional authors not shown)

    Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotat…

    Submitted 10 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  34. Articulatory Coordination for Speech Motor Tracking in Huntington Disease

    Authors: Matthew Perez, Amrit Romana, Angela Roberts, Noelle Carlozzi, Jennifer Ann Miner, Praveen Dayalu, Emily Mower Provost

    Abstract: Huntington Disease (HD) is a progressive disorder which often manifests in motor impairment. Motor severity (captured via motor score) is a key component in assessing overall HD severity. However, motor score evaluation involves in-clinic visits with a trained medical professional, which are expensive and not always accessible. Speech analysis provides an attractive avenue for tracking HD severity…

    Submitted 28 September, 2021; originally announced September 2021.

  35. arXiv:2109.00908  [pdf, ps, other]

    cs.IT math.CO

    Binary self-dual codes of various lengths with new weight enumerators from a modified bordered construction and neighbours

    Authors: Joe Gildea, Adrian Korban, Adam Michael Roberts, Alexander Tylyshchak

    Abstract: In this work, we define a modification of a bordered construction for self-dual codes which utilises $λ$-circulant matrices. We provide the necessary conditions for the construction to produce self-dual codes over finite commutative Frobenius rings of characteristic 2. Using the modified construction together with the neighbour construction, we construct many binary self-dual codes of lengths 54,…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2108.09184, arXiv:2106.12355, arXiv:2102.10354

    MSC Class: 94B05; 15B10; 15B33

  36. New binary self-dual codes of lengths 56, 62, 78, 92 and 94 from a bordered construction

    Authors: Joe Gildea, Adrian Korban, Adam Michael Roberts, Alexander Tylyshchak

    Abstract: In this paper, we present a new bordered construction for self-dual codes which employs $λ$-circulant matrices. We give the necessary conditions for our construction to produce self-dual codes over a finite commutative Frobenius ring of characteristic 2. Moreover, using our bordered construction together with the well-known building-up and neighbour methods, we construct many binary self-dual code…

    Submitted 3 February, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

    Comments: corrected typos; other minor corrections. arXiv admin note: substantial text overlap with arXiv:2102.10354, arXiv:2106.12355, arXiv:2102.12326

    MSC Class: 94B05; 15B10; 15B33

  37. arXiv:2108.05056  [pdf, ps, other]

    cs.IT

    Group LCD and Group Reversible LCD Codes

    Authors: Steven T. Dougherty, Joe Gildea, Adrian Korban, Adam M. Roberts

    Abstract: In this paper, we give a new method for constructing LCD codes. We employ group rings and a well known map that sends group ring elements to a subring of the $n \times n$ matrices to obtain LCD codes. Our construction method guarantees that our LCD codes are also group codes, namely, the codes are ideals in a group ring. We show that with a certain condition on the group ring element $v,$ one can…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: 17 pages

    MSC Class: 94B05

  38. New binary self-dual codes of lengths 80, 84 and 96 from composite matrices

    Authors: Joe Gildea, Adrian Korban, Adam Michael Roberts

    Abstract: In this work, we apply the idea of composite matrices arising from group rings to derive a number of different techniques for constructing self-dual codes over finite commutative Frobenius rings. By applying these techniques over different alphabets, we construct best known singly-even binary self-dual codes of lengths 80, 84 and 96 as well as doubly-even binary self-dual codes of length 96 that w…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:2102.10354

    MSC Class: 94B05; 16S34; 15B10; 15B33

  39. arXiv:2106.10165  [pdf, other]

    cs.LG cs.AI hep-th stat.ML

    The Principles of Deep Learning Theory

    Authors: Daniel A. Roberts, Sho Yaida, Boris Hanin

    Abstract: This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are…

    Submitted 24 August, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: 471 pages, to be published by Cambridge University Press; v2: hyperlinks fixed, index added

    Report number: MIT-CTP/5306

    Journal ref: Cambridge University Press (2022)

  40. arXiv:2105.13626  [pdf, other]

    cs.CL

    ByT5: Towards a token-free future with pre-trained byte-to-byte models

    Authors: Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel

    Abstract: Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: they can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pi…

    Submitted 7 March, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: To be published in TACL 2022

  41. arXiv:2104.13554  [pdf, other]

    cs.CE

    Mesoscale simulation of woven composite design decisions

    Authors: Lincoln N. Collins, Scott A. Roberts

    Abstract: Characterizing the connection between material design decisions/parameters and their effective properties allows for accelerated materials development and optimization. We present a global sensitivity analysis of woven composite thermophysical properties, including density, volume fraction, thermal conductivity, specific heat, moduli, permeability, and tortuosity, predicted using mesoscale finite…

    Submitted 27 April, 2021; originally announced April 2021.

  42. arXiv:2104.04874  [pdf, ps, other]

    cs.LG stat.ML

    SGD Implicitly Regularizes Generalization Error

    Authors: Daniel A. Roberts

    Abstract: We derive a simple and model-independent formula for the change in the generalization gap due to a gradient descent update. We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient descent acts to regularize generalization error by decorrelating nearby…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: First appeared at the "Workshop on Integration of Deep Learning Theories" at NeurIPS in 2018 and has been available since then at https://research.fb.com/publications/sgd-implicitly-regularizes-generalization-error/

  43. arXiv:2104.00008  [pdf, other]

    hep-th cs.AI cs.LG physics.hist-ph stat.ML

    Why is AI hard and Physics simple?

    Authors: Daniel A. Roberts

    Abstract: We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theor…

    Submitted 31 March, 2021; originally announced April 2021.

    Comments: written for a special issue of Machine Learning: Science and Technology as an invited perspective piece

    Report number: MIT-CTP/5269

  44. Quaternary Hermitian self-dual codes of lengths 26, 32, 36, 38 and 40 from modifications of well-known circulant constructions

    Authors: Adam Michael Roberts

    Abstract: In this work, we give three new techniques for constructing Hermitian self-dual codes over commutative Frobenius rings with a non-trivial involutory automorphism using $λ$-circulant matrices. The new constructions are derived as modifications of various well-known circulant constructions of self-dual codes. Applying these constructions together with the building-up construction, we construct many…

    Submitted 24 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.10354

  45. arXiv:2102.11972  [pdf, other]

    cs.LG cs.CL

    Do Transformer Modifications Transfer Across Implementations and Applications?

    Authors: Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

    Abstract: The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we f…

    Submitted 10 September, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear at EMNLP 2021 as a conference paper

  46. New binary self-dual codes of lengths 56, 58, 64, 80 and 92 from a modification of the four circulant construction

    Authors: Joe Gildea, Adrian Korban, Adam Michael Roberts

    Abstract: In this work, we give a new technique for constructing self-dual codes over commutative Frobenius rings using $λ$-circulant matrices. The new construction was derived as a modification of the well-known four circulant construction of self-dual codes. Applying this technique together with the building-up construction, we construct singly-even binary self-dual codes of lengths 56, 58, 64, 80 and 92…

    Submitted 23 June, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: corrected typos; added references

    MSC Class: 94B05; 15B10; 15B33

  47. arXiv:2102.08380  [pdf, other]

    hep-ph cs.LG stat.ML

    Topological Obstructions to Autoencoding

    Authors: Joshua Batson, C. Grace Haaf, Yonatan Kahn, Daniel A. Roberts

    Abstract: Autoencoders have been proposed as a powerful tool for model-independent anomaly detection in high-energy physics. The operating principle is that events which do not belong to the space of training data will be reconstructed poorly, thus flagging them as anomalies. We point out that in a variety of examples of interest, the connection between large reconstruction error and anomalies is not so cle…

    Submitted 3 May, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 24 + 20 pages, 26 figures; no autoencoders were harmed in the making of this project. v2: JHEP published version

    Report number: MIT-CTP/5264

    Journal ref: JHEP04(2021)280

  48. arXiv:2101.00133  [pdf, other]

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

  49. Quantifying the unknown impact of segmentation uncertainty on image-based simulations

    Authors: Michael C. Krygier, Tyler LaBonte, Carianne Martinez, Chance Norris, Krish Sharma, Lincoln N. Collins, Partha P. Mukherjee, Scott A. Roberts

    Abstract: Image-based simulation, the use of 3D images to calculate physical quantities, fundamentally relies on image segmentation to create the computational geometry. However, this process introduces image segmentation uncertainty because there is a variety of different segmentation tools (both manual and machine-learning-based) that will each produce a unique and valid segmentation. First, we demonstrat…

    Submitted 9 September, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Journal ref: Nature Communications 12, 5414 (2021)

  50. arXiv:2012.08919  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG

    Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism

    Authors: Denisa A. O. Roberts

    Abstract: This article investigates multilingual evidence retrieval and fact verification as a step to combat global disinformation, a first effort of this kind, to the best of our knowledge. The goal is building multilingual systems that retrieve in evidence-rich languages to verify claims in evidence-poor languages that are more commonly targeted by disinformation. To this end, our EnmBERT fact verificati…

    Submitted 19 January, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: Accepted ECIR 2021