Showing 1–3 of 3 results for author: Leahy, C

Searching in archive cs.
  1. arXiv:2211.12312  [pdf, other]

    cs.LG cs.AI

    Interpreting Neural Networks through the Polytope Lens

    Authors: Sid Black, Lee Sharkey, Leo Grinsztajn, Eric Winsor, Dan Braun, Jacob Merizian, Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy

    Abstract: Mechanistic interpretability aims to explain what a neural network has learned at a nuts-and-bolts level. What are the fundamental primitives of neural network representations? Previous mechanistic descriptions have used individual neurons or their linear combinations to understand the representations a network has learned. But there are clues that neurons and their linear combinations are not the…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: 22/11/22 initial upload

  2. arXiv:2204.06745  [pdf, other]

    cs.CL

    GPT-NeoX-20B: An Open-Source Autoregressive Language Model

    Authors: Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach

    Abstract: We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and trainin…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: To appear in the Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models

  3. arXiv:2101.00027  [pdf, other]

    cs.CL

    The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    Authors: Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy

    Abstract: Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present the Pile: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets -- both existing and new…

    Submitted 31 December, 2020; originally announced January 2021.