Showing 1–3 of 3 results for author: Andersch, M

Searching in archive cs.
  1. arXiv:2404.11068  [pdf, other]

    cs.LG cs.AI cs.DC q-bio.QM

    ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

    Authors: Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch

    Abstract: AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. The AlphaFold training procedure is prohibitively time-consuming and gets diminishing benefits from scaling to more compute res…

    Submitted 17 April, 2024; originally announced April 2024.

  2. arXiv:2205.05198  [pdf, other]

    cs.LG cs.CL

    Reducing Activation Recomputation in Large Transformer Models

    Authors: Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recomputation is commonly used to work around memory capacity constraints. Rather than storing activations for backpropagation, they are traditionally recomp…

    Submitted 10 May, 2022; originally announced May 2022.

  3. arXiv:1804.10223  [pdf, other]

    cs.NE cs.DC cs.LG

    Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

    Authors: Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie

    Abstract: Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time are dependent on the size of the network. Following recent work in simplifying these networks with model pruning and a novel mapping of work onto GPUs, we design an efficient implementation for sparse RNNs. We investigate several optimizations and tradeoffs: Lamport timest…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: Published as a conference paper at ICLR 2018