Showing 1–1 of 1 results for author: Wanderman-Milne, S

  1. arXiv:2011.03641

    cs.LG cs.DC

    Exploring the limits of Concurrency in ML Training on Google TPUs

    Authors: Sameer Kumar, James Bradbury, Cliff Young, Yu Emma Wang, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing

    Abstract: Recent results in language understanding using neural networks have required training hardware of unprecedented scale, with thousands of chips cooperating on a single training run. This paper presents techniques to scale ML models on the Google TPU Multipod, a mesh with 4096 TPU-v3 chips. We discuss model parallelism to overcome scaling limitations from the fixed batch size in data parallelism, commu…

    Submitted 15 March, 2021; v1 submitted 6 November, 2020; originally announced November 2020.