
Showing 1–1 of 1 results for author: Nebabu, T

  1. arXiv:2403.02579  [pdf, other]

    cond-mat.dis-nn cs.LG

    Geometric Dynamics of Signal Propagation Predict Trainability of Transformers

    Authors: Aditya Cowsik, Tamra Nebabu, Xiao-Liang Qi, Surya Ganguli

    Abstract: We investigate forward signal propagation and gradient back propagation in deep, randomly initialized transformers, yielding simple necessary and sufficient conditions on initialization hyperparameters that ensure trainability of deep transformers. Our approach treats the evolution of the representations of $n$ tokens as they propagate through the transformer layers in terms of a discrete time dyn…

    Submitted 4 March, 2024; originally announced March 2024.