We are excited to share LayerSkip: we speed up LLMs with a novel self-speculative decoding approach, where we run the earlier layers of a model to draft tokens and then verify/correct them using the remaining layers.

🚀 Achieves up to 2.16x inference speedup on Llama 7B

🔎 Evaluated on different-sized Llamas across a large set of natural language and coding tasks

👨‍🍳 Our training recipe:

🦘 Skip later layers stochastically during training, so the model becomes less reliant on them

🏃‍♂️💨 Rotating early exit loss, so the model's LM head learns to understand embeddings from all layers

👍 No additional heads or weights introduced

📊 We show results of our training recipe in different scenarios: pretraining from scratch, continual pretraining of existing models, and finetuning on domain-specific or task-specific data

🔗 Links:
Paper: https://lnkd.in/gdcyY4bn
Code: Coming Soon!
X Thread: https://lnkd.in/g3gCkwYe

Authors: Mostafa Elhoushi*, Akshat Shrivastava*, Diana L., Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge A., Saurabh Agarwal, Ahmed Roman, Ahmed Aly, Beidi Chen, Carole-Jean Wu
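To make the draft-then-verify idea concrete, here is a minimal greedy sketch of self-speculative decoding. It is illustrative only, not the paper's implementation: `early_fn` stands in for a forward pass through the first few layers plus the LM head (the draft model), and `full_fn` for a forward pass through all layers (the verifier) — both hypothetical placeholders, since in LayerSkip the draft and verify stages share one set of weights.

```python
def self_speculative_decode(prompt, early_fn, full_fn, n_draft=4, max_new=16):
    """Greedy self-speculative decoding sketch (assumed interfaces).

    early_fn(tokens) -> next token predicted by the early layers (draft).
    full_fn(tokens)  -> next token predicted by the full model (verify).
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1) Draft: autoregressively propose n_draft tokens using early layers.
        draft = []
        for _ in range(n_draft):
            draft.append(early_fn(tokens + draft))

        # 2) Verify: check each draft token against the full model; accept the
        #    longest agreeing prefix, then take the full model's correction.
        accepted = []
        for t in draft:
            correct = full_fn(tokens + accepted)
            if t == correct:
                accepted.append(t)        # draft token verified
            else:
                accepted.append(correct)  # correct the first mismatch and stop
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new]
```

Note that the output always matches what greedy decoding with the full model alone would produce; the speedup comes because verification can score several draft positions per full-model pass, while the cheap early-exit pass handles the sequential drafting.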
Awesome work! Congrats, Mostafa!
Congratulations Mostafa! Great work!
Great job, Mostafa Basha!