We are excited to share LayerSkip: we speed up LLMs with a novel self-speculative decoding approach, where we run the earlier layers of a model to draft tokens and then verify/correct them using the remaining layers.

🚀 Achieves up to 2.16x inference speedup on Llama 7B

🔎 Evaluated on different-sized Llamas across a large set of natural language and coding tasks

👨‍🍳 Our training recipe:

🦘 Skip later layers stochastically during training, so the model becomes less reliant on them

🏃‍♂️💨 Rotating early exit loss, so the model's LM head learns to understand embeddings from all layers

👍 No additional heads or weights introduced

📊 We show results of our training recipe in different scenarios: pretraining from scratch, continual pretraining of existing models, and finetuning on domain-specific or task-specific data

🔗 Links:
Paper: https://lnkd.in/gdcyY4bn
Code: Coming Soon!
X Thread: https://lnkd.in/g3gCkwYe

Authors: Mostafa Elhoushi*, Akshat Shrivastava*, Diana L., Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge A., Saurabh Agarwal, Ahmed Roman, Ahmed Aly, Beidi Chen, Carole-Jean Wu
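To make the draft-then-verify idea concrete, here is a minimal greedy sketch of self-speculative decoding. It is illustrative only, not the paper's implementation: `early_fn` stands in for a forward pass through the first few layers plus the LM head (the draft model), and `full_fn` for a forward pass through all layers (the verifier) — both hypothetical placeholders, since in LayerSkip the draft and verify stages share one set of weights.

```python
def self_speculative_decode(prompt, early_fn, full_fn, n_draft=4, max_new=16):
    """Greedy self-speculative decoding sketch (assumed interfaces).

    early_fn(tokens) -> next token predicted by the early layers (draft).
    full_fn(tokens)  -> next token predicted by the full model (verify).
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1) Draft: autoregressively propose n_draft tokens using early layers.
        draft = []
        for _ in range(n_draft):
            draft.append(early_fn(tokens + draft))

        # 2) Verify: check each draft token against the full model; accept the
        #    longest agreeing prefix, then take the full model's correction.
        accepted = []
        for t in draft:
            correct = full_fn(tokens + accepted)
            if t == correct:
                accepted.append(t)        # draft token verified
            else:
                accepted.append(correct)  # correct the first mismatch and stop
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new]
```

Note that the output always matches what greedy decoding with the full model alone would produce; the speedup comes because verification can score several draft positions per full-model pass, while the cheap early-exit pass handles the sequential drafting.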
Awesome work! Congrats, Mostafa!
Congratulations Mostafa! Great work!
Great job, Mostafa Basha!