Mostafa Elhoushi’s Post


Research Engineer, FAIR at Meta

We are excited to share LayerSkip: we speed up LLMs with a novel self-speculative decoding approach where we run the earlier layers of a model and verify/correct using the remaining layers.

🚀 Achieved up to 2.16x speedup on Llama 7B during inference
🔎 Evaluated on different-sized Llamas across a large set of natural-language and coding tasks

👨‍🍳 Our training recipe:
🦘 Skip later layers stochastically during training so that the model is less reliant on them
💨🏃‍♂️ Rotating early exit loss so that the model's LM head can decode the embeddings of all layers
👍 No additional heads or weights introduced

📊 We show results of our training recipe in different scenarios: pretraining from scratch, continual pretraining of existing models, and finetuning on domain-specific or task-specific data

🔗 Links:
Paper: https://lnkd.in/gdcyY4bn
Code: Coming Soon!
X Thread: https://lnkd.in/g3gCkwYe

Authors: Mostafa Elhoushi*, Akshat Shrivastava*, Diana L., Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge A., Saurabh Agarwal, Ahmed Roman, Ahmed Aly, Beidi Chen, Carole-Jean Wu
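To make the decoding idea concrete, below is a minimal PyTorch sketch of a self-speculative decoding step on a toy model. It is not the LayerSkip implementation; every name in it (TinyLM, self_speculative_step, exit_layer, n_draft) is hypothetical. The early layers plus the single shared LM head produce a cheap greedy draft, and the full model then verifies all drafted tokens in one pass, keeping the longest matching prefix plus its own correction at the first mismatch.

    # Minimal sketch (assumptions only, not the official LayerSkip code) of
    # self-speculative decoding: draft with the first `exit_layer` layers and
    # the shared LM head, verify/correct with the full model in one pass.
    # Batch size 1, greedy decoding, no KV cache.
    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        def __init__(self, vocab=100, dim=64, n_layers=8, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
                for _ in range(n_layers)
            )
            # One shared LM head: trained (e.g. with a rotating early exit loss)
            # to decode embeddings from any layer, so no extra heads are added.
            self.lm_head = nn.Linear(dim, vocab)

        def forward(self, ids, n_layers=None):
            # Run only the first `n_layers` transformer layers (all if None).
            h = self.embed(ids)
            mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
            for blk in self.layers[: n_layers or len(self.layers)]:
                h = blk(h, src_mask=mask)
            return self.lm_head(h)

    @torch.no_grad()
    def self_speculative_step(model, ids, exit_layer=4, n_draft=4):
        # 1) Draft: generate n_draft greedy tokens cheaply with the early sub-model.
        draft = ids
        for _ in range(n_draft):
            logits = model(draft, n_layers=exit_layer)
            draft = torch.cat([draft, logits[:, -1:].argmax(-1)], dim=1)

        # 2) Verify: a single full-model pass scores every drafted position at once.
        full_logits = model(draft)
        verified = full_logits[:, ids.size(1) - 1 : -1].argmax(-1)  # (1, n_draft)
        drafted = draft[:, ids.size(1):]                            # (1, n_draft)

        # 3) Accept the longest matching prefix, then take the full model's token
        #    at the first mismatch (empty if every drafted token was accepted).
        n_accept = 0
        while n_accept < n_draft and bool(drafted[0, n_accept] == verified[0, n_accept]):
            n_accept += 1
        keep = drafted[:, :n_accept]
        correction = verified[:, n_accept : n_accept + 1]
        return torch.cat([ids, keep, correction], dim=1)

    if __name__ == "__main__":
        torch.manual_seed(0)
        model = TinyLM().eval()
        prompt = torch.randint(0, 100, (1, 8))
        out = self_speculative_step(model, prompt)
        print(prompt.shape, "->", out.shape)  # gains up to n_draft + 1 tokens per step

A real implementation would presumably add KV caching and a more careful acceptance rule rather than exact greedy matching, but the control flow, drafting with early layers and verifying with the remaining ones, is the same.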

Ahmed Roman

Post-Doctoral Fellow at Dana-Farber Cancer Institute / Harvard Medical School / Broad Institute.

2mo

Great job, Mostafa Basha!

Ali Emara

Staff Software Engineer (iOS) at Facebook | ex: Photoshop @ Adobe

2mo

Awesome work! Congrats, Mostafa!

Medhat O.

Computational Software Engineer @ Schlumberger | Designing Autonomous Well Drilling Solutions

2mo

Congratulations Mostafa! Great work!
