Showing 1–1 of 1 results for author: Chinnakonduru, S S

  1. arXiv:2407.10855

    cs.CL cs.AI

    Weighted Grouped Query Attention in Transformers

    Authors: Sai Sena Chinnakonduru, Astarag Mohapatra

    Abstract: The attention mechanism forms the foundational blocks for transformer language models. Recent approaches show that scaling the model achieves human-level performance. However, with increasing demands for scaling and constraints on hardware memory, the inference costs of these models remain high. To reduce the inference time, Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) were propos…

    Submitted 15 July, 2024; originally announced July 2024.