Showing 1–1 of 1 results for author: Smirnov, R

  1. arXiv:2407.12057

    cs.CL cs.AI

    NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2

    Authors: Tengfei Xue, Xuefeng Li, Roman Smirnov, Tahir Azim, Arash Sadrieh, Babak Pahlavan

    Abstract: Retrieval-augmented generation (RAG) techniques are widely used today to retrieve and present information in a conversational format. This paper presents a set of enhancements to traditional RAG techniques, focusing on large language models (LLMs) fine-tuned and hosted on AWS Trainium and Inferentia2 AI chips via SageMaker. These chips are characterized by their elasticity, affordability, and effi…

    Submitted 11 July, 2024; originally announced July 2024.

    ACM Class: I.2.7