Use Llama models

Llama is a collection of open models developed by Meta that you can fine-tune and deploy on Vertex AI. Llama offers pre-trained and instruction-tuned generative text and multimodal models for assistant-like chat. You can deploy Llama 3.2, Llama 3.1, Llama 3, and Llama 2 models on Vertex AI.

Llama 3.2

Llama 3.2 lets developers build and deploy generative AI models and applications that use Llama's newer capabilities, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications. The following list highlights Llama 3.2 features:

Offers a more private and personalized AI experience, with on-device processing for smaller models.
Offers models that are designed to be more efficient, with reduced latency and improved performance, making them suitable for a wide range of applications.
Built on top of the Llama Stack, which makes building and deploying applications easier. Llama Stack is a standardized interface for building canonical toolchain components and agentic applications.
Supports vision tasks, with a new model architecture that integrates image encoder representations into the language model.

The 1B and 3B models are lightweight text-only models that support on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.

The 11B and 90B models are small and medium-sized multimodal models with image reasoning. For example, they can analyze visual data from charts to provide more accurate responses and extract details from images to generate text descriptions.

For more information, see the Llama 3.2 model card in Model Garden.

Considerations

When you use the 11B and 90B models, there are no restrictions on text-only prompts. However, if you include an image in your prompt, the image must be at the beginning of the prompt, and you can include only one image. You cannot, for example, include some text and then an image.
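The image placement rule can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical prompt representation: a list of parts, each a dictionary with a `type` of `text` or `image`. The part format, the function name `check_image_placement`, and the example `gs://` URIs are illustrative assumptions, not a Vertex AI or Llama request schema.

```python
from typing import Dict, List

# Hypothetical part format: {"type": "text", ...} or {"type": "image", ...}.
# This shape is for illustration only; it is not an official request schema.
Part = Dict[str, str]


def check_image_placement(parts: List[Part]) -> List[str]:
    """Return a list of problems; an empty list means the prompt is acceptable."""
    problems: List[str] = []
    image_positions = [i for i, p in enumerate(parts) if p.get("type") == "image"]

    # Text-only prompts have no restrictions.
    if not image_positions:
        return problems

    # Only one image may be included per prompt.
    if len(image_positions) > 1:
        problems.append("only one image is allowed per prompt")

    # The image must come before any text.
    if image_positions[0] != 0:
        problems.append("the image must be at the beginning of the prompt")

    return problems


# Example prompts (the URIs are placeholders).
text_only = [{"type": "text", "text": "Summarize this paragraph."}]
image_first = [
    {"type": "image", "uri": "gs://example-bucket/chart.png"},
    {"type": "text", "text": "What trend does this chart show?"},
]
text_then_image = [
    {"type": "text", "text": "Describe the image."},
    {"type": "image", "uri": "gs://example-bucket/photo.png"},
]

for name, prompt in [
    ("text_only", text_only),
    ("image_first", image_first),
    ("text_then_image", text_then_image),
]:
    print(name, check_image_placement(prompt))
```

In this sketch, the text-only and image-first prompts return no problems, while the prompt that places text before the image is flagged. Returning a list of problems rather than raising on the first one lets a caller report every violation at once.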

Llama 3.1

The Llama 3.1 collection of multilingual large language models (LLMs) consists of pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

For more information, see the Llama 3.1 model card in Model Garden.

Llama 3

The Llama 3 instruction-tuned models are a collection of LLMs optimized for dialogue use cases. Llama 3 models outperform many of the available open source chat models on common industry benchmarks.

For more information, see the Llama 3 model card in Model Garden.

Llama 2

The Llama 2 LLMs are a collection of pre-trained and fine-tuned generative text models, ranging in size from 7B to 70B parameters.

For more information, see the Llama 2 model card in Model Garden.

Code Llama

Meta's Code Llama models are designed for code synthesis, understanding, and instruction.

For more information, see the Code Llama model card in Model Garden.

Llama Guard 3

Llama Guard 3 builds on the capabilities of Llama Guard 2, adding three new categories: Defamation, Elections, and Code Interpreter Abuse. Additionally, this model is multilingual and has a prompt format that is consistent with Llama 3 or later instruct models.

For more information, see the Llama Guard model card in Model Garden.

Resources

For more information about Model Garden, see Explore AI models in Model Garden.