MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services. They’re all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent the state of the art in AI.
MLPerf Inference v4.1 measures inference performance on nine different benchmarks, including several large language models (LLMs), text-to-image, natural language processing, recommenders, computer vision, and medical image segmentation.
MLPerf Training v4.0 measures training performance on nine different benchmarks, including LLM pre-training, LLM fine-tuning, text-to-image, graph neural network (GNN), computer vision, medical image segmentation, and recommendation.
MLPerf HPC v3.0 measures training performance across four different scientific computing use cases, including climate atmospheric river identification, cosmology parameter prediction, quantum molecular modeling, and protein structure prediction.
The NVIDIA accelerated computing platform, powered by NVIDIA Hopper™ GPUs and NVIDIA Quantum-2 InfiniBand networking, delivered the highest performance on every benchmark in MLPerf Training v4.0. On the LLM benchmark, NVIDIA more than tripled performance in just one year through a record submission scale of 11,616 H100 GPUs and software optimizations. NVIDIA also delivered 1.8X more performance on the text-to-image benchmark in just seven months. And on the newly added LLM fine-tuning and graph neural network benchmarks, NVIDIA set the performance bar. NVIDIA achieved these exceptional results through relentless full-stack engineering at data center scale.
MLPerf™ Training v3.1 and v4.0 results retrieved from www.mlperf.org on June 12, 2024, from the following entries: NVIDIA + CoreWeave 3.0-2003, NVIDIA 4.0-0007. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
The NVIDIA platform continues to demonstrate unmatched performance and versatility in MLPerf Training v4.0. NVIDIA delivered the highest performance on all nine benchmarks and set new records on the following benchmarks: LLM, LLM fine-tuning, text-to-image, graph neural network, and object detection (lightweight).
Benchmark | Time to Train |
---|---|
LLM (GPT-3 175B) | 3.4 minutes |
LLM Fine-Tuning (Llama 2 70B-LoRA) | 1.5 minutes |
Text-to-Image (Stable Diffusion v2) | 1.4 minutes |
Graph Neural Network (R-GAT) | 1.1 minutes |
Recommender (DLRM-DCNv2) | 1.0 minutes |
Natural Language Processing (BERT) | 0.1 minutes |
Image Classification (ResNet-50 v1.5) | 0.2 minutes |
Object Detection (RetinaNet) | 0.8 minutes |
Biomedical Image Segmentation (3D U-Net) | 0.8 minutes |
MLPerf™ Training v4.0 results retrieved from www.mlperf.org on June 12, 2024, from the following entries: NVIDIA 4.0-0058, NVIDIA 4.0-0053, NVIDIA 4.0-0007, NVIDIA 4.0-0054, NVIDIA 4.0-0053, NVIDIA + CoreWeave 4.0-0008, NVIDIA 4.0-0057, NVIDIA 4.0-0056, NVIDIA 4.0-0067. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
In its MLPerf Inference debut, the NVIDIA Blackwell platform with the NVIDIA Quasar Quantization System delivered up to 4X higher LLM performance compared to the prior-generation H100 Tensor Core GPU. Among available solutions, the NVIDIA H200 Tensor Core GPU, based on the NVIDIA Hopper architecture, delivered the highest performance per GPU for generative AI, including on all three LLM benchmarks (Llama 2 70B, GPT-J, and the newly added mixture-of-experts model, Mixtral 8x7B) as well as on the Stable Diffusion XL text-to-image benchmark. Through relentless software optimization, H200 performance increased by up to 27 percent in less than six months. For generative AI at the edge, NVIDIA Jetson Orin™ delivered outstanding results, boosting GPT-J throughput by more than 6X and reducing latency by 2.4X in just one round.
MLPerf Inference v4.1 Closed, Data Center. Results retrieved from www.mlperf.org on August 28, 2024. Blackwell results measured on single GPU and retrieved from entry 4.1-0074 in the Closed, Preview category. H100 results from entry 4.1-0043 in the Closed, Available category on an 8x H100 system and divided by GPU count for per-GPU comparison. Per-GPU throughput is not a primary metric of MLPerf Inference. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
Benchmark | Offline | Server |
---|---|---|
Llama 2 70B | 34,864 tokens/second | 32,790 tokens/second |
Mixtral 8x7B | 59,022 tokens/second | 57,177 tokens/second |
GPT-J | 20,086 tokens/second | 19,243 tokens/second |
Stable Diffusion XL | 17.42 samples/second | 16.78 queries/second |
DLRMv2 99% | 637,342 samples/second | 585,202 queries/second |
DLRMv2 99.9% | 390,953 samples/second | 370,083 queries/second |
BERT 99% | 73,310 samples/second | 57,609 queries/second |
BERT 99.9% | 63,950 samples/second | 51,212 queries/second |
RetinaNet | 14,439 samples/second | 13,604 queries/second |
ResNet-50 v1.5 | 756,960 samples/second | 632,229 queries/second |
3D U-Net | 54.71 samples/second | Not part of benchmark |
MLPerf Inference v4.1 Closed, Data Center. Results retrieved from www.mlperf.org on August 28, 2024. All results using eight GPUs and retrieved from the following entries: 4.1-0046, 4.1-0048, 4.1-0050. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
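MLPerf Inference reports throughput for the whole submitted system; the table above reflects an eight-GPU H200 configuration. As the Blackwell comparison notes, a per-GPU figure can be derived by dividing the system result by the GPU count, though per-GPU throughput is not a primary MLPerf metric. A minimal sketch of that arithmetic, using a few of the Offline numbers from the table:

```python
# Estimate per-GPU Offline throughput from the 8x H200 system results above.
# Per-GPU throughput is not a primary MLPerf metric; this division is
# illustrative arithmetic only, following the per-GPU comparison method
# described in the citation.
SYSTEM_GPU_COUNT = 8

# benchmark: (system-level rate, unit) as reported for the full 8-GPU system
offline_system_throughput = {
    "Llama 2 70B": (34_864, "tokens/second"),
    "Mixtral 8x7B": (59_022, "tokens/second"),
    "GPT-J": (20_086, "tokens/second"),
    "ResNet-50 v1.5": (756_960, "samples/second"),
}

def per_gpu(system_rate: float, gpu_count: int = SYSTEM_GPU_COUNT) -> float:
    """Divide a system-level rate by GPU count to estimate a per-GPU rate."""
    return system_rate / gpu_count

for name, (rate, unit) in offline_system_throughput.items():
    print(f"{name}: {per_gpu(rate):,.2f} {unit} per GPU")
```

The same division applies to the Server column; only the unit changes, since Server results are reported in queries (or tokens) per second under a latency constraint.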
The NVIDIA H100 Tensor Core GPU supercharged the NVIDIA platform for HPC and AI in its MLPerf HPC v3.0 debut, enabling up to 16X faster time to train in just three years and delivering the highest performance on all workloads across both time-to-train and throughput metrics. The NVIDIA platform was also the only one to submit results for every MLPerf HPC workload, which span climate segmentation, cosmology parameter prediction, quantum molecular modeling, and the latest addition, protein structure prediction. The unmatched performance and versatility of the NVIDIA platform make it the instrument of choice to power the next wave of AI-powered scientific discovery.
NVIDIA Full-Stack Innovation Fuels Performance Gains
MLPerf™ HPC v3.0 results retrieved from www.mlperf.org on November 8, 2023, from entries 0.7-406, 0.7-407, 1.0-1115, 1.0-1120, 1.0-1122, 2.0-8005, 2.0-8006, 3.0-8006, 3.0-8007, 3.0-8008. CosmoFlow score in v1.0 is normalized to new RCPs introduced in MLPerf HPC v2.0. Scores for v0.7, v1.0, and v2.0 are adjusted to remove data staging time from the benchmark, consistent with new rules adopted for v3.0 to enable fair comparisons between the submission rounds. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
MLPerf™ HPC v3.0 results retrieved from www.mlperf.org on November 8, 2023, from entries 3.0-8004, 3.0-8009, and 3.0-8010. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
The complexity of AI demands a tight integration between all aspects of the platform. As demonstrated in MLPerf’s benchmarks, the NVIDIA AI platform delivers leadership performance with the world’s most advanced GPU, powerful and scalable interconnect technologies, and cutting-edge software—an end-to-end solution that can be deployed in the data center, in the cloud, or at the edge with amazing results.
An essential component of NVIDIA’s platform and MLPerf training and inference results, the NGC™ catalog is a hub for GPU-optimized AI, HPC, and data analytics software that simplifies and accelerates end-to-end workflows. With over 150 enterprise-grade containers—including workloads for generative AI, conversational AI, and recommender systems; hundreds of AI models; and industry-specific SDKs that can be deployed on premises, in the cloud, or at the edge—NGC enables data scientists, researchers, and developers to build best-in-class solutions, gather insights, and deliver business value faster than ever.
Achieving world-leading results across training and inference requires infrastructure that’s purpose-built for the world’s most complex AI challenges. The NVIDIA AI platform delivered leading performance powered by the NVIDIA Blackwell platform, the Hopper platform, NVLink™, NVSwitch™, and Quantum InfiniBand. These are at the heart of the NVIDIA data center platform, the engine behind our benchmark performance.
In addition, NVIDIA DGX™ systems offer the scalability, rapid deployment, and incredible compute power that enable every enterprise to build leadership-class AI infrastructure.
NVIDIA Jetson Orin offers unparalleled AI compute, large unified memory, and comprehensive software stacks, delivering superior energy efficiency to drive the latest generative AI applications. It's capable of fast inference for any generative AI model built on the transformer architecture, providing superior edge performance on MLPerf.
Learn more about our data center training and inference performance.