
MLCommons Releases MLPerf AI Training v5.1 Results


Today, MLCommons announced new results for the MLPerf Training v5.1 benchmark suite, highlighting the rapid evolution and growing richness of the AI ecosystem as well as significant performance improvements from new generations of systems.

Go here to view the full results for MLPerf Training v5.1 and to find additional information about the benchmarks.

The MLPerf Training benchmark suite comprises full-system tests that stress models, software, and hardware for a range of machine learning (ML) applications. The open-source and peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry.

Version 5.1 set new records for the diversity of systems submitted. Participants in this round of the benchmark submitted 65 unique systems, featuring 12 different hardware accelerators and a variety of software frameworks. Nearly half of the submissions were multi-node, an 86 percent increase over the version 4.1 round one year ago. The multi-node submissions employed several different network architectures, many incorporating custom solutions.

This round recorded substantial performance improvements over the version 5.0 results for two benchmark tests focused on generative AI scenarios, outpacing the rate of improvement predicted by Moore’s Law.

Relative performance improvements across the MLPerf Training benchmarks, normalized to the Moore’s Law trendline at the point in time when each benchmark was introduced. (Source: MLCommons)
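For readers curious what “normalized to the Moore’s Law trendline” means in practice, the sketch below shows one way such a comparison might be computed, assuming performance is expected to double every two years. The function name, dates, and speedup figure are illustrative assumptions, not MLCommons’ actual methodology or data.

```python
from datetime import date

def moores_law_factor(start: date, end: date, doubling_years: float = 2.0) -> float:
    """Speedup Moore's Law alone would predict between two dates,
    assuming performance doubles every `doubling_years` years."""
    elapsed_years = (end - start).days / 365.25
    return 2.0 ** (elapsed_years / doubling_years)

# Hypothetical example: a benchmark introduced in mid-2023 whose best
# result is now 8x faster. These numbers are not MLPerf results.
baseline = moores_law_factor(date(2023, 6, 1), date(2025, 11, 1))
normalized = 8.0 / baseline

print(f"Moore's Law baseline: {baseline:.2f}x")        # ~2.31x
print(f"Improvement vs. trendline: {normalized:.2f}x")  # ~3.46x
```

A normalized value above 1.0 indicates improvement faster than process scaling alone would explain, which is the sense in which the chart shows the genAI tests “outpacing” Moore’s Law.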

“More choices of hardware systems allow customers to compare systems on state-of-the-art MLPerf benchmarks and make informed buying decisions,” said Shriya Rishab, co-chair of the MLPerf Training working group. “Hardware providers are using MLPerf as a way to showcase their products in multi-node settings with great scaling efficiency, and the performance improvements recorded in this round demonstrate that the vibrant innovation in the AI ecosystem is making a big difference.”

The MLPerf Training v5.1 round includes performance results from 20 submitting organizations: AMD, ASUSTeK, Cisco, Datacrunch, Dell, Giga Computing, HPE, Krai, Lambda, Lenovo, MangoBoost, MiTAC, Nebius, NVIDIA, Oracle, Quanta Cloud Technology, Supermicro, Supermicro + MangoBoost, University of Florida, and Wiwynn. “We’d especially like to welcome first-time MLPerf Training submitters Datacrunch, University of Florida, and Wiwynn,” said David Kanter, Head of MLPerf at MLCommons.

The pattern of submissions also reveals an increasing emphasis on benchmarks focused on generative AI (genAI) tasks, with a 24 percent increase in submissions for the Llama 2 70B LoRA benchmark and a 15 percent increase for the new Llama 3.1 8B benchmark over the test it replaced (BERT). “Taken together, the increased submissions to genAI benchmarks and the sizable performance improvements recorded in these tests make it clear that the community is heavily focused on genAI scenarios, to some extent at the expense of other potential applications of AI technology,” said Kanter. “We’re proud to be delivering these kinds of key insights into where the field is headed, allowing all stakeholders to make more informed decisions.”

Robust participation by a broad set of industry stakeholders strengthens the AI ecosystem as a whole and helps ensure that the benchmark serves the community’s needs. We invite submitters and other stakeholders to join the MLPerf Training working group and help us continue to evolve the benchmark.

MLPerf Training v5.1 Updates Two Benchmarks

The collection of tests in the suite is curated to keep pace with the field, with individual tests added, updated, or removed as deemed necessary by a panel of experts from the AI community.

In the 5.1 benchmark release, two earlier tests were replaced with new ones that better represent state-of-the-art technology solutions for the same task. Specifically: Llama 3.1 8B replaces BERT, and Flux.1 replaces Stable Diffusion v2.

Llama 3.1 8B is a benchmark test for pretraining a large language model (LLM). It belongs to the same “herd” of models as the Llama 3.1 405B benchmark already in the suite, but because it has fewer trainable parameters, it can be run on just a single node and deployed to a broader range of systems. This makes the test accessible to a wider range of potential submitters while remaining a good proxy for the performance of larger clusters. More details on the Llama 3.1 8B benchmark can be found in this white paper: https://mlcommons.org/2025/10/training-llama-3-1-8b/.
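As a rough illustration of why an 8-billion-parameter model fits on a single node, here is a back-of-the-envelope memory estimate, assuming mixed-precision training with the Adam optimizer (bf16 weights and gradients, fp32 master weights and optimizer moments). The byte counts and the eight-accelerator node are simplifying assumptions for illustration, not the benchmark’s actual reference configuration.

```python
# Approximate per-parameter training state for mixed-precision Adam:
#   bf16 weights (2 B) + bf16 gradients (2 B)
#   + fp32 master weights (4 B) + fp32 Adam moments (4 B + 4 B)
PARAMS = 8e9
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # 16 B, excluding activations

total_gib = PARAMS * BYTES_PER_PARAM / 2**30
per_device_gib = total_gib / 8  # sharded across 8 accelerators in one node

print(f"~{total_gib:.0f} GiB of model state in total")       # ~119 GiB
print(f"~{per_device_gib:.0f} GiB per device when sharded")  # ~15 GiB
```

By the same arithmetic, a 405-billion-parameter model carries roughly fifty times as much training state, which is why that benchmark targets large multi-node clusters.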

Flux.1 is a transformer-based text-to-image benchmark. Since Stable Diffusion v2 was introduced into the MLPerf Training suite in 2023, text-to-image models have evolved in two important ways: they have integrated a transformer architecture into the diffusion process, and their parameter counts have grown by an order of magnitude. Flux.1, built around a transformer-based 11.9-billion-parameter model, reflects the current state of the art in generative AI for text-to-image tasks. This white paper provides more information on the Flux.1 benchmark: https://mlcommons.org/2025/10/training-flux1/.
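The “order of magnitude” claim is easy to sanity-check. Stable Diffusion v2’s denoising UNet has roughly 0.9 billion parameters (an approximate figure assumed here, not taken from the source), versus Flux.1’s 11.9 billion:

```python
import math

SD_V2_UNET_PARAMS = 0.9e9   # approximate; assumed for illustration
FLUX1_PARAMS = 11.9e9       # from the benchmark description above

ratio = FLUX1_PARAMS / SD_V2_UNET_PARAMS
print(f"{ratio:.1f}x larger, ~10^{math.log10(ratio):.2f}")  # ~13.2x
```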

“The field of AI is a moving target, constantly evolving with new scenarios and capabilities,” said Paul Baumstarck, co-chair of the MLPerf Training working group. “We will continue to evolve the MLPerf Training benchmark suite to ensure that we are measuring what is important to the community, both today and tomorrow.”


