Meta should have made it clearer that “Llama-4-Maverick-03-26-Experimental” was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.
https://twitter.com/lmarena_ai/status/1909397817434816562
lmarena.ai (formerly lmsys.org) (@lmarena_ai) on X
We've seen questions from the community about the latest release of Llama-4 on Arena. To ensure full transparency, we're releasing 2,000+ head-to-head battle results for public review. This includes user prompts, model responses, and user preferences. (link…